How to fetch all the urls which are not linked using regex

I need to fetch all the URLs in a given string that are not linked (that is, URLs that are not wrapped in an anchor tag).
I know the regex (http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])? can fetch all the URLs from a given string.
Input:
<div class='test'>
<p>Heading</p>
<a href='http://www.google.com'>google</a>
www.yahoo.com
http://www.rediff.com
<a href='http://www.overflow.com'>www.overflow.com</a>
</div>
Expected output:
www.yahoo.com
http://www.rediff.com
Kindly advise.

Use a library to build a DOM tree from the HTML and get all the links.
For example, you can use Simple HTML DOM: http://simplehtmldom.sourceforge.net/
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all links
foreach($html->find('a') as $element) {
echo $element->href . '<br>';
}

Simply use this to capture the href value:
href='(.+?)'
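
To return only the URLs that are not inside an anchor tag, one option is to remove the <a> elements first and then run a regex over the remaining text. Below is a minimal sketch using PHP's built-in DOMDocument rather than Simple HTML DOM. The function name findUnlinkedUrls is hypothetical, and the pattern is an assumption: it is the question's regex, simplified and extended to accept a bare "www." prefix so that www.yahoo.com also matches.

```php
<?php
// Hypothetical helper: returns URLs found in the text of $html, skipping <a> elements.
function findUnlinkedUrls(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on loose markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list to an array before removing nodes from the tree.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $anchor) {
        $anchor->parentNode->removeChild($anchor);
    }

    // Only the text outside anchors is left at this point.
    $text = $doc->documentElement ? $doc->documentElement->textContent : '';

    // Assumption: the question's pattern, extended to accept a bare "www." prefix.
    $pattern = '@(?:(?:https?|ftp)://|www\.)[\w-]+(?:\.[\w-]+)+(?:[\w.,#?^=%&:/~+-]*[\w#?^=%&/~+-])?@i';
    preg_match_all($pattern, $text, $matches);

    return $matches[0];
}
```

With the input from the question, this should return www.yahoo.com and http://www.rediff.com, because both the href values and the anchor text are discarded before matching.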

Related

Detect node from html string and extract image inside in PHP

I'm stuck on something that is driving me crazy; any help would be great!
I have a string containing valid HTML. In PHP, I would like to find a specific image by the class of its parent, extract the image's URL, and replace it with another URL.
Here is an example:
$html = '....<div class="header">...<img src="theimage.png" />...</div>...';
I would like to parse the $html string, extract the URL "theimage.png", and replace it with "theimage2.png" (after some internal processing).
I tried a regex, but I'm not sure it's the best solution: I need to be sure that only the image inside .header is returned, and I need to run some functions to work out the new URL.
Maybe the solution is to parse the HTML with DOM nodes, but I couldn't get that to work either.
Can you help me, please? Thanks!

You can achieve this using PHP Simple HTML DOM Parser:
$html = '<body><div class="header"><p>some html</p><img src="theimage.png" /></div></body>';
$html_dom = str_get_html($html);
foreach($html_dom->find('.header img') as $element) {
echo $element->src . '<br>'; // output src of img inside .header div
}
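
The loop above only prints the src. For the replacement part of the question, here is a rough sketch using PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM. The function name replaceHeaderImages and the $rename callback are hypothetical; the callback stands in for whatever "internal processing" produces the new URL.

```php
<?php
// Hypothetical helper: rewrites the src of every <img> inside an element
// whose class list contains "header". $rename maps the old URL to the new one.
function replaceHeaderImages(string $html, callable $rename): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on loose markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "header" as a whole class token, not as a substring of another class.
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' header ')]//img[@src]";
    foreach ($xpath->query($query) as $img) {
        $img->setAttribute('src', $rename($img->getAttribute('src')));
    }

    // Note: saveHTML() returns a full document (doctype, <html>, <body>).
    return $doc->saveHTML();
}
```

For example, passing function ($src) { return str_replace('.png', '2.png', $src); } as $rename turns theimage.png into theimage2.png while leaving images outside .header untouched.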

How to grab all the site links recursively by entering domain name?

How can I grab all of a site's links recursively, given its domain name, in PHP? Please give me some ideas.

To grab all the links on a site, you can use Simple HTML DOM. Here is the manual:
http://simplehtmldom.sourceforge.net/manual.htm
Example: to get all the links on a page:
$html = file_get_html('http://www.example.com/'); // Create DOM from URL or file
// Find all links
foreach($html->find('a') as $element){
echo $element->href . '<br>';
}

Don't grab all the links; grab only the "useful" ones by designing an algorithm to evaluate them, and set a limit on the recursion depth.
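
To make the depth limit concrete, here is a minimal sketch of a depth-limited crawl. It uses a queue (breadth-first) rather than literal recursion, and PHP's built-in DOMDocument rather than Simple HTML DOM. The names extractLinks and crawl are hypothetical, and two simplifications are assumed: only absolute links on the start URL's host are followed, and the page fetcher is passed in as a callable so it can be swapped out.

```php
<?php
// Hypothetical helper: returns the href of every <a> element in an HTML string.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

// Hypothetical crawler: collects URLs reachable within $maxDepth link hops.
// $fetch is any callable that returns the HTML for a URL ('' on failure).
function crawl(string $startUrl, int $maxDepth, callable $fetch): array
{
    $host = parse_url($startUrl, PHP_URL_HOST);
    $seen = [$startUrl => true];
    $queue = [[$startUrl, 0]];

    while ($queue) {
        [$url, $depth] = array_shift($queue);
        if ($depth >= $maxDepth) {
            continue; // depth limit reached: record the URL but do not expand it
        }
        foreach (extractLinks($fetch($url)) as $link) {
            // Assumption: follow only absolute links on the same host.
            if (parse_url($link, PHP_URL_HOST) !== $host || isset($seen[$link])) {
                continue;
            }
            $seen[$link] = true;
            $queue[] = [$link, $depth + 1];
        }
    }
    return array_keys($seen);
}
```

In real use, $fetch could be something like function ($url) { return (string) @file_get_contents($url); }. A filter for "useful" links would go next to the same-host check.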

How can we get specific links using simple html dom

I used this script, which I found on the official Simple HTML DOM site, to find the hyperlinks on a website:
foreach($html->find('a') as $element)
echo $element->href . '<br>';
It returned all the links found on the site, but I want only specific links.
Is there a way of doing this in Simple HTML DOM? This is the HTML for the links I want:
<a class="z" href="http://www.bbc.co.uk/news/world-middle-east-16893609" target="_blank" rel="follow">middle east</a>
where this is the part of the tag that differs from the other hyperlinks:
<a class="z"
Also, is there a way to get the link text ("middle east") together with the link?

I understand you'd like all a elements with the class z? You can do that like this:
foreach($html->find('a.z') as $element)
You can get an element's value (which for links will be the link text) with the plaintext property:
$element->plaintext
Please note that this can all be found in the manual.
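
If you would rather not depend on Simple HTML DOM, roughly the same result can be sketched with PHP's built-in DOMDocument and DOMXPath. The function name linksWithClass is hypothetical, and the sketch assumes the class name contains no quote characters, since it is interpolated into the XPath expression.

```php
<?php
// Hypothetical helper: returns href and link text for every <a> carrying $class.
// Assumption: $class is a plain class name without quotes or spaces.
function linksWithClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on loose markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match $class as a whole token inside the class attribute.
    $query = sprintf(
        "//a[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $result = [];
    foreach ($xpath->query($query) as $a) {
        $result[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $result;
}
```

Calling linksWithClass($html, 'z') on the markup from the question should give one entry with the BBC URL as href and "middle east" as text.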

PHP replace text within a <h1> </h1> tag

I'm using AJAX to call a PHP file which will effectively edit particular bits of content within another HTML file. My problem is that I'm not sure of the best way of targeting these particular areas.
I figured some sort of unique identifier would need to be attached to the tag that needs to be edited, or perhaps placed in a comment, and then PHP would simply search for it before doing the replacement?

Use Simple HTML DOM for this.
You can change the content of every <h1> to foo like this:
$html = file_get_html('http://www.google.com/');
foreach($html->find('h1') as $element)
{
$element->innertext = 'foo';
}
echo $html;
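
To target one particular element by a unique identifier, as the question suggests, a sketch along these lines may help. It uses PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM; the function name setTextById and the id value "title" in the usage note are hypothetical.

```php
<?php
// Hypothetical helper: replaces the content of the element(s) with the given id
// by plain text and returns the resulting HTML document.
function setTextById(string $html, string $id, string $text): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on loose markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@id]') as $node) {
        // Compare in PHP so the id never has to be escaped inside the XPath string.
        if ($node->getAttribute('id') !== $id) {
            continue;
        }
        while ($node->firstChild) {
            $node->removeChild($node->firstChild);
        }
        // A text node is escaped on output, so "&" and "<" are safe here.
        $node->appendChild($doc->createTextNode($text));
    }

    // Note: saveHTML() returns a full document (doctype, <html>, <body>).
    return $doc->saveHTML();
}
```

For instance, setTextById('<h1 id="title">Old</h1><h1>Keep</h1>', 'title', 'New') changes only the first heading. The AJAX handler could then write the returned string back to the HTML file.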

The Simple HTML DOM library allows you to search and modify the DOM of an HTML file or URL.
http://simplehtmldom.sourceforge.net/
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
Another nice library is QueryPath. It is very similar to jQuery:
qp($html_code)->find('body')->text('Hello World')->writeHTML();
https://fedorahosted.org/querypath/wiki/QueryPathTutorial

DOM Manipulation with PHP

I would like to do a simple but non-trivial manipulation of DOM elements with PHP, but I am lost.
Assume a page like Wikipedia, where you have paragraphs and titles (<p>, <h2>). They are siblings. I would like to take both kinds of element, in sequential order.
I have tried GetElementbyName, but then there is no way to organize the information.
I have tried DOMXPath->query(), but I found it really confusing.
Just parsing something like:
<html>
<head></head>
<body>
<h2>Title1</h2>
<p>Paragraph1</p>
<p>Paragraph2</p>
<h2>Title2</h2>
<p>Paragraph3</p>
</body>
</html>
into:
Title1
Paragraph1
Paragraph2
Title2
Paragraph3
There are a few bits of HTML in between that I do not need.
Thank you. I hope the question does not look like homework.

I think DOMXPath->query() is the right approach. This XPath expression will return all nodes that are either an <h2> or a <p> on the same level (since you said they were siblings).
/html/body/*[name() = 'p' or name() = 'h2']
The nodes will be returned as a node list in the right order (document order). You can then construct a foreach loop over the result.
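
A minimal sketch of that loop, assuming the HTML is available as a string. The function name headingsAndParagraphs is hypothetical; DOMDocument, DOMXPath, and the XPath expression are as described above.

```php
<?php
// Hypothetical helper: returns the text of every <h2> and <p> directly under
// <body>, in document order.
function headingsAndParagraphs(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on loose markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $lines = [];
    foreach ($xpath->query("/html/body/*[name() = 'p' or name() = 'h2']") as $node) {
        $lines[] = trim($node->textContent);
    }
    return $lines;
}
```

Elements of any other type between the headings and paragraphs are simply not selected, so the unwanted bits of HTML drop out. To tell titles from paragraphs inside the loop, check $node->nodeName.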

I have used Simple HTML DOM by S.C. Chen a few times.
It is a great class for accessing DOM elements.
Example:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
Check it out here: simplehtmldom
It may help with future projects.

Try having a look at this library and corresponding project:
Simple HTML DOM
This allows you to open an online web page or an HTML page from the filesystem and access its items via class names, tag names, and IDs. If you are familiar with jQuery and its syntax, you will need no time to get used to this library.
