Only match the href="http inside the regex - php

I'm using the following regex to select the href="http part inside an <a> tag that doesn't contain rel="nofollow" yet:
preg_replace(
'/<a\b(?=[^>]+\b(href="http))(?![^>]+\brel="nofollow")/',
'rel="nofollow" href="http://',
$input_string
);
The problem is that it only replaces the <a, because that is the only text the pattern actually consumes; the lookaheads are zero-width.
How can I match the a tag but exclude the <a part from the result, so that only href="http is matched? preg_match returns both <a and href="http, but I only need href="http :)
The reason I think this might be the only right solution is that I don't know how many <a> tags the given string contains, or whether they contain rel="nofollow". I need to make sure I only replace http:// with rel="nofollow" http:// inside <a> tags that have no rel="nofollow".
EDIT 1:
giuseppe straziota asked for an input and output example so here it is:
input:
this is a string with a lot of content and <a href="http://information.nl" class="aClass">links</a> and whatever....
output:
this is a string with a lot of content and <a rel="nofollow" href="http://information.nl" class="aClass">links</a> and whatever....
EDIT 2:
I ran a couple more tests; these are the results:
code (exact copy/paste):
$input_string = 'this is a string with a lot of content and <a href="http://information.nl" class="aClass">links</a> and whatever....';
$input_string = preg_replace(
'/<a\b(?=[^>]+\b(href="http))(?![^>]+\brel="nofollow")/',
'rel="nofollow" href="http://',
$input_string
);
echo htmlentities($input_string);
result from php 7.0.5:
this is a string with a lot of content and rel="nofollow" href="http:// href="http://information.nl" class="aClass">links</a> and whatever....
And it should be:
this is a string with a lot of content and <a rel="nofollow" href="http://information.nl" class="aClass">links</a> and whatever....
EDIT 3:
I tried this regex:
$test = preg_replace(
'/(?=<a\b[^>]+\b(href="http))(?![^>]+\brel="nofollow")/',
'rel="nofollow" href="http://',
$input_string
);
But now it places the 'rel="nofollow" href="http://' right before the <a, so the result is:
rel="nofollow" href="http://<a href="http://information.nl" class="aClass">links</a>
Not exactly what I want either...

I was overthinking this. With a small change to my preg_replace call I can just use the first regex:
$test = preg_replace(
'/<a(?=\b[^>]+\b(href="http))(?![^>]+\brel="nofollow")/',
'<a rel="nofollow"',
$input_string
);
The pattern replaces the <a part of the tag, so I should have taken advantage of that from the start, as I do now.
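To check the behaviour described above, here is a minimal sketch that runs the final pattern over a hypothetical input with two links, one without rel="nofollow" and one that already has it. The sample string is an assumption, and the pattern only recognises rel="nofollow" written with double quotes.

```php
<?php
// Hypothetical sample input: the first link lacks rel="nofollow", the second has it.
$input_string = 'a <a href="http://information.nl" class="aClass">link</a> and '
    . '<a rel="nofollow" href="http://example.com">another</a>';

// Only "<a" is consumed; both lookaheads are zero-width checks on the rest of the tag.
$result = preg_replace(
    '/<a(?=\b[^>]+\bhref="http)(?![^>]+\brel="nofollow")/',
    '<a rel="nofollow"',
    $input_string
);

// The first tag gains rel="nofollow"; the second is left untouched.
echo $result, PHP_EOL;
```

Because the replacement re-emits the consumed <a, nothing from the original tag is lost.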

Related

PHP can't take <img> tag from page

I have a problem with the PHP preg_match function.
In the DLE CMS, I am trying to extract a picture from a news item (image-x), but in the module I'm referring to it via a direct link.
//remove <p></p> tags
$row[$i]['short_story'] = str_replace( "</p><p>", " ",$row[$i]['short_story'] );
//remove the \" escapes (DLE put it in the MySQL column)
$row[$i]['short_story'] = str_replace("\\\"", " ", $row[$i]['short_story']);
//remove all tags except <img>, but there remains a simple text that is stored without tags
$row[$i]['img'] = strip_tags($row[$i]['short_story'], "<img>");
//try to find <img> (by '>'), to remove the simple text;
preg_match(".*>", $row[$i]['img'], $matches);
// print only <br/> (matches is empty)
print_r($matches."<br/>\n");
For example, print_r($row[$i]['img']) is
<img src="somelink" class="fr-fic" fr-dib="" alt=""> Some text
and I need only
<img src="somelink" class="fr-fic" fr-dib="" alt="">
Your regex pattern for selecting <img> is incorrect: it has no delimiters, and .*> is not specific to the <img> tag. Use /<img[^>]+>/ as the pattern instead. The code should change to
preg_match("/<img[^>]+>/", $row[$i]['img'], $matches);
You can also use preg_replace() to remove the additional text after <img>:
preg_replace("/(<img[^>]+>)[\w\s]+/", "$1", $string);
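A minimal sketch of the corrected call, using the sample value from the question. Note that $matches is an array, so the matched tag is read from $matches[0]; the variable names $html and $img are only for illustration.

```php
<?php
// Sample value taken from the question above.
$html = '<img src="somelink" class="fr-fic" fr-dib="" alt=""> Some text';

// The pattern is wrapped in "/" delimiters; [^>]+ stops at the tag's closing ">".
if (preg_match('/<img[^>]+>/', $html, $matches)) {
    $img = $matches[0]; // the full match
} else {
    $img = '';          // no <img> tag found
}

echo $img, PHP_EOL;
```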

Remove special characters like &lt; but not anchor tag

How can I remove special characters like &lt; and &gt;, but not the anchor tag?
e.g.
&lt;a href=&quot;http://www.imdb.com/name/nm0005069/&quot;&gt;Spike Jonze&lt;/a&gt; This cause by <a class="primary-black" href="http://example.com/community/RobHallums">RobHallums</a>
should be
Spike Jonze This cause by <a class="primary-black" href="http://example.com/community/RobHallums">RobHallums</a>
Here's a quick one for you:
<?php
// SET OUR DEFAULT STRING
$string = '&lt;a href=&quot;http://w...content-available-to-author-only...b.com/name/nm0005069/&quot;&gt;Spike Jonze&lt;/a&gt; This cause by <a class="primary-black" href="http://e...content-available-to-author-only...e.com/community/RobHallums">RobHallums</a>';
// USE PREG_REPLACE TO STRIP OUT THE STUFF WE DON'T WANT
$string = preg_replace('~&lt;.*?&gt;~', '', $string);
// PRINT OUT OUR NEW STRING
print $string;
All I'm doing here is looking for &lt;, followed by any character (.), any number of times (*), but as few as possible (?), up to the next &gt;.
Any time it finds that, it replaces it with nothing, so you're left with the text you want.
Here is a working demo:
http://ideone.com/uSnY0b
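The role of the ? is easiest to see side by side. The sketch below runs the lazy pattern and a greedy variant over a hypothetical string (not the one from the question); the variable names are only for illustration.

```php
<?php
// Hypothetical input: an entity-encoded tag pair followed by a real anchor tag.
$string = '&lt;b&gt;Spike Jonze&lt;/b&gt; and <a href="http://example.com/">a real link</a>';

// Lazy: each match ends at the nearest &gt;, so only the encoded tags are removed.
$lazy = preg_replace('~&lt;.*?&gt;~', '', $string);

// Greedy: one match runs from the first &lt; to the last &gt;, taking the name with it.
$greedy = preg_replace('~&lt;.*&gt;~', '', $string);

echo $lazy, PHP_EOL;
echo $greedy, PHP_EOL;
```

In both cases the real anchor tag survives, because it contains literal < and > rather than the entities.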
Use html_entity_decode:
<?php
$url = html_entity_decode('&lt;a href=&quot;http://www.imdb.com/name/nm0005069/&quot;&gt;Spike Jonze&lt;/a&gt;');
echo $url;
?>
The output is the decoded anchor tag, which a browser renders as the link text:
Spike Jonze
EDIT:
<?php
preg_match_all('/<a .*?>(.*?)<\/a>/',$url,$matches);
//For Text Name
echo $matches[1][0]; //output : Spike Jonze
?>

php Replace <a href=" or <a href=' with another URL

I'm stuck, as preg matching is not always that easy and I'm not at all familiar with it.
I'm trying to replace all the
For example:
Site1 => Site1
But the <a href can be written in many ways: a HREF, A href, double-spaced <A  href, etc. How can I handle this? Bear in mind that performance is key.
I've tried the following with str_replace, but of course that does not cover all variants of <a href (upper- and lowercase versions).
$str = 'sitename1<br />sitename2<br />sitename3';
$Replace = str_replace('<a href="', '<a href="https://example.com/&i=1243123&r=', $str);
echo $Replace;
Try this (PHP 5.3+):
$link = preg_replace_callback('#<a(.*?)href="(.*?)"(.*?)>#is', function ($match) {
    return sprintf(
        '<a%shref="%s"%s>',
        $match[1],
        'http://example.com?u=' . urlencode($match[2]),
        $match[3]
    );
}, 'Site1');
echo $link;
The only fully reliable way of doing this is to use a proper HTML parser.
Happily, PHP has one built-in.
You'd first load the HTML with DOMDocument's loadHTML method: http://php.net/manual/en/domdocument.loadhtml.php
Then search the parsed tree with XPath and manipulate the <a> tags: http://php.net/manual/en/domxpath.query.php
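The two steps could look roughly like the sketch below. The input HTML is hypothetical, the redirect prefix is borrowed from the other answer, and $hrefs exists only to make the result easy to inspect. The HTML parser lowercases tag and attribute names and accepts either quote style, so the spelling variants from the question need no special handling.

```php
<?php
// Hypothetical input with mixed case, extra spacing and single quotes.
$html   = '<p><A HREF="http://site1.com">Site1</A> and <a  href=\'http://site2.com\'>Site2</a></p>';
$prefix = 'http://example.com?u='; // assumed redirect prefix

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$hrefs = [];
// Select every <a> element that carries an href attribute.
foreach ($xpath->query('//a[@href]') as $a) {
    $a->setAttribute('href', $prefix . urlencode($a->getAttribute('href')));
    $hrefs[] = $a->getAttribute('href');
}

echo $doc->saveHTML();
```

This is likely slower than a single regex, so with performance being key it may be worth measuring both on real input.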

PHP cut text from a specific word in an HTML string

I would like to cut all text (image alt included) in an HTML string from a specific word onward.
For example, this is the string:
<?php
$string = '<div><img src="img.jpg" alt="cut this text from here" />cut this text from here</div>';
?>
and this is what I would like to output
<div>
<a href="#">
<img src="img.jpg" alt="cut this text" />
cut this text
</a>
</div>
The $string is actually a property of an object, but I didn't want to post too much code here.
Obviously I can't use explode because that would kill the HTML markup.
str_replace and substr are also out, because the length of the text before or after the word where it needs to be cut is not constant.
So what can I do to achieve this?
OK, I solved my problem; I'm posting an answer to my own question because it could help someone.
This is what I did:
<?php
$string = '<div><img src="img.jpg" alt="cut this text from here" />cut this text from here</div>';
$txt_only = strip_tags($string);
$explode = explode(' from', $txt_only);
$find_txt = array(' from', $explode[1]);
$new_str = str_replace($find_txt, '', $string);
echo $new_str;
?>
This might not be the best solution, but it was quick and did not involve DOM parsing.
If anybody wants to try this, make sure that your href, src, or any other attribute that needs to stay untouched doesn't contain the same character sequences as the entries in $find_txt, or those will be replaced too.
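If that caveat is a concern, one alternative is to trim only text nodes and alt attributes through the DOM, so href and src values are never touched. This is a rough sketch under the assumption that the cut word is ' from'; it is not the approach used above, and the variable names are illustrative.

```php
<?php
$string = '<div><img src="img.jpg" alt="cut this text from here" />cut this text from here</div>';
$word   = ' from'; // assumed marker: everything from here on is dropped

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($string, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Visit text nodes and img alt attributes only; other attributes stay as they are.
foreach ($xpath->query('//text() | //img/@alt') as $node) {
    $pos = strpos($node->nodeValue, $word);
    if ($pos !== false) {
        $node->nodeValue = substr($node->nodeValue, 0, $pos);
    }
}

$html = $doc->saveHTML();
echo $html;
```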

How to use preg_replace to replace with an excluded pattern

I have some text containing HTML tags. I would like to replace all links with another one, but only local links, not those that start with http://
Example:
test link
==> test link
Video
==> Video
I tried this preg_replace, but it's not working:
$exclude = '<a href=\"http://.*?';
$pattern = '<a href=\".*?';
$content=preg_replace("~(($exclude)?($pattern))~i",'<a href="/action.php?url=$4',$content);
Thanks!
What about something like this:
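$content = preg_replace('#<a href="([^":]*)">#i', '<a href="/action.php?url=$1">', $content);
The character class excludes ":" so absolute URLs such as http://... are skipped, and excludes the double quote so a match cannot run past the end of the href value.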
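A minimal runnable sketch of this approach on a hypothetical input with one local and one absolute link. It assumes the href is double-quoted and is the only attribute in the tag, and it uses the class [^":] so that a match stays inside a single attribute value.

```php
<?php
// Hypothetical input: one local link and one absolute link.
$content = '<a href="page.html">test link</a> <a href="http://example.com/v">Video</a>';

// A local href contains no ":"; an absolute one fails at "http:" and is left alone.
$content = preg_replace(
    '#<a href="([^":]*)">#i',
    '<a href="/action.php?url=$1">',
    $content
);

echo $content, PHP_EOL;
```

Any local URL that itself contains a colon would also be skipped by this pattern.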
