PHP preg_replace weirdness with custom urls


I'm using the following code to add <span> tags behind <a> tags.
$html = preg_replace("~<a.*?href=\"$url\".*?>.*?</a>~i", "$0<span>test</span>", $html);
The code works fine for regular links (e.g. http://www.google.com/), but it does not perform a replacement when $url contains $link$/3/.
This example code shows the (mis)behaviour:
<?php
$urls = array();
$urls[] = '$link$/3/';
$urls[] = 'http://www.google.com/';
$html = '<a href="$link$/3/">Test Link</a>' . "\n" . '<a href="http://www.google.com/">Google</a>';
foreach($urls as $url) {
$html = preg_replace("~<a.*?href=\"$url\".*?>.*?</a>~i", "$0<span>test</span>", $html);
}
echo $html;
?>
And this is the output it produces:
<a href="$link$/3/">Test Link</a>
<a href="http://www.google.com/">Google</a><span>test</span>

Use $url = preg_quote($url, '~'); otherwise the dollar signs are interpreted as usual: end-of-input.

just somebody is correct; you must escape your special regex characters if you mean for them to be interpreted as literal.
It also looks to me like it can't perform the replace because it never makes a match.
Try replacing this line:
$urls[] = '$link$/3/';
With this:
$urls[] = '$link/3/';

$ is considered a special regex character and needs to be escaped. Use preg_quote() to escape $url before passing it to preg_replace().
$url = preg_quote($url, '~');
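Put together with the loop from the question, the preg_quote() fix might look like the sketch below. The sample markup in $html is an assumption, since the page only shows the link text.

```php
<?php
// Sample markup is assumed; the original page only shows the link text.
$html = '<a href="$link$/3/">Test Link</a>' . "\n"
      . '<a href="http://www.google.com/">Google</a>';
$urls = array('$link$/3/', 'http://www.google.com/');

foreach ($urls as $url) {
    // Escape regex metacharacters (such as $ and .) and the ~ delimiter,
    // so the URL is matched literally.
    $quoted = preg_quote($url, '~');
    $html = preg_replace(
        '~<a.*?href="' . $quoted . '".*?>.*?</a>~i',
        '$0<span>test</span>',
        $html
    );
}

echo $html;
```

With the URL quoted, both links get the <span> appended, not just the Google one.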

$ has a special meaning in regex: end of line. Your expression is being expanded like this:
$html = preg_replace("~<a.*?href=\"$link$/3/\".*?>.*?</a>~i", "$0<span>test</span>", $html);
This fails because it can't find "link" between two end-of-line anchors. Try escaping the $ in the $urls array:
$urls[] = '\$link\$/3/';

Related

php strip_tags to allow comment

I need to strip all html tags but retain comment lines to extract for info.
Is it even possible?
$content = strip_tags($content, '<!-->');
This doesn't work, and I have tried a few different variants.

You can protect your comments before stripping the tags using the following code:
// create a random string for using in replace strings
$random = strtoupper(dechex(rand(0,10000000000)));
// replace comment starts
$html = preg_replace('/<!--/', '#MARKER-START-'. $random.'#', $html);
// replace comment ends
$html = preg_replace('/-->/', '#MARKER-END-'. $random.'#', $html);
// strip all html tags
$html = strip_tags($html);
// replace back comment starts
$html = preg_replace('/#MARKER-START-'. $random.'#/', '<!--', $html);
// replace back comment ends
$html = preg_replace('/#MARKER-END-'. $random.'#/', '-->', $html);

Instead of using strip_tags() use this regular expression:
$szRetVal = preg_replace( '%</?[a-z][a-z0-9]*[^<>]*>%sim','',$szHTML );
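If the goal is only to pull the comment text out for later use, another option is to collect the comments first and strip the tags afterwards. This is a minimal sketch; the sample $content and the variable names are made up for illustration.

```php
<?php
// Hypothetical input; the question does not show its markup.
$content = '<div><!-- id: 42 --><p>Hello <b>world</b></p><!-- author: bob --></div>';

// Collect the body of every comment. The s flag lets . span newlines,
// so multi-line comments are captured too.
preg_match_all('/<!--(.*?)-->/s', $content, $matches);
$comments = array_map('trim', $matches[1]);

// strip_tags() removes the tags and the comments themselves.
$text = strip_tags($content);

print_r($comments);
echo $text, "\n";
```

This keeps the comment data in an array instead of leaving it inline in the stripped text, which may or may not be what you need.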

PHP RegEx Negation Word

My preg_replace pattern regex code here..
/<img(.*?)src="(.*?)"/
This is my replace code..
<img$1src="'.$path.'$2"
So I want to exclude one condition:
if the img tag has rel="customimg", skip it and don't run preg_replace on it.
Example: skip this line
<img rel="customimg" src="http..">
What should I add to this regex pattern?
I searched other posts, but couldn't find exactly this.

Because the src attribute may use single or double quotes, I suggest using
preg_replace(
"/(<img\b(?!.*\brel=[\"']customimg[\"']).*?\bsrc=)([\"']).*?\\2/i",
"$1$2" . $path . "$2",
$string);
Edit:
To add url prefix instead of full url replacement, use
preg_replace(
"/(<img\b(?!.*\brel=[\"']customimg[\"']).*?\bsrc=)([\"'])(.*?)\\2/i",
"$1$2" . $path . "$3$2",
$string);

Add a negative lookahead:
/<img(?![^>]*\srel="customimg")(.*?)src="(.*?)"/
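Here is a small sketch of that lookahead combined with the replacement from the question. The $path value and the sample markup are placeholders, not taken from the original post.

```php
<?php
// Placeholder prefix and markup, for illustration only.
$path = 'http://cdn.example.com';
$html = '<img rel="customimg" src="/a.png"><img class="x" src="/b.png">';

// The negative lookahead (?![^>]*\srel="customimg") rejects any <img tag
// that carries rel="customimg" before its closing >.
$pattern = '/<img(?![^>]*\srel="customimg")(.*?)src="(.*?)"/';
$out = preg_replace($pattern, '<img$1src="' . $path . '$2"', $html);

echo $out, "\n";
```

Only the second image gets the prefix. Note that this still assumes double-quoted attributes, as in the question's pattern.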

Because I only see regex "solutions" coming in, here is an answer using DOMDocument:
<?php
$path = 'the/path';
$doc = new DOMDocument();
@$doc->loadHTML('<img rel="customimg" src="/image.jpgm"><img src="/image.jpg">');
$xpath = new DOMXPath($doc);
$imageNodes = $xpath->query('//img[not(@rel="customimg")]');
foreach ($imageNodes as $node) {
$node->setAttribute('src', $path . $node->getAttribute('src'));
}
Demo: http://codepad.viper-7.com/uID5wz

It would seem like it'd be easier/more expressive to do
if(strpos($haystackString, '"customimg"') === false) // The === is important
{
// your preg_replace here
}
Edit: Thanks for pointing out missing param guys

Add id attribute to hyperlinks through PHP Regular Expressions

I am still relatively new to regular expressions and feel my code is being too greedy. I am trying to add an id attribute to existing links in a piece of code. My function looks like this:
function addClassHref($str) {
//$str = stripslashes($str);
$preg = "/<[\s]*a[\s]*href=[\s]*[\"\']?([\w.-]*)[\"\']?[^>]*>(.*?)<\/a>/i";
preg_match_all($preg, $str, $match);
foreach ($match[1] as $key => $val) {
$pattern[] = '/' . preg_quote($match[0][$key], '/') . '/';
$replace[] = "<a id='buttonRed' href='$val'>{$match[2][$key]}</a>";
}
return preg_replace($pattern, $replace, $str);
}
This adds the id attribute like I want, but it breaks the hyperlink. For example:
If the original code is: <a href="http://www.google.com">Link</a>
Instead of <a id="class" href="http://www.google.com">Link</a>
It is giving
<a id="class" href="http">Link</a>
Any suggestions or thoughts?

Do not use regular expressions to parse XML or HTML.
$doc = new DOMDocument();
$doc->loadHTML($html);
$all_a = $doc->getElementsByTagName('a');
$firsta = $all_a->item(0);
$firsta->setAttribute('id', 'idvalue');
echo $doc->saveHTML($firsta);
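To handle every link rather than just the first one, the same idea can be put in a loop. This is only a sketch: the sample markup and the "link-N" id scheme are made up for illustration.

```php
<?php
// Hypothetical input markup.
$html = '<p><a href="http://www.google.com">Link</a> <a href="http://example.com/">Other</a></p>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$out = array();
$i = 0;
foreach ($doc->getElementsByTagName('a') as $a) {
    // The href is left untouched; only the id attribute is added.
    $a->setAttribute('id', 'link-' . $i);
    $out[] = $doc->saveHTML($a);
    $i++;
}

echo implode("\n", $out), "\n";
```

Because the parser handles the attributes, the href survives intact no matter which characters it contains.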

You've got some overcomplications in your regex :)
Also, there's no need for the loop as preg_replace() will hit all the instances of the search pattern in the relevant string. The first regex below will take everything in the a tag and simply add the id attribute on at the end.
$str = 'Link' . "\n" .
'Link' . "\n" .
'Link';
$p = "{<\s*a\s*(href=[^>]*)>([^<]*)</a>}i";
$r = "<a $1 id=\"class\">$2</a>";
echo preg_replace($p, $r, $str);
If you only want to capture the href attribute you could do the following:
$p = '{<\s*a\s*href=["\']([^"\']*)["\'][^>]*>([^<]*)</a>}i';
$r = "<a href='$1' id='class'>$2</a>";

Your first subpattern ([\w.-]*) doesn't match :, thus it stops at "http".
Couldn't you just use a simple str_replace() for this? Regex seems like overkill if this is all you're doing.
$str = str_replace('<a ', '<a id="someID" ', $str);
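The point about the character class is easy to check in isolation. A quick sketch, with simplified patterns that are not the exact ones from the question:

```php
<?php
$link = '<a href="http://www.google.com">Link</a>';

// [\w.-] covers letters, digits, underscore, dot and hyphen, but not ':' or '/',
// so the capture stops right after "http".
preg_match('/href=["\']?([\w.-]*)/', $link, $narrow);

// Capturing everything up to the closing quote keeps the whole URL.
preg_match('/href=["\']([^"\']*)["\']/', $link, $wide);

echo $narrow[1], "\n";
echo $wide[1], "\n";
```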

PHP regular expression to remove tags in HTML document

Say I have the following text
..(content).............
<A HREF="http://foo.com/content" >blah blah blah </A>
...(continue content)...
I want to delete the link, i.e. remove the tag while keeping the text in between. How do I do this with a regular expression (since the URLs will all be different)?
Much thanks

This will remove all tags:
preg_replace("/<.*?>/", "", $string);
This will remove just the <a> tags:
preg_replace("/<\\/?a(\\s+.*?>|>)/i", "", $string);
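As a rough check against the markup from the question, which uses upper-case tags, here is a sketch with a slightly different, case-insensitive pattern. The surrounding text and the <abbr> element are made up to show that other tags starting with "a" are left alone.

```php
<?php
// The anchor is taken from the question; the rest of the string is invented.
$string = 'Intro <A HREF="http://foo.com/content" >blah blah blah </A> outro <abbr>x</abbr>';

// Match <a ...> and </a> in any letter case. After the "a" there must be
// whitespace or ">", so tags such as <abbr> do not match.
$clean = preg_replace('~</?a(\s[^>]*)?>~i', '', $string);

echo $clean, "\n";
```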

Avoid regular expressions whenever you can, especially when processing xml. In this case you can use strip_tags() or simplexml, depending on your string.

<?php
//example to extract the innerText from all anchors in a string
include('simple_html_dom.php');
$html = str_get_html('<A HREF="http://foo.com/content" >blah blah blah </A><A HREF="http://foo.com/content" >blah blah blah </A>');
//print the text of each anchor
foreach($html->find('a') as $e) {
echo $e->innerText;
}
?>
See PHP Simple DOM Parser.

Not pretty but does the job:
$data = str_ireplace('</a>', '', $data);
$data = preg_replace('/<a[^>]+href[^>]+>/i', '', $data);

strip_tags() can also be used.
Please see examples here.

$pattern = '/href="([^"]*)"/';

I use this to replace the anchors with a text string...
function replaceAnchorsWithText($data) {
$regex = '/(<a\s*'; // Start of anchor tag
$regex .= '(.*?)\s*'; // Any attributes or spaces that may or may not exist
$regex .= 'href=[\'"]+?\s*(?P<link>\S+)\s*[\'"]+?'; // Grab the link
$regex .= '\s*(.*?)\s*>\s*'; // Any attributes or spaces that may or may not exist before closing tag
$regex .= '(?P<name>\S+)'; // Grab the name
$regex .= '\s*<\/a>)/i'; // Any number of spaces between the closing anchor tag (case insensitive)
if (is_array($data)) {
// This is what will replace the link (modify to your liking)
$data = "{$data['name']}({$data['link']})";
}
return preg_replace_callback($regex, array('self', 'replaceAnchorsWithText'), $data);
}

use str_replace

PHP: STR replace by link

I have this PHP chatbox.
If I type a link in the chatbox, it is not displayed as a link.
How can I use str_replace to do this?
It should respond to things like 'http', 'http://', '.com', '.nl', 'www', 'www.' ...
My other STR replace lines look like these:
$bericht = str_replace ("STRING1","STRINGREPLACEMENT1",$bericht);
Someone?

Hey! Try this code (found at php.net somewhere):
function format_urldetect( $text )
{
$tag = " rel=\"nofollow\"";
//
// First, look for strings beginning with http:// that AREN't preceded by an <a href tag
//
$text = preg_replace( "/(?<!\\0", $text );
//
// Second, look for strings with casual urls (www.something.com...) and make sure they don't have a href tag OR a http:// in front,
// since that would have been caught in the previous step.
//
$text = preg_replace( "/(?<!\\0", $text );
$text = preg_replace( "/(?<!\\0", $text );
return $text;
}
Uhm, broken indentation. try this http://triop.se/code.txt

You should use regex instead. look at preg_replace
$regex = '`(\s|\A)(http|https|ftp|ftps)://(.*?)(\s|\n|[,.?!](\s|\n)|$)`ism';
$replace = '$1<a href="$2://$3">$2://$3</a>$4';
$buffer = preg_replace($regex,$replace,$buffer);
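For a chatbox it is probably worth escaping the message before adding any markup. The following is a minimal sketch, not a complete URL detector: linkify() is a hypothetical helper name, it only handles http:// and https:// (not bare "www." or ".nl" strings), and punctuation directly after a URL ends up inside the link.

```php
<?php
// Hypothetical helper: escape user input first, then wrap URLs in anchors.
function linkify($text)
{
    // Escape first so user-supplied HTML cannot be injected into the page.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Wrap anything starting with http:// or https://, up to the next whitespace.
    return preg_replace(
        '~\bhttps?://[^\s<]+~i',
        '<a href="$0" rel="nofollow">$0</a>',
        $safe
    );
}

echo linkify('see http://example.com/page now'), "\n";
```

The rel="nofollow" attribute is borrowed from the format_urldetect() answer above.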
