PHP RegEx Negation Word

PHP RegEx Negation Word - php

My preg_replace pattern regex code here..
/<img(.*?)src="(.*?)"/
This is my replace code..
<img$1src="'.$path.'$2"
So i want to negate/exlude a condition..
If img tag have a rel="customimg", dont preg_replace so skip it..
Example: Skip This Line
<img rel="customimg" src="http..">
What might add to this regex pattern?
I searched another post, but I couldn't exactly..

Because src argument may use single or double quotes, I suggest you to use
preg_replace(
"/(<img\b(?!.*\brel=[\"']customimg[\"']).*?\bsrc=)([\"']).*?\2/i",
"$1$2" . $path . "$2",
$string);
Edit:
To add url prefix instead of full url replacement, use
preg_replace(
"/(<img\b(?!.*\brel=[\"']customimg[\"']).*?\bsrc=)([\"'])(.*?)\2/i",
"$1$2" . $path . "$3$2",
$string);

Add a negative lookahead:
/<img(?![^>]*\srel="customimg")(.*?)src="(.*?)"/

Because I only see regex "solutions" coming in. Here is the answer using DOMDocument:
<?php
$path = 'the/path';
$doc = new DOMDocument();
#$doc->loadHTML('<img rel="customimg" src="/image.jpgm"><img src="/image.jpg">');
$xpath = new DOMXPath($doc);
$imageNodes = $xpath->query('//img[not(#rel="customimg")]');
foreach ($imageNodes as $node) {
$node->setAttribute('src', $path . $node->getAttribute('src'));
}
Demo: http://codepad.viper-7.com/uID5wz

It would seem like it'd be easier/more expressive to do
if(strpos($haystackString, '"customimg"') === false) // The === is important
{
// your preg_replace here
}
Edit: Thanks for pointing out missing param guys

Related

PHP - Inner HTML recursive replace

I need to perform a recursive str_replace on a portion of HTML (with recursive I mean inner nodes first), so I wrote:
$str = //get HTML;
$pttOpen = '(\w+) *([^<]{1,100}?)';
$pttClose = '\w+';
$pttHtml = '(?:(?!(?:<x-)).+)';
while (preg_match("%<x-(?:$pttOpen)>($pttHtml)*</x-($pttClose)>%m", $str, $match)) {
list($outerHtml, $open, $attributes, $innerHtml, $close) = $match;
$newHtml = //some work....
str_replace($outerHtml, $newHtml, $str);
}
The idea is to first replace non-nested x-tags.
But it only works if innerHtml in on the same line of the opening tag (so I guess I misunderstood what the /m modifier does). I don't want to use a DOM library, because I just need simple string replacement. Any help?

Try this regex:
%<x-(?P<open>\w+)\s*(?P<attributes>[^>]*)>(?P<innerHtml>.*)</x-(?P=open)>%s
Demo
http://regex101.com/r/nA2zO5
Sample code
$str = // get HTML
$pattern = '%<x-(?P<open>\w+)\s*(?P<attributes>[^>]*)>(?P<innerHtml>.*)</x-(?P=open)>%s';
while (preg_match($pattern, $str, $matches)) {
$newHtml = sprintf('<ns:%1$s>%2$s</ns:%1$s>', $matches['open'], $matches['innerHtml']);
$str = str_replace($matches[0], $newHtml, $str);
}
echo htmlspecialchars($str);
Output
Initially, $str contained this text:
<x-foo>
sdfgsdfgsd
<x-bar>
sdfgsdfg
</x-bar>
<x-baz attr1='5'>
sdfgsdfg
</x-baz>
sdfgsdfgs
</x-foo>
It ends up with:
<ns:foo>
sdfgsdfgsd
<ns:bar>
sdfgsdfg
</ns:bar>
<ns:baz>
sdfgsdfg
</ns:baz>
sdfgsdfgs
</ns:foo>
Since, I didn't know what work is done on $newHtml, I mimic this work somehow by replacing x-with ns: and removing any attributes.

Thanks to #Alex I came up with this:
%<x-(?P<open>\w+)\s*(?P<attributes>[^>]*?)>(?P<innerHtml>((?!<x-).)*)</x-(?P=open)>%is
Without the ((?!<x-).)*) in the innerHtml pattern it won't work with nested tags (it will first match outer ones, which isn't what I wanted). This way innermost ones are matched first. Hope this helps.

I don't know exactly what kind of changes you are trying to do, however this is the way I will proceed:
$pattern = <<<'EOD'
~
<x-(?<tagName>\w++) (?<attributes>[^>]*+) >
(?<content>(?>[^<]++|<(?!/?x-))*) #by far more efficient than (?:(?!</?x-).)*
</x-\g<tagName>>
~x
EOD;
function callback($m) { // exemple function
return '<n-' . $m['tagName'] . $m['attributes'] . '>' . $m['content']
. '</n-' . $m['tagName'] . '>';
};
do {
$code = preg_replace_callback($pattern, 'callback', $code, -1, $count);
} while ($count);
echo htmlspecialchars(print_r($code, true));

Add id attribute to hyperlinks through PHP Regular Expressions

I am still relatively new to Regular Expressions and feel My code is being too greedy. I am trying to add an id attribute to existing links in a piece of code. My functions is like so:
function addClassHref($str) {
//$str = stripslashes($str);
$preg = "/<[\s]*a[\s]*href=[\s]*[\"\']?([\w.-]*)[\"\']?[^>]*>(.*?)<\/a>/i";
preg_match_all($preg, $str, $match);
foreach ($match[1] as $key => $val) {
$pattern[] = '/' . preg_quote($match[0][$key], '/') . '/';
$replace[] = "<a id='buttonRed' href='$val'>{$match[2][$key]}</a>";
}
return preg_replace($pattern, $replace, $str);
}
This adds the id tag like I want but it breaks the hyperlink. For example:
If the original code is : Link
Instead of <a id="class" href="http://www.google.com">Link</a>
It is giving
<a id="class" href="http">Link</a>
Any suggestions or thoughts?

Do not use regular expressions to parse XML or HTML.
$doc = new DOMDocument();
$doc->loadHTML($html);
$all_a = $doc->getElementsByTagName('a');
$firsta = $all_a->item(0);
$firsta->setAttribute('id', 'idvalue');
echo $doc->saveHTML($firsta);

You've got some overcomplications in your regex :)
Also, there's no need for the loop as preg_replace() will hit all the instances of the search pattern in the relevant string. The first regex below will take everything in the a tag and simply add the id attribute on at the end.
$str = 'Link' . "\n" .
'Link' . "\n" .
'Link';
$p = "{<\s*a\s*(href=[^>]*)>([^<]*)</a>}i";
$r = "<a $1 id=\"class\">$2</a>";
echo preg_replace($p, $r, $str);
If you only want to capture the href attribute you could do the following:
$p = '{<\s*a\s*href=["\']([^"\']*)["\'][^>]*>([^<]*)</a>}i';
$r = "<a href='$1' id='class'>$2</a>";

Your first subpattern ([\w.-]*) doesn't match :, thus it stops at "http".
Couldn't you just use a simple str_replace() for this? Regex seems like overkill if this is all you're doing.
$str = str_replace('<a ', '<a id="someID" ', $str);

Regex to deterime text 'http://...' but not in iframes, embeds...etc

This regex is used to replace text links with a clickable anchor tag.
#(?<!href="|">)((?:https?|ftp|nntp)://[^\s<>()]+)#i
My problem is, I don't want it to change links that are in things like <iframe src="http//... or <embed src="http://...
I tried checking for a whitespace character before it by adding \s, but that didn't work.
Or - it appears they're first checking that an href=" doesn't already exist (?) - maybe I can check for the other things too?
Any thoughts / explanations how I would do this is greatly appreciated. Main, I just need the regex - I can implement in CakePHP myself.
The actual code comes from CakePHP's Text->autoLink():
function autoLinkUrls($text, $htmlOptions = array()) {
$options = var_export($htmlOptions, true);
$text = preg_replace_callback('#(?<!href="|">)((?:https?|ftp|nntp)://[^\s<>()]+)#i', create_function('$matches',
'$Html = new HtmlHelper(); $Html->tags = $Html->loadConfig(); return $Html->link($matches[0], $matches[0],' . $options . ');'), $text);
return preg_replace_callback('#(?<!href="|">)(?<!http://|https://|ftp://|nntp://)(www\.[^\n\%\ <]+[^<\n\%\,\.\ <])(?<!\))#i',
create_function('$matches', '$Html = new HtmlHelper(); $Html->tags = $Html->loadConfig(); return $Html->link($matches[0], "http://" . $matches[0],' . $options . ');'), $text);
}

You can expand the lookbehind at the beginning of those regexes to check for src=" as well as href=", like this:
(?<!href="|src="|">)

Using regex to remove HTML tags

I need to convert
$text = 'We had <i>fun</i>. Look at this photo of Joe';
[Edit] There could be multiple links in the text.
to
$text = 'We had fun. Look at this photo (http://example.com) of Joe';
All HTML tags are to be removed and the href value from <a> tags needs to be added like above.
What would be an efficient way to solve this with regex? Any code snippet would be great.

First do a preg_replace to keep the link. You could use:
preg_replace('(.*?)', '$\2 ($\1)', $str);
Then use strip_tags which will finish off the rest of the tags.

try an xml parser to replace any tag with it's inner html and the a tags with its href attribute.
http://www.php.net/manual/en/book.domxml.php

The DOM solution:
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach($xpath->query('//a[#href]') as $node) {
$textNode = new DOMText(sprintf('%s (%s)',
$node->nodeValue, $node->getAttribute('href')));
$node->parentNode->replaceChild($textNode, $node);
}
echo strip_tags($dom->saveHTML());
and the same without XPath:
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('a') as $node) {
if($node->hasAttribute('href')) {
$textNode = new DOMText(sprintf('%s (%s)',
$node->nodeValue, $node->getAttribute('href')));
$node->parentNode->replaceChild($textNode, $node);
}
}
echo strip_tags($dom->saveHTML());
All it does is load any HTML into a DomDocument instance. In the first case it uses an XPath expression, which is kinda like SQL for XML, and gets all links with an href attribute. It then creates a text node element from the innerHTML and the href attribute and replaces the link. The second version just uses the DOM API and no Xpath.
Yes, it's a few lines more than Regex but this is clean and easy to understand and it won't give you any headaches when you need to add additional logic.

I've done things like this using variations of substring and replace. I'd probably use regex today but you wanted an alternative so:
For the <i> tags, I'd do something like:
$text = replace($text, "<i>", "");
$text = replace($text, "</i>", "");
(My php is really rusty, so replace may not be the right function name -- but the idea is what I'm sharing.)
The <a> tag is a bit more tricky. But, it can be done. You need to find the point that <a starts and that the > ends with. Then you extract the entire length and replace the closing </a>
That might go something like:
$start = strrpos( $text, "<a" );
$end = strrpos( $text, "</a>", $start );
$text = substr( $text, $start, $end );
$text = replace($text, "</a>", "");
(I don't know if this will work, again the idea is what I want to communicate. I hope the code fragments help but they probably don't work "out of the box". There are also a lot of possible bugs in the code snippets depending on your exact implementation and environment)
Reference:
strrpos - http://www.php.net/manual/en/function.strrpos.php
replace - http://www.php.net/manual/en/function.str-replace.php
substr - http://php.net/manual/en/function.substr.php

It's also very easy to do with a parser:
# available from http://simplehtmldom.sourceforge.net
include('simple_html_dom.php');
# parse and echo
$html = str_get_html('We had <i>fun</i>. Look at this photo of Joe');
$a = $html->find('a');
$a[0]->outertext = "{$a[0]->innertext} ( {$a[0]->href} )";
echo strip_tags($html);
And that produces the code you want in your test case.

PHP preg_replace weirdness with custom urls

I'm using the following code to add <span> tags behind <a> tags.
$html = preg_replace("~<a.*?href=\"$url\".*?>.*?</a>~i", "$0<span>test</span>", $html);
The code is working fine for regular links (ie. http://www.google.com/), but it will not perform a replace when the contents of $url are $link$/3/.
This is example code to show the (mis)behaviour:
<?php
$urls = array();
$urls[] = '$link$/3/';
$urls[] = 'http://www.google.com/';
$html = 'Test Link' . "\n" . 'Google';
foreach($urls as $url) {
$html = preg_replace("~<a.*?href=\"$url\".*?>.*?</a>~i", "$0<span>test</span>", $html);
}
echo $html;
?>
And this is the output it produces:
Test Link
Google<span>test</span>

$url = preg_quote($url, '~'); the dollar signs are interpreted as usual: end-of-input.

just somebody is correct; you must escape your special regex characters if you mean for them to be interpreted as literal.
It also looks to me like it can't perform the replace because it never makes a match.
Try replacing this line:
$urls[] = '$link$/3/';
With this:
$urls[] = '$link/3/';

$ is considered a special regex character and needs to be escaped. Use preg_quote() to escape $url before passing it to preg_replace().
$url = preg_quote($url, '~');

$ has special meaning in regex. End of line. Your expression is being expanded like this:
$html = preg_replace("~<a.*?href=\"$link$/3/\".*?>.*?</a>~i", "$0<span>test</span>", $html);
Which fails because it can't find "link" between two end of lines. Try escaping the $ in the $urls array:
$urls[] = '\$link\$/3/';

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

PHP RegEx Negation Word - php

Add a negative lookahead: /<img(?![^>]\srel="customimg")(.?)src="(.*?)"/

It would seem like it'd be easier/more expressive to do if(strpos($haystackString, '"customimg"') === false) // The === is important { // your preg_replace here } Edit: Thanks for pointing out missing param guys

Related

PHP - Inner HTML recursive replace

Add id attribute to hyperlinks through PHP Regular Expressions

Regex to deterime text 'http://...' but not in iframes, embeds...etc

Using regex to remove HTML tags

PHP preg_replace weirdness with custom urls

Categories

Resources

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

PHP RegEx Negation Word - php

Add a negative lookahead: /<img(?![^>]*\srel="customimg")(.*?)src="(.*?)"/

It would seem like it'd be easier/more expressive to do if(strpos($haystackString, '"customimg"') === false) // The === is important { // your preg_replace here } Edit: Thanks for pointing out missing param guys

Related

PHP - Inner HTML recursive replace

Add id attribute to hyperlinks through PHP Regular Expressions

Regex to deterime text 'http://...' but not in iframes, embeds...etc

Using regex to remove HTML tags

PHP preg_replace weirdness with custom urls

Categories

Resources

Add a negative lookahead: /<img(?![^>]\srel="customimg")(.?)src="(.*?)"/