Adding negative lookback to this regex pattern in php - php

I have spent the entire day trying to figure out how to get this code to only affect the first instance it runs across. Eventually, I learned about a negative lookback and tried to implement that.
I have tried every possible arrangement except, of course, the correct one. I discovered regex101, which is really cool, but ultimately didn’t help me find the solution.
$content = preg_replace('/<img[^>]+./','', get_the_content_with_format());
This will be used in WordPress to strip out the first image on a page (moving it above the written content) but leave the rest in, so that images can still be used in the post description.
Be easy on me, please. This is my first question here and I really am not a programmer.
Update: Because l’L'l asked, this is the entire chunk of relevant code.
<?php
//this will remove the images from the content editor
// it will not remove links from images, so if an image has a link, you will end up with an empty line.
$content = preg_replace('/<img[^>]+./','', get_the_content_with_format());
//this IF statement checks if $content has any value left after the images were removed
// If so, it will echo the div below it; if not, it won't do anything.
if($content != ""):?>
<div class="portfolio-box">
<?php echo do_shortcode( $content ) ?>
</div>
<?php endif; ?>
I’ve tried both of the solutions offered here but, for whatever reason, they didn’t work.
And, thank you guys very much for helping, by the way.

You could just anchor it at the beginning of the string (with ^), capture everything up to the first image (with (.*?)), and replace all of that with the content before the image:
$content = preg_replace('/^(.*?)<img[^>]+>/s','$1', get_the_content_with_format());
Note I also added the modifier s so that dot (.) matches newlines.

If you just want to replace the first occurrence of the regex match, pass 1 as the fourth parameter (the limit), which indicates that only one match will be replaced.
See http://php.net/manual/de/function.preg-replace.php
In your example, this would look like:
$content = preg_replace('/<img[^>]+./','', get_the_content_with_format(), 1);
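To see the limit argument in isolation, here is a minimal sketch run against a made-up HTML string instead of get_the_content_with_format(). The sample markup and the $firstImage variable are assumptions for illustration only. It also captures the removed tag with preg_match, in case the first image needs to be printed elsewhere in the template.

```php
<?php
// Hypothetical post content standing in for get_the_content_with_format().
$html = '<p>Intro</p><img src="first.jpg"><p>Body</p><img src="second.jpg">';

// Grab the first <img> tag so it can be echoed elsewhere in the template.
$firstImage = preg_match('/<img[^>]*>/', $html, $m) ? $m[0] : '';

// The fourth argument (limit) stops preg_replace after one replacement.
$content = preg_replace('/<img[^>]*>/', '', $html, 1);

echo $firstImage, "\n"; // <img src="first.jpg">
echo $content, "\n";    // <p>Intro</p><p>Body</p><img src="second.jpg">
```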

Related

Replace only images and tags with surgical precision that match an img list var

OK, my goal is to remove all images that I specify in an array or group, along with their tags. It should remove the entire image tag, whether or not it is contained in a link.
So far I have this working somewhat, but it is far from perfect: this version only removes images that are not wrapped in a link (an <a href> tag), and I need it to work both ways.
So if we have <img src="test1.gif" width="235">, it must be removed even if the tag contains other attributes and even if it is surrounded by a link, as long as the image name matches.
In short, any image contained in the group must be removed completely, together with its tag and any link that wraps it.
This is what I have so far.
#<img[^>]+src=".*?(test1.gif|test2.png|test3.jpg)"[^>]+>?#i
Ultimately, what I am trying to do is not as simple as I hoped, so I am hoping some regex gurus can help with this task. I can't find anything on here or elsewhere on the net; most answers just replace all images on a page, not specific images. Note that my reason for needing a regex is that this must work in other code that is based around preg_replace, and yes, I know that's not the best way to do it.
UPDATED: added this as an example, sorry for any confusion.
This is all PHP based!
This var will hold all the images that we need to replace with nothing.
$m_rimg = "imagewatever.gif|test.jpg|animage.png";
preg_replace('#<img[^>]+src=".*?('.$m_rimg.')"[^>]+>?#i','');
This almost works, but not correctly: it must also handle images wrapped in a link (an <a href> tag) and remove the link along with the image if there is one. So basically I need what I have modified to work correctly both with a plain <img src="whatever.gif" width=""> and with the same image wrapped in a link, but it must only remove the images that match the list in the var, not all images. Replacing every image I can already do; this is more complex.
I hope this better explains it.
UPDATED 04/25/15
OK, I tried the last answer that was added; details are below.
I had to modify it with some backslashes so that I did not get a parse error. This is for anyone looking to do something similar.
This worked great. I just modified what you gave me like this:
"#(?:<a\b[^>]*?>)?(<img[^>]+src=[\"'][^>]*?($m_rimg)['\"][^>]*>)(?:<\/a>)?#is"
I did not use preg_quote. I am not sure why, but with it the pattern did not work at all, while without preg_quote it has worked in the few tests I just did.
I was told not to use |, but that is what seems to work. How else would you guys suggest doing it?
As to this being flagged as a duplicate of another answered question, I do not think that's the case. I looked at the supposed answer and it is not the same thing at all: it does not do the exact thing I need, which is to match what is in my var. While it is regex related, it did not help. I tried to find something on here that worked for my needs well before posting.
I got a helpful answer to my problem from one user, who understood why I was doing it this way. I hope this is now enough to lift the dupe status, as my goal was not to offend those who don't think I should use a regex as part of an HTML parser script.
Try something like:
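$DOM = new DOMDocument();
// loadHTML() tolerates real-world HTML; loadXML() requires well-formed XML
$DOM->loadHTML('HTML_DOCUMENT');
// copy the live node list into an array so that removing nodes does not skip any
$list = iterator_to_array($DOM->getElementsByTagName('img'));
foreach ($list as $img) {
    $src = $img->getAttribute('src');
    // only match if src ends with one of the listed file names:
    if (stringEndsWith($src, 'test1.gif') ||
        stringEndsWith($src, 'test2.gif') ||
        stringEndsWith($src, 'test3.gif')) {
        // a node is removed through its parent; DOMNodeList has no removeChild()
        $img->parentNode->removeChild($img);
    }
}
function stringEndsWith($haystack, $ending, $caseInsensitivity = false)
{
    // every string ends with the empty string
    if ($ending === '')
        return true;
    // the last strlen($ending) characters of $haystack
    $tail = substr($haystack, -strlen($ending));
    if ($caseInsensitivity)
        return strcasecmp($tail, $ending) === 0;
    else
        return $tail === $ending;
}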
Or, since you state you still need a regex way to remove <img> tags based on the alternation list inside a $m_rimg variable, along with any <a> tags wrapped around them, use this:
$re = "#(?:<a\b[^>]*?>)?(<img[^>]+src=[\"'][^>]*?($m_rimg)['\"][^>]*>)(?:<\/a>)?#is";
$str = "<img\n att=\"value\"\n src=\"sometext3456..,gjyg&&&test1.gif\" />\n\n<img src=\"imagewatever.gif\">";
$result = preg_replace($re, "", $str);
Mind that all the items in your variable must be preg_quoted, but not the | symbols.
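A likely reason preg_quote appeared not to work is that it was applied to the whole list, which also escapes the | separators. The sketch below quotes each file name separately and joins the results afterwards. The $images array and the sample $str are assumptions made up for illustration; the pattern is the same one as above, written in a single-quoted string.

```php
<?php
// Hypothetical list of file names to strip.
$images = ['imagewatever.gif', 'test.jpg', 'animage.png'];

// Quote each name on its own, then join with an unescaped "|".
$quoted = array_map(function ($name) {
    return preg_quote($name, '#'); // '#' is the pattern delimiter used below
}, $images);
$m_rimg = implode('|', $quoted);

$re = '#(?:<a\b[^>]*?>)?(<img[^>]+src=["\'][^>]*?(' . $m_rimg . ')["\'][^>]*>)(?:</a>)?#is';

$str = '<p>keep</p><a href="x.html"><img src="pics/test.jpg" width="10"></a><img src="other.gif">';
$result = preg_replace($re, '', $str);

echo $m_rimg, "\n"; // imagewatever\.gif|test\.jpg|animage\.png
echo $result, "\n"; // <p>keep</p><img src="other.gif">
```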

Trying to stop regex at a tag

I know there are other posts with a similar name but I've looked through them and they haven't helped me resolve this.
I'm trying to get my head around regex and preg_match. I am going through a body of text and each time a link exists I want it to be extracted. I'm currently using the following:
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
which works fine until it finds one that has <br after it. Then I get the url plus the <br which means it doesn't work correctly. How can I have it so that it stops at the < without including it?
Also, I have been looking everywhere for a clear explanation of using regex and I'm still confused by it. Has anyone any good guides on it for future reference?
\S* is too broad. In particular, I could inject into your code with a URL like:
http://hax.hax/"><script>alert('HAAAAAAAX!');</script>
You should only allow characters that are allowed in URLs:
[-A-Za-z0-9._~:/?#[\]@!$&'()*+,;=]*
Some of these characters are only allowed in specific places (such as ?), so if you want better validation you will need more cleverness.
Instead of \S, use a class that excludes the opening-tag character as well as whitespace:
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/[^\s<]*)?/";
You might even want to be more restrictive by only allowing characters valid in URLs:
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/[a-zA-Z_\-\.%\?&]*)?/";
(or some more characters)
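As a quick check of the idea of stopping at the tag, here is a minimal sketch that extracts every URL from a body of text with preg_match_all. The sample $text and the example.com addresses are made up; the path part uses a class that rejects both whitespace and <.

```php
<?php
// Path characters are anything except whitespace and "<".
$reg_exUrl = '/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/[^\s<]*)?/';

// Hypothetical input with a <br> glued to the first URL.
$text = 'See http://example.com/page?id=1<br>and https://example.org/a/b for more.';

// $matches[0] holds the full matches, one per URL found.
preg_match_all($reg_exUrl, $text, $matches);
$urls = $matches[0];

print_r($urls); // http://example.com/page?id=1 and https://example.org/a/b
```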
You could use the one presented at:
http://regex101.com/r/zV1uI7
At the bottom of that page the pattern is explained step by step.

Return value as 'customvariablename'

I am attempting to crawl through a file and insert "../../" at the beginning of every image path. Unfortunately though, the script is timing out, and since it only took a few seconds to run before this was added something tells me it is not doing what I think it should be. This is how I'm doing it:
$filedata = substr_replace(substr($filedata,$imageBeginning,1),"../../",$imageBeginning);
I am crawling entire HTML files to accomplish this, so I need an efficient solution. Any help is appreciated.
This is completely untested, but something like this:
$filedata = preg_replace('/(<img\s+.*?src=")(.*?\\.(?:jpg|png|bmp|gif).*?>)/', '$1../../$2', $filedata);
Explanation: You are making 2 captures in the regular expression. The first is everything from the start of the img tag to the start of the src attribute. The second is the src attribute value and everything after it. Then you just insert "../../" in the middle in the replacement.
http://php.net/manual/en/function.preg-replace.php
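Here is a minimal sketch of the same two-capture idea on a made-up $filedata string. It is a slightly tighter variant, not the pattern above: it uses [^>] and [^"] classes so that a match cannot run past the end of a tag or of the src value. A single preg_replace call handles every image in one pass, which should avoid the per-image loop that was timing out.

```php
<?php
// Hypothetical file contents with two images.
$filedata = '<img src="a.jpg"><p>x</p><img class="c" src="img/b.png" alt="">';

// Group 1: from "<img" up to and including src="
// Group 2: the image path and its closing quote
$pattern = '/(<img\s[^>]*?src=")([^"]*\.(?:jpg|png|bmp|gif)")/i';

// Insert "../../" between the two captures for every image at once.
$result = preg_replace($pattern, '$1../../$2', $filedata);

echo $result, "\n";
// <img src="../../a.jpg"><p>x</p><img class="c" src="../../img/b.png" alt="">
```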

PHP Regex Help / Explanation

Up until now I have successfully managed to avoid doing very much with regular expressions apart from checking that an email address is valid. However, as part of a larger university project I'm developing a simple templating engine and am trying to implement my own simple syntax for handling loops, and eventually IF statements, rather than including PHP in my template files. I know a lot of people will say don't bother or just use an existing system, but as it's for my dissertation I want as much of it to be my own work as possible!
Anyway, back to the problem. I've got the following code in my template file as an example:
<p>Template Header</p>
{{foreach{array1}}}
<p>This is the first content line that should be displayed.</p>
{{/foreach}}
{{foreach{array2}}}
<p>This is the second content line that should be displayed.</p>
{{/foreach}}
<p>Template Footer</p>
I've then got the following PHP to read the file, look for loops and extract them.
<?php
$template = file_get_contents('reg.html');
$expression = "#.*{{foreach{(.*?)}}}(.*?){{/foreach}}.*#is";
$result = preg_replace($expression, "$1", $template);
var_dump($result);
?>
When calling preg_replace and dumping the result $1 is giving me the array name which will be used for the loop (array1 or array2), then changing it to $2 will give me the content between the loop tags. Perfect. The problem is it only works for one {{foreach}} tag.
Is there any way I can loop over all matches of the regex to get the results I'm getting above? Any help or advice is much appreciated, but go easy: regex is pretty new to me!
$expression = "#.*{{foreach{(.*?)}}}(.*?){{/foreach}}.*#is";
                ^^                                   ^^
You are not just matching the "foreach" template tag, but also everything before and after it (the leading and trailing .* marked above). That means the second foreach gets eaten up by .* too, so you can't match it again.
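One way to loop over every tag is to drop the surrounding .* and switch from preg_replace to preg_match_all. This is a minimal sketch under those assumptions: the template is inlined in a nowdoc instead of being read from reg.html, the braces are escaped so they are always literal, and the $loops array is a hypothetical name.

```php
<?php
// Inlined stand-in for file_get_contents('reg.html').
$template = <<<'TPL'
<p>Template Header</p>
{{foreach{array1}}}
<p>This is the first content line that should be displayed.</p>
{{/foreach}}
{{foreach{array2}}}
<p>This is the second content line that should be displayed.</p>
{{/foreach}}
<p>Template Footer</p>
TPL;

// No leading or trailing .*, so each match covers exactly one loop.
$expression = '#\{\{foreach\{(.*?)\}\}\}(.*?)\{\{/foreach\}\}#is';

// PREG_SET_ORDER gives one array per match: [full, name, body].
preg_match_all($expression, $template, $matches, PREG_SET_ORDER);

$loops = [];
foreach ($matches as $m) {
    $loops[] = ['name' => $m[1], 'body' => trim($m[2])];
}

var_dump($loops);
```

Each entry in $loops then carries the array name and the content between the loop tags, which is what $1 and $2 provided for the single-match case.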

"catching" links in regex using php ignoring inline js

I'm stuck trying to make a regex in PHP that catches the link and its content from a html page (which I have no control over) and replaces it with a link of mine.
i.e.:
<a style="position:absolute;more_styles:more;" href="http://www.google.co.il/" class="something">This is the content</a>
Becomes:
<a style="position:absolute;more_styles:more;" href="my_function('http://www.google.co.il/')" class="something">This is the content</a>
This is the regex that I wrote:
$content = preg_replace('|<a(.*?)href=[\"\'](.*?)[\"\'][^>]*>(.*?)</a>|i','$3',$content);
This works well with all the links except links like:
<a href="http://google.co.il" onclick="if(MSIE_VER()>=4){this.style.behavior='url(#default#homepage)';this.setHomePage('http://www.google.co.il')}" class='brightgrey rightbar' style='font-size:12px'><b>Make me the home page!</b></a>
Obviously, the regexp stops at "MSIE_VER()>" because of the "[^>]*" part and i get the wrong content when I use "$3".
I tried almost every option to make this work but no luck.
Any thoughts?
Thank you all in advance..
First of all, your code is trying to do something different from adding my_function: it replaces the whole link with its inner content ($3). There are several ways to achieve your declared goal (i.e. wrapping every href in my_function); the most pragmatic would be:
$content = preg_replace('|href=[\"\'](.*?)[\"\']|i',"href=\"my_function('$1')\"",$content);
If you need a more prudent approach, then I would use:
$content = preg_replace('|(<a.*?)href=[\"\'](.*?)[\"\'](.*?</a>)|i',"$1href=\"my_function('$2')\"$3",$content);
Last but not least, if you need to remove the tag rather than what you have described, let me know; there are a million ways to do it.
By default .* will take everything it can, e.g. it takes the onclick argument, because the regex is still valid. Replace "." with [^\"]: that tells the regexp to take everything except " (which cannot appear in a URL).
$content = preg_replace('|<a(.*?)href=[\"\']([^"]*?)[\"\'][^>]*>(.*?)</a>|i','$3',$content);
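The > inside the onclick value can still end a [^>]* run early, so here is a hedged sketch of one way around it: treat complete quoted attribute values as single units when scanning the opening tag. The $attrs helper and the shortened sample link are assumptions for illustration, and my_function is the placeholder name from the question.

```php
<?php
// Sample link based on the question, with ">" inside the onclick value.
$content = <<<'HTML'
<a href="http://google.co.il" onclick="if(MSIE_VER()>=4){this.setHomePage('http://www.google.co.il')}" class='brightgrey rightbar'><b>Make me the home page!</b></a>
HTML;

// An attribute run is a sequence of unquoted characters or complete quoted
// values, so a ">" inside quotes no longer ends the opening tag.
$attrs = '(?:[^>"\']|"[^"]*"|\'[^\']*\')*';
$pattern = '~<a\b(' . $attrs . ')>(.*?)</a>~is';

$result = preg_replace_callback($pattern, function ($m) {
    // Rewrite only the first href attribute inside the captured attribute run.
    $newAttrs = preg_replace(
        '~(\shref=)(["\'])(.*?)\2~i',
        '$1"my_function(\'$3\')"',
        $m[1],
        1
    );
    // $m[2] is the link content, here <b>Make me the home page!</b>
    return '<a' . $newAttrs . '>' . $m[2] . '</a>';
}, $content);

echo $result, "\n";
```

Regexes like this remain fragile on arbitrary HTML; for pages you do not control, a DOM parser is usually the safer choice.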
