Replace img bbcode with optional width and height using regex? - php

I can't work out the right solution.
My standard img replacement code is:
preg_replace('~\[img](.*?)\[/img\]~s','<img src="$1" />',$text);
Of course it works. But I'm trying to replace the bbcode when width and height are set. That's optional, though, so it should also work if only one dimension is set, or neither.
The bbcode looks like: [img=12x12]link of the image[/img]
So my replacement looks like this:
preg_replace('~\[img=(.*?)x(.*?)\](.*?)\[/img\]~s','<img width="$1" height="$2" src="$3" />',$text);
I guess I got it wrong. Does anybody know how to solve this?

Try this regex:
preg_replace('~\[img=?(\d+)?x?(\d+)?\](.*?)\[/img\]~s','<img width="$1" height="$2" src="$3" />',$text);
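The way you coded it, it wouldn't match all 3 cases you wanted: [img], [img=NN], and [img=NNxNN]. It would only match when both dimensions were provided. Note that when a dimension is missing, PHP substitutes an empty string for the unmatched group, so the output will contain an empty attribute such as width="".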
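A minimal sketch of one way to avoid those empty attributes: let preg_replace_callback() build the tag and add width/height only when they were actually captured. The function name bbcodeImg is hypothetical, and escaping the URL with htmlspecialchars() is an addition that is not part of the original answers. This pattern also requires the = before any digits, so [img12x12] is not accepted.

```php
<?php
// Sketch: build the <img> tag in a callback so that missing dimensions
// are omitted instead of being emitted as width="" / height="".
function bbcodeImg(string $text): string
{
    return preg_replace_callback(
        // [img], [img=NN], [img=NNxNN] and [img=xNN] are all accepted
        '~\[img(?:=(\d*)(?:x(\d*))?)?\](.*?)\[/img\]~s',
        function (array $m): string {
            $attrs = '';
            // Unmatched groups come back as "" here, so test for non-empty
            if (isset($m[1]) && $m[1] !== '') {
                $attrs .= ' width="' . $m[1] . '"';
            }
            if (isset($m[2]) && $m[2] !== '') {
                $attrs .= ' height="' . $m[2] . '"';
            }
            // Escape the URL so quotes in it cannot break out of src="..."
            $src = htmlspecialchars($m[3], ENT_QUOTES);
            return '<img src="' . $src . '"' . $attrs . ' />';
        },
        $text
    );
}

echo bbcodeImg('[img]a.png[/img]'), "\n";
echo bbcodeImg('[img=12]a.png[/img]'), "\n";
echo bbcodeImg('[img=12x34]a.png[/img]'), "\n";
echo bbcodeImg('[img=x34]a.png[/img]'), "\n";
```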

Your regex should definitely work. I would have used \d+, though, which makes sure each value exists and is numeric:
~\[img=(\d+)x(\d+)\](.*?)\[/img\]~s
What error are you getting with your code? Or rather, which string do you expect to match that doesn't?
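To make that question concrete, a quick check like the one below (the sample inputs are made up) shows which strings each pattern accepts: both accept [img=12x12], neither accepts [img=12] or plain [img], and only the (.*?) version accepts non-numeric values such as [img=axb].

```php
<?php
// Compare the question's lazy pattern with the stricter \d+ version.
$patterns = [
    'lazy'   => '~\[img=(.*?)x(.*?)\](.*?)\[/img\]~s',
    'strict' => '~\[img=(\d+)x(\d+)\](.*?)\[/img\]~s',
];
// Hypothetical sample inputs for illustration
$inputs = [
    '[img=12x12]pic.png[/img]',
    '[img=12]pic.png[/img]',
    '[img]pic.png[/img]',
    '[img=axb]pic.png[/img]',
];

foreach ($inputs as $input) {
    foreach ($patterns as $name => $pattern) {
        $hit = preg_match($pattern, $input) === 1 ? 'match' : 'no match';
        echo str_pad($name, 7), $input, ' => ', $hit, "\n";
    }
}
```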

Related

extracting only image src, not other 'src' tags in html with php

I've been able to use preg_match to get the src of any image tag, but in this case I only need the src of images with the class 'wp-post-image'. However, this code returns nothing for me:
$pattern = '<img(?:[^>]+src="(.+?)"[^>]+(?:id|class)="image"|[^>]+(?:id|class)="wp-post-image"[^>]+src="(.+?)")
';
preg_match($pattern,$results[$k]['description'], $matches);
$results[$k]['image'] = $matches[0];
print_r($results[$k]['image']);
The old version returns all image matches, including 4 that have the class I'm looking for, so maybe my syntax is just wrong?
Old version that returned all images:
$pattern = '%<img.*?src=["\'](.*?)["\'].*?/>%i';
preg_match($pattern,$results[$k]['description'], $matches);
$src = $matches[0];
//print_r($src);
Asking to parse HTML with regex on SO will get you flamed. Not without reason, but flamed nonetheless.
If you insist on using regex (which, if for nothing else, is good practice), I suggest using a regex sandbox to test out patterns on sample text. One I use is https://regex101.com/ .
The old version (which you say worked) is looking for either single or double quotes around the src attribute. The new version is only looking for double quotes, which is possibly why it's failing.
Rather than trying to write a more complicated regex, it may be easier to use your old regex (which grabs all the image links) with an expanded capture that keeps the whole tag, and then look through the captured tags to pick out the ones you need. Note that preg_match() stops after the first match, so use preg_match_all() to collect all of them:
$pattern = '%(<img.*?src=["\'].*?["\'].*?/>)%i';
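A sketch of that approach, assuming the HTML is in a plain string (here a made-up sample standing in for $results[$k]['description']). Every tag is collected with preg_match_all(), then each tag's class attribute is checked for wp-post-image as a whole word, since a class attribute can hold several space-separated classes.

```php
<?php
// Hypothetical sample HTML standing in for $results[$k]['description']
$description = '<p><img class="alignleft" src="a.jpg" />'
    . '<img src="b.jpg" class="attachment-thumbnail wp-post-image" /></p>';

// The answer's pattern: capture every <img ... /> tag as a whole
$pattern = '%(<img.*?src=["\'].*?["\'].*?/>)%i';
preg_match_all($pattern, $description, $matches);

$wanted = [];
foreach ($matches[1] as $tag) {
    // Keep the tag only if wp-post-image appears as a word inside class="..."
    if (preg_match('/class=["\'][^"\']*\bwp-post-image\b/i', $tag)) {
        // Pull the src value out of the matching tag
        if (preg_match('/src=["\']([^"\']*)["\']/i', $tag, $src)) {
            $wanted[] = $src[1];
        }
    }
}

print_r($wanted);
```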

Regex replace if caption does not contain character in PHP

I'm finishing BBCode support for my CMS. I'm using regex to convert BBCode to HTML and vice versa. However, I have a small security problem. For example, I have this regular expression:
~\[img=(.*?\.(?:jpg|jpeg|png))\|(.*?)\[/img\]~s
to match, for example, this:
[img=somewhere.com\image\08-09-2014\cat.png|This is a cat[/img]
But it also matches strings like this one, which I really don't want:
[img=somewhere.com" onclick="someBadJSCode()" src="\image\08-09-2014\cat.png|This is a cat[/img]
I thought that this edit to the regex would help:
~\[img=([^"]+.*?\.(?:jpg|jpeg|png))\|(.*?)\[/img\]~s
But it didn't, and I don't know why. Any ideas?
Thanks to Casimir et Hippolyte, I got a regex that handles the URL, alt and class parts without the JavaScript injection risk.
Converts this:
[img=somewhere.com/img.jpg|left]cat[/img]
to this:
<img src="somewhere.com/img.jpg" class="left" alt="cat" >
Pattern for preg_replace (first parameter):
~\[img=([^"']+\.(?:jpg|jpeg|png))\|([^"']+)\]([^"']+)\[/img\]~s
Replacement for preg_replace (second parameter):
'<img src="$1" class="'.$this->GetElementClass('img').' $2" alt="$3" >'
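Why the first fix did not help: in ([^"]+.*?\.(?:jpg|jpeg|png)) the [^"]+ only constrains the beginning of the URL, and the .*? that follows can still match quotes. In the final pattern every free-text part is [^"']+, so a quote anywhere makes the match fail and the tag is left as plain text. The sketch below checks both; bb-img is a hypothetical stand-in for $this->GetElementClass('img'), and the sample tags use forward slashes for simplicity.

```php
<?php
// Final pattern and replacement from above ('bb-img' is a hypothetical class)
$pattern = '~\[img=([^"\']+\.(?:jpg|jpeg|png))\|([^"\']+)\]([^"\']+)\[/img\]~s';
$replacement = '<img src="$1" class="bb-img $2" alt="$3" >';

$safe = '[img=somewhere.com/img.jpg|left]cat[/img]';
$evil = '[img=somewhere.com" onclick="someBadJSCode()" src="/img.png|left]cat[/img]';

echo preg_replace($pattern, $replacement, $safe), "\n";
// No match, so the input comes back unchanged instead of becoming HTML
echo preg_replace($pattern, $replacement, $evil), "\n";

// The earlier attempt: [^"]+ covers only the start, .*? can still eat quotes
$attempt = '~\[img=([^"]+.*?\.(?:jpg|jpeg|png))\|(.*?)\[/img\]~s';
var_dump(preg_match($attempt, '[img=somewhere.com" onclick="x()" src="/cat.png|a cat[/img]'));
```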

What regex can I add to my script that would allow users to have optional parameters in their bb codes?

I am writing a custom BBCode script in PHP. I would like to give users the option to add parameters such as width, height, and align to img tags. The script I have works fine if you don't want to change those attributes. Once I have the regex in place, how would I access those parameters, given that they are optional and so won't always correspond to the same backreference number (such as $2)?
"'\[img\](.*?)\[\/img\]'is"
A BBCode tag with parameters would look something like this:
[img width=100 height=100 align=right]thelinktotheimage[/img]
Update: I tried to write my own regex for this. It is supposed to allow only width, height, or align as attributes, followed by the = symbol and then either digits or one of the strings left, right, middle, top, or bottom as the value. For some reason, the regex doesn't match my test string. I have a feeling that I am very close. Any ideas?
^\[img(((width|height|align)=(([0-9]+)|(left|right|middle|top|bottom)) )+)\](.*?)\[\/img\]$
It would be difficult to parse a completely arbitrary number of attributes with just regex, but if you only have a maximum of 3 (or some other reasonably small number), you could use something like this:
\[img (?:([a-zA-Z]+)="([a-zA-Z\d]+)")? ?(?:([a-zA-Z]+)="([a-zA-Z\d]+)")? ?(?:([a-zA-Z]+)="([a-zA-Z\d]+)")? ?].+\[\/img]
Here's a link that shows the matches you'll get. As you can see, odd numbered captures are the attribute names while evens are the values. Just delete the double quotes if you're not using them.
http://regex101.com/r/bN1jT3
You could use a few preg_match_all() calls, but that gets worse as you add parameters. What you can do instead is something like this:
<?php
$bb = "[img width=100 height=100 align=right]thelinktotheimage[/img]<br />[img width=100 height=100 align=right]thelinktotheimage[/img]";
preg_match_all("'\[img(.{0,40})\](.*?)\[\/img\]'is", $bb, $matches);
print_r($matches);
?>
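For the record, the regex in the update fails on the test string because it requires an attribute immediately after img (no space) and a space after every attribute, including the last one before ]. A sketch of a two-step alternative: match the whole tag loosely, then run preg_match_all() over the attribute part and collect the whitelisted values into an array keyed by name, so it no longer matters which backreference number a parameter lands in. The function name bbcodeImgAttrs is hypothetical; the whitelist mirrors the one described in the update.

```php
<?php
// Sketch: match the tag first, then parse its attributes by name,
// so the order (and presence) of width/height/align doesn't matter.
function bbcodeImgAttrs(string $text): string
{
    return preg_replace_callback(
        '~\[img((?:\s+\w+=\w+)*)\s*\](.*?)\[/img\]~is',
        function (array $m): string {
            // Only whitelisted names with numeric or keyword values survive
            preg_match_all(
                '~\b(width|height|align)=(\d+|left|right|middle|top|bottom)\b~i',
                $m[1],
                $pairs,
                PREG_SET_ORDER
            );
            $found = [];
            foreach ($pairs as $pair) {
                // Keyed by name: later duplicates overwrite earlier ones
                $found[strtolower($pair[1])] = strtolower($pair[2]);
            }
            $attrs = '';
            foreach ($found as $name => $value) {
                $attrs .= ' ' . $name . '="' . $value . '"';
            }
            return '<img src="' . htmlspecialchars($m[2], ENT_QUOTES) . '"' . $attrs . ' />';
        },
        $text
    );
}

echo bbcodeImgAttrs('[img width=100 height=100 align=right]thelinktotheimage[/img]'), "\n";
echo bbcodeImgAttrs('[img align=left]thelinktotheimage[/img]'), "\n";
echo bbcodeImgAttrs('[img]thelinktotheimage[/img]'), "\n";
```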

Replace anchor text with PHP (and regular expression)

I have a string that contains a lot of links and I would like to adjust them before they are printed to screen:
I have something like the following:
replace_this
and would like to end up with something like this
replace this
Normally I would just use something like:
echo str_replace("_"," ",$url);
In this case I can't do that, because the URL itself contains underscores and replacing them breaks my links. My thought was that I could use a regular expression to get around this.
Any ideas?
Here's the regex: <a(.+?)>.+?<\/a>.
What I'm doing is preserving the important dynamic part inside the opening anchor tag and replacing the link text, using the following function:
preg_replace('/<a(.+?)>.+?<\/a>/i',"<a$1>REPLACE</a>",$url);
This will cover most cases, but I suggest you review to make sure that nothing unexpected was missed or changed.
$pattern = "/_(?=[^>]*<)/";
echo preg_replace($pattern, " ", $url);
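How the lookahead works: _(?=[^>]*<) matches an underscore only if a < can be reached without crossing a >, which holds for link text between tags but not for underscores inside an attribute such as href. A small run on made-up sample links:

```php
<?php
// Hypothetical sample: underscores appear both in href values and in link text
$url = '<a href="/page_one">replace_this</a> <a href="/page_two">and_this_too</a>';

// Replace only underscores that are followed by "<" before any ">"
$pattern = '/_(?=[^>]*<)/';
echo preg_replace($pattern, ' ', $url), "\n";
```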
You can use this regular expression
(>([^<]*)<\s*/)
along with preg_replace_callback().
EDIT:
// PHP has no /g modifier: preg_replace_callback() already replaces every match
$replaced_text = preg_replace_callback('~(>([^<]*)<\s*/)~', 'uscore_replace', $text);
function uscore_replace($matches){
    // group 1 wraps the whole pattern, so $matches[1] equals $matches[0] here
    return str_replace('_', ' ', $matches[1]);
}

Can't get preg_match right!

I have this image tag inside an HTML page:
<img id="catImage" width="250" alt="" src="http://dev-server2/image2.png" />
I want to get the value of src, and I'm struggling with preg_match and all of this regex stuff. Is this one right?
preg_match(
"/<img id=\"catImage\" width=\"[0-9]+\" alt=\"\" src=\"([[a-zA-Z0-9]\/-._]*)\"/",
$artist_page["content"], $matches);
I get an empty array!
First and foremost, the portion of your regex that deals with the src attribute doesn't account for the colon that appears in the URL.
I'd suggest changing the src portion (and any other attribute values) to look instead for the close quote and capture everything between:
... src=\"([^\"]*)\" ....
Does this work?
'/<img id="catImage"[^>]+src="([^"]*)"/'
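A quick check of that pattern on the tag from the question, with $content standing in for $artist_page["content"]; the URL ends up in the first capture group:

```php
<?php
// $content stands in for $artist_page["content"] from the question
$content = '<img id="catImage" width="250" alt="" src="http://dev-server2/image2.png" />';

if (preg_match('/<img id="catImage"[^>]+src="([^"]*)"/', $content, $matches)) {
    echo $matches[1], "\n"; // the captured src value
}
```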
I'm still really new to regex, but I thought I would throw my thoughts out there and get some criticism. Should the expression be something like (?<=(src=")).*(?=["])? (It's not PHP-formatted yet.) This would grab the contents of the src attribute.
"/<img id=\"catImage\" width=\"[0-9]+\" alt=\"\" src=\"([a-zA-Z0-9/.:_-]*)\"/"
Should do. Note that I edited the range [ ... ] part. The hyphen (-) has a special meaning inside a character class, so I put it last to make it a literal. I also added the : character (thanks #user333699). This hints, however, that you should not try to enumerate every valid URL character. Instead, match anything until you know that the entire value of the src attribute is matched:
"/<img id=\"catImage\" width=\"[0-9]+\" alt=\"\" src=\"([^\"]*)\"/"
I.e., anything that is not a quote (").
Note that $matches[0] holds the entire matched text; the value of src itself ends up in the first capture group, $matches[1].
It might be worth diving into XPath, depending on what you really want to do with it.
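A minimal XPath sketch, assuming PHP's DOM extension is available (it usually is) and using a made-up page in place of $artist_page["content"]:

```php
<?php
// Hypothetical page standing in for $artist_page["content"]
$html = '<html><body><p>Cat</p>'
    . '<img id="catImage" width="250" alt="" src="http://dev-server2/image2.png" />'
    . '</body></html>';

$doc = new DOMDocument();
// Real-world pages are rarely valid HTML; collect parser warnings silently
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// string(...) yields the attribute value, or "" if no such element exists
$src = $xpath->evaluate('string(//img[@id="catImage"]/@src)');

echo $src, "\n";
```

Unlike the regex versions, this does not depend on the order of the attributes or on which quote style the page uses.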
