I just need one line of Regex code edited - php

the following code works but how can it be edited to only detect the first curly brace after an end parenthesis?
'/\{(([^{}]*|(?R))*)\}/'
Example:
if (1==1)
{
echo "testing {$username}";
}
The problem is that it detects ALL curly brackets, even the ones surrounding the $username variable. So I think a solution would be to detect if there is a ) before the first curly bracket. I tried about 20 different things myself but cannot get it to work. How can it be edited to only detect ) { ? Oh, and please add code that handles spaces and tabs in between the end parenthesis and the first curly bracket, if that matters. Thanks.

You can use:
\)\s*\{
as part of your pattern to detect simple cases like your example. Note that you can't use a positive lookbehind of variable length though, so you can't alter it to (?<=\)\s*)\{
My pattern will still pick up code that is commented out though, and won't detect code that has a comment between the ) and the {. You wouldn't want to use a regex to try to detect such cases.
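As a rough sketch of how that fragment might be used from PHP (the sample input and variable names below are made up for illustration), preg_match with PREG_OFFSET_CAPTURE reports where the brace sits. The second call uses \K, which PCRE supports for discarding the part of the match seen so far; it is one possible stand-in for the variable-length lookbehind, not the only one.

```php
<?php
// Hypothetical sample input: one block brace after ")" and one
// interpolation brace inside a string.
$code = implode("\n", ['if (1==1)', '{', 'echo "testing {$username}";', '}']);

// ")" + optional whitespace (spaces, tabs, newlines) + "{".
// The capture group isolates the brace itself.
$bracePos = null;
if (preg_match('/\)\s*(\{)/', $code, $m, PREG_OFFSET_CAPTURE)) {
    $bracePos = $m[1][1]; // byte offset of the "{" that follows ")"
}

// Alternative: \K resets the start of the reported match,
// so only the "{" ends up in $k[0].
preg_match('/\)\s*\K\{/', $code, $k, PREG_OFFSET_CAPTURE);
$bracePosK = $k[0][1];
```

Both calls stop at the first hit, so the brace around $username is never reported. The caveats about comments still apply.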

Related

Regular Expression for anything enclosed in double square brackets

I have been trying to extract strings or any other values that are enclosed in double square brackets, e.g. [[hello world]] or [[12_ nine]]. That means anything at all, as long as it is enclosed in two square brackets. Please check this URL, where I tried it. Here is what I did:
/\[[^\]]*\]]/
This pattern matches anything inside [[]]. My problem is that it also matches []]. Here are two examples of what this pattern matches: [[Welcome]] and [v2.0]]. The second example should not be matched. See the URL for a better picture.
You need this:
/\[\[[^\]]*\]\]/
Here's how it's defined:
First two (escaped) brackets: \[\[
Then something that's not brackets: [^\]]*
Then two closing brackets: \]\] (you could keep them unescaped too, it's a matter of style)
Note that it won't match strings having square brackets in the middle (e.g. [[ A [B] C]]). If you want to allow those strings, use
/\[\[.*?\]\]/
If that must be the whole string, as seems to be from your comment below, use
/^\[\[.*?\]\]$/ (if you want to allow brackets inside)
/^\[\[[^\]]*\]\]$/ (if you don't)
^ and $ are the start-of-string and end-of-string anchors; together they let you test the whole string.
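A small sketch of how these patterns could be checked against the two samples from the question (the variable names are illustrative only):

```php
<?php
// Sample text from the question: only [[Welcome]] should be accepted.
$text = '[[Welcome]] [v2.0]]';

// Search anywhere in the text for [[...]] with no "]" inside.
preg_match_all('/\[\[[^\]]*\]\]/', $text, $found);
$tokens = $found[0]; // list of full matches

// Anchored variant: the whole string must be a single [[...]] token.
$isToken  = preg_match('/^\[\[[^\]]*\]\]$/', '[[12_ nine]]') === 1;
$isBroken = preg_match('/^\[\[[^\]]*\]\]$/', '[v2.0]]') === 1;
```

Here $tokens contains only [[Welcome]], $isToken is true and $isBroken is false, because [v2.0]] has a single opening bracket.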
Try this regex
\[{2}[^\[\]]+\]{2}
Explanation
\[{2} I want exactly [[
\]{2} I want exactly ]]
[^\[\]]+ I want anything that is not [ or ] repeated one or more times
and if you want to catch only between two brackets
(?<=[^\[]\[{2})[^\[\]]+(?=\]{2}(?!\]))

Parse for square brackets with regular expressions

I've always had a difficult time with regular expressions. I've searched for help with this, but I can't quite find what I'm looking for.
I have blocks of text that follow this pattern:
[php]
... any type of code sample here
[/php]
I need to:
check for the square brackets, which can contain any one of 20-30 programming language names (php, ruby, etc.).
need to grab all code in between the opening and closing bracket.
I have worked out the following regular expression:
#\[([a-z]+)\]([^\[/]*)\[/([a-z]+)\]#i
Which matches everything pretty well. However, it breaks when the code sample contains square brackets. How do I modify it so that any character between those opening/closing braces will be matched for later use?
This is the regex you want. It also requires the opening and closing tags to agree, so a [php] tag will only be closed by [/php].
/\[(\w+)\](.*?)\[\/\1\]/s
Or if you wanted to explicitly match the tags you could use...
$langs = array('php', 'python', ...);
$langs = implode('|', array_map('preg_quote', $langs));
preg_match_all('/\[(' . $langs . ')\](.*?)\[\/\1\]/s', $str, $matches);
The following will work:
\[([a-z]+)\].*\[/\1\]
If you want to remove the greediness, so that each block is matched on its own, you can do:
\[([a-z]+)\].*?\[/\1\]
All you have to do is to check that both the closing and opening tags have the same text (in this case, that both are the same programming language), and you do that with \1, telling it to match the previously matched Group number 1: ([a-z]+)
Why don't you use something like below:
\[php\].*?\[/php\]
I don't understand why you want to use [a-z]+ for the tags, there should be php or a limited amount of other tags. Just keep it simple.
Actually you can use:
\[(php)\].*?\[/(\1)\]
so that the closing tag has to match the opening tag. Otherwise you will be pairing unrelated opening and closing tags. Add other languages, e.g. js, as alternatives: php|js and so on.
Use a backreference to refer to a match already made in the regular expression:
\[(\w+)\].*?\[/\1\]
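To illustrate the backreference approach on input that contains square brackets inside the code, here is a minimal sketch. The sample string and the $blocks structure are assumptions made for the example; the s modifier is added so that . also matches newlines, and # is used as the delimiter so the slash needs no escaping.

```php
<?php
// Hypothetical input: two tagged blocks whose code contains brackets.
$str = '[php]$a = $b[0];[/php] text [ruby]puts x[1][/ruby]';

// Group 1 = language tag, group 2 = code; \1 forces the closing tag
// to repeat the opening one. The lazy .*? stops at the first closer.
preg_match_all('#\[(\w+)\](.*?)\[/\1\]#s', $str, $matches, PREG_SET_ORDER);

$blocks = [];
foreach ($matches as $match) {
    $blocks[] = ['lang' => $match[1], 'code' => $match[2]];
}
```

With the greedy .* instead of .*?, two blocks of the same language would be merged into one match, which is why the lazy form is usually the safer choice here.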

PHP / Regex : match json inside json

Just a quick regex question...hopefully
I have a string that looks something like this:
$string = 'some text [ something {"index":"{"index2":"value2"}"}] [something2 {"here to be":"more specific"}]';
I want to be able to get the value:
{"index":"{"index2":"value2"}"}
But all my attempts at matching (or replacing) keep giving me:
{"index":"{"index2":"value2"}
preg_replace('/\[(.*?)({.*?[^}]})*?\]/is', "", $string);
Here I'm matching the whole square bracket area, but hopefully you can see what I'm trying to do.
The negation of the "do not match }" doesn't seem to be doing anything. Maybe I just need an OR in there or something.
Well, thanks if you have time to answer.
The $string could contain multiple instances of the {} so a greedy regex won't work....that I know of.
A classic regular expression can't count the opening brackets and the corresponding closing brackets (PCRE's recursive patterns, such as (?R), can match balanced pairs, but they are harder to read). You should use a simple for loop to do that, but you can get the complete string from the first opening bracket to the last closing one with a greedy expression like: ({.*}). Note that simple string functions are usually faster than regular expressions, so prefer them where they are enough.
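The for loop idea could look roughly like this. firstBalancedBraces is a hypothetical helper written for this example, not something from the original answer, and it assumes that every brace counts, including braces that appear inside quoted strings.

```php
<?php
// Hypothetical helper: returns the first balanced {...} span in $s,
// or null if there is none. Braces inside quotes are counted too.
function firstBalancedBraces($s)
{
    $start = strpos($s, '{');
    if ($start === false) {
        return null;
    }
    $depth = 0;
    for ($i = $start, $n = strlen($s); $i < $n; $i++) {
        if ($s[$i] === '{') {
            $depth++;
        } elseif ($s[$i] === '}') {
            $depth--;
            if ($depth === 0) {
                // Back at the outermost level: cut out the span.
                return substr($s, $start, $i - $start + 1);
            }
        }
    }
    return null; // opening brace was never closed
}

$string = 'some text [ something {"index":"{"index2":"value2"}"}] [something2 {"here to be":"more specific"}]';
$first = firstBalancedBraces($string);
```

For the string from the question, $first holds {"index":"{"index2":"value2"}"}, which is the value the asker wanted. To collect later spans as well, the same loop can be restarted from the position after each result.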

regex to find text inside first occurrence of tokens "[]"

I'm grabbing a file via file_get_contents($text_file) that has a token in the beginning of the contents in the form...
[widget_my_widget]
The full contents of the file might be...
[widget_my_widget]
Here is the start of the text file. It may have [] brackets inside it,
but only the first occurrence denotes the token.
I'm looking for the regex to grab the text string inside the first occurrence of the brackets []. In this case, I'd want to return widget_my_widget and load it into my variable.
Thanks in advance for your help.
The first captured group in \[(.+?)\] will match the string inside the square brackets.
In PHP you can use it like this:
if (preg_match('/\[(.+?)\]/', file_get_contents($text_file), $group)) {
print $group[1];
}
At the first occurrence in this string (the file content), skip the left square bracket, then match as little as possible, up to (but not including) the right square bracket.
I think Staffan's answer is mostly correct, but I have a minor correction and I think you may want to avoid the ^, or the string will have to start with a bracket and you said you just want the first instance. On to the code...
$str = ...
$pattern = '/\[([^\]]+)\]/';
$n = preg_match($pattern, $str, $matches);
if ($n > 0) {
$first_match = $matches[1];
/// do stuff
}
My pattern looks a little confusing because brackets have special meaning, so I'll try to explain it... we're looking for an open bracket, then one or more characters that are not a closing bracket (in this context, the caret means "not"), then a closing bracket. The parentheses are around the content we want to capture and the inner brackets are a character class. If that makes no sense, just ignore everything I just said.

Is iteration necessary in the following piece of code?

Here's a piece of code from the xss_clean method of the Input_Core class of the Kohana framework:
do
{
// Remove really unwanted tags
$old_data = $data;
$data = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $data);
}
while ($old_data !== $data);
Is the do ... while loop necessary? I would think that the preg_replace call would do all the work in just one iteration.
Well, it's necessary if the replacement potentially creates new matches in the next iteration. It's not very wasteful because it's only an additional check at worst, though.
Going by the code it matches, it seems unlikely that it will create new matches by replacement, however: it's very strict about what it matches.
EDIT: To be more specific, it tries to match an opening angle bracket optionally followed by a slash followed by one of several keywords optionally followed by any number of symbols that are not a closing angle bracket and finally a closing angle bracket. If the input follows that syntax, it'll be swallowed whole. If it's malformed (e.g. multiple opening and closing angle brackets), it'll generate garbage until it can't find substrings matching the initial sequence anymore.
So, no. Unless you have code like <<iframe>iframe>, no repetition is necessary. But then you're dealing with a level of tag soup the regex isn't good enough for anyway (e.g. it will fail on < iframe> with the extra space).
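The nested case can be tried out with a short sketch. The pattern below is a simplified stand-in for the Kohana one, cut down to two tag names for readability, and the input is the example string from above.

```php
<?php
// Simplified stand-in for the framework pattern (two tag names only).
$pattern = '#</*(?:iframe|script)[^>]*+>#i';
$data = '<<iframe>iframe>';

// A single pass removes the inner tag and leaves a new one behind.
$onePass = preg_replace($pattern, '', $data);

// Repeat until the string stops changing, as the do ... while does.
$looped = $data;
do {
    $old = $looped;
    $looped = preg_replace($pattern, '', $looped);
} while ($old !== $looped);
```

After one pass $onePass is <iframe>, while $looped ends up empty, so for input of this shape the loop does make a difference.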
EDIT2: It's also a bit odd that the pattern matches zero or more slashes at the beginning of the tag (it should be zero or one). The final *+ is not a typo, though: in PCRE it is a possessive quantifier, meaning zero or more like a plain asterisk, but without giving characters back on backtracking.
On a completely unrelated subject, I would like to add a word on optimisation here.
preg_replace() can tell you whether a replacement has been made or not (see the 5th argument, which is passed by reference). It's far more efficient than comparing strings, especially if they are large.
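A sketch of the loop rewritten around that 5th argument might look like this. The pattern is again a shortened stand-in for the framework's, and the input string is invented for the example.

```php
<?php
// Shortened stand-in pattern and a made-up input string.
$pattern = '#</*(?:iframe|script)[^>]*+>#i';
$data = '<<iframe>iframe> ok';

// $count receives the number of replacements made in each pass;
// -1 for the 4th argument means "no limit".
do {
    $data = preg_replace($pattern, '', $data, -1, $count);
} while ($count > 0);
```

The loop ends on the first pass that replaces nothing, so no copy of the old string has to be kept or compared.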
