Match text inside brackets with regex in PHP

I have some text like:
name: [my_name]
email: [my_email]
I'd like to grab the fields in square brackets with regex—how would I do that?
I've tried using this pattern: [*.?]
Unfortunately it doesn't work. PHP gives this error:
compilation failed: nothing to repeat at offset 0
What's wrong? Is the pattern correct?

The brackets are special characters in regex. To match them literally you have to escape them with a backslash, something like \[(.*?)\]. Adding the parentheses () captures whatever is matched inside them so you can use it later. Otherwise you're just matching the whole pattern and you'd have to strip the brackets manually.
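A minimal sketch of the difference the parentheses make, using the sample text from the question: with preg_match_all(), index 0 of the result holds the whole matches (brackets included) and index 1 holds only what the group captured.

```php
<?php
$text = "name: [my_name]\nemail: [my_email]";

// \[ and \] are escaped literal brackets; (.*?) captures the text between them.
preg_match_all('/\[(.*?)\]/', $text, $matches);

print_r($matches[0]); // whole matches: [my_name], [my_email]
print_r($matches[1]); // captured groups: my_name, my_email
```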

You should move the * and escape the [ and ], so make it \[.*\]. Since . already matches any character and * means "0 or more of the preceding item", .* is 0 or more of any character.

No, you got the order wrong. It should be something like
\[(.*)\]
.* = Something repeated as many times as possible.
The compilation error you get is because the engine does not know what to repeat, as [ is a special character in regular expressions. The ? you added after * does not make the content optional (.* already allows empty brackets); it makes the repetition lazy, i.e. as few characters as possible. Dropping it is harmless here because each field sits on its own line. The parentheses around the .* are used to capture the result. If you don't add them, the regex will still match, but you won't get what's inside the brackets as a separate result.

<?php
$text =
"name: [my_name]
email: [my_email]";
$pattern = '/\[(.*)\]/';
$matches = array();
preg_match_all($pattern, $text, $matches);
$name = $matches[1][0];
$email = $matches[1][1];
print "$name<br />";
print "$email";
?>
will output
my_name
my_email
/ is the delimiter (not part of the actual pattern per se). The \ escapes the [ and ] brackets, which would otherwise start and end a character class. ( and ) define a subpattern: the text it captures is put into the array passed as the third parameter of preg_match_all() (in this case $matches).
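The answers above differ on whether to keep the ? after .*. As a sketch (the one-line input is made up for illustration): placed after *, the ? makes the quantifier lazy, which only matters when a single line holds more than one bracketed field.

```php
<?php
$line = 'name: [my_name] email: [my_email]';

// Greedy: .* runs to the LAST ] on the line, swallowing the middle.
preg_match_all('/\[(.*)\]/', $line, $greedy);

// Lazy: .*? stops at the FIRST ] after each [.
preg_match_all('/\[(.*?)\]/', $line, $lazy);

print_r($greedy[1]); // one match: "my_name] email: [my_email"
print_r($lazy[1]);   // two matches: "my_name", "my_email"
```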

Escape it!
[ and ] are special characters, so you need to escape them. The quantifiers also have to follow the . they apply to:
\[.*?\]

Related

preg_replace PHP not working?

Why doesn't preg_replace return anything in this scenario? I've been trying to figure it out all night.
Here is the text contained within $postContent:
Test this. Here is a quote: [Quote]1[/Quote] Quote is now over.
Here is my code:
echo "Test I'm Here!!!";
$startQuotePos = strpos($postContent,'[Quote]')+7;
$endQuotePos = strpos($postContent,'[/Quote]');
$postStrLength = strlen($postContent);
$quotePostID = substr($postContent,$startQuotePos,($endQuotePos-$postStrLength));
$quotePattern = '[Quote]'.$quotePostID.'[/Quote]';
$newPCAQ = preg_replace($quotePattern,$quotePostID,$postContent);
echo "<br />$startQuotePos<br />$endQuotePos<br />$quotePostID<br />Qpattern:$quotePattern<br />PCAQ: $newPCAQ<br />";
These are my results:
Test I'm Here!!!
35
36
1
Qpattern:[Quote]1[/Quote]
PCAQ:
For preg_replace(), "[Quote]" is a character class: it matches a single character that is one of Q, u, o, t, or e.
If you want preg_replace() to find the literal "[Quote]", you need to escape it as "\[Quote\]". preg_quote() is the function you should use for that: preg_quote("[Quote]").
Your code is also wrong because a regular expression is expected to start with a delimiter. In the preg_replace() call I show below, that is #, but you could use another character, as long as it doesn't appear in the regular expression and is also used at the end of it. (In my case, the closing # is followed by a pattern modifier; pattern modifiers are the only characters allowed after the closing delimiter.)
If you are going to use preg_replace(), there's no need to first find where "[Quote]" is. I would rather use the following code:
$newPCAQ = preg_replace('#\[Quote\](.+?)\[/Quote\]#i', '\1', $postContent);
I will explain the regular expression I am using:
The trailing i (after the closing #) tells preg_replace() to ignore the difference between lowercase and uppercase characters; the string could contain "[QuOte]234[/QuOTE]", and that substring would still match the regular expression.
I use a question mark in "(.+?)" so that ".+" is not greedy and doesn't match too many characters. Without it, a single match could span a substring like "[Quote]234[/Quote] Other text [Quote]475[/Quote]", while this should be matched as two substrings: "[Quote]234[/Quote]" and "[Quote]475[/Quote]".
The '\1' replacement string tells preg_replace() to use the text matched by the subgroup "(.+?)" as the replacement. In other words, the call removes the "[Quote]" and "[/Quote]" surrounding other text. (It doesn't touch a "[/Quote]" that has no matching "[Quote]", such as in "[/Quote] Other text [Quote]".)
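A quick check of this answer's pattern against a string with two quote blocks in mixed case, as described in the explanation (the sample text is made up for illustration). preg_quote() is shown alongside for the literal-match case.

```php
<?php
$postContent = 'Test [Quote]234[/Quote] Other text [QuOte]475[/QuOTE] end.';

// Lazy (.+?) keeps the two quote blocks separate; the i modifier ignores case.
$stripped = preg_replace('#\[Quote\](.+?)\[/Quote\]#i', '\1', $postContent);
echo $stripped, "\n"; // Test 234 Other text 475 end.

// preg_quote() escapes [ and ] so they are matched literally.
echo preg_quote('[Quote]'), "\n"; // \[Quote\]
```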
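Your regex must start and end with a delimiter such as /. Because this pattern also contains /, [ and ], those have to be escaped as well:
$quotePattern = '/\[Quote\]'.$quotePostID.'\[\/Quote\]/';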
The reason you don't see anything for the return value of preg_replace is that it has returned NULL (see the preg_replace() page in the PHP manual for details). This is what preg_replace returns when an error occurs, which is what happened in your situation. NULL converts to a zero-length string, which is why nothing is printed. You can see this by using var_dump instead, which will show that preg_replace returned NULL.
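A minimal reproduction of this diagnosis, reusing the question's string and pattern. The @ operator is only there to silence the warning so the NULL return value is easy to see; it is not something to keep in real code.

```php
<?php
$postContent = 'Test this. Here is a quote: [Quote]1[/Quote] Quote is now over.';
$quotePattern = '[Quote]1[/Quote]'; // built as in the question, no proper delimiters

// preg_replace() cannot compile the pattern, so it returns NULL.
$newPCAQ = @preg_replace($quotePattern, '1', $postContent);

var_dump($newPCAQ);      // NULL
echo "PCAQ: $newPCAQ\n"; // NULL interpolates as an empty string
```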
Your regular expression is invalid and as such PHP will throw an E_WARNING level error of Warning: preg_replace(): Unknown modifier '['
There are a couple of reasons for this. First, you need to specify an opening and closing delimiter for your regular expression, as the preg_* functions use PCRE-style regular expressions. Second, you should also consider using preg_quote() on your pattern (sans the delimiter) to ensure it is escaped properly.
$postContent = "Test this. Here is a quote: [Quote]1[/Quote] Quote is now over.";
/* Specify a delimiter for your regular expression */
$delimiter = '#';
$startQuotePos = strpos($postContent,'[Quote]')+7;
$endQuotePos = strpos($postContent,'[/Quote]');
$postStrLength = strlen($postContent);
$quotePostID = substr($postContent,$startQuotePos,($endQuotePos-$postStrLength));
/* Make sure you use the delimiter in your pattern and escape it properly */
$quotePattern = $delimiter . preg_quote("[Quote]{$quotePostID}[/Quote]", $delimiter) . $delimiter;
$newPCAQ = preg_replace($quotePattern,$quotePostID,$postContent);
echo "<br />$startQuotePos<br />$endQuotePos<br />$quotePostID<br />Qpattern:$quotePattern<br />PCAQ: $newPCAQ<br />";
The output will be:
35
36
1
Qpattern:#\[Quote\]1\[/Quote\]#
PCAQ: Test this. Here is a quote: 1 Quote is now over.

PHP regex for #[mention]

Can someone help me:
$pattern = "/^(?:[a-zA-Z0-9?. ]?)+#([a-zA-Z0-9]+)(.+)?$/";
$str = "Hey #[14256] hey how are you?";
preg_match($pattern, $str, $matches);
print_r($matches);
The print result works fine if I remove the brackets from the mention (#14256 instead of #[14256]); however, I can't figure out how to make the regex work with the brackets, so that I get 14256 in my array.
You need to include the brackets in your regex:
"/^(?:[a-zA-Z0-9?. ]?)+#(\\[?[a-zA-Z0-9]+\\]?)(.+)?$/"
Notice the \\[? and \\]? I've added; those will match the [] characters, and will also match if there is no [].
Keep in mind, the above will match #[14256 and #14256]. If you want to only match one or the other, you need to do it a little differently.
"/^(?:[a-zA-Z0-9?. ]?)+#([a-zA-Z0-9]+|\\[[a-zA-Z0-9]+\\])(.+)?$/"
This will match EITHER #aA1 or #[aA1], but not the malformed examples shown above.
One last thing to include: This regex will only match one instance of the #[mention]. If you want to match ALL instances of it (such as in "hey #123, how is #456 these days?"), use the following with preg_match_all():
"/#([a-zA-Z0-9]+|\\[[a-zA-Z0-9]+\\])/"
Then $matches[1] will contain both 123 and 456.
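A sketch running the preg_match_all() pattern from this answer on its example sentence, plus a bracketed mention made up for illustration. It also shows a detail worth knowing: when brackets are present, group 1 includes them.

```php
<?php
$pattern = "/#([a-zA-Z0-9]+|\\[[a-zA-Z0-9]+\\])/";

preg_match_all($pattern, "hey #123, how is #456 these days?", $plain);
print_r($plain[1]); // 123, 456

// With brackets, group 1 holds "[789]", not "789".
preg_match_all($pattern, "ping #[789] please", $bracketed);
print_r($bracketed[1]);
```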
You need to escape the brackets in your regex so they don't get interpreted as the start of a character class. Try this instead (it captures only the number, not the brackets; move the escaped brackets inside the parentheses if you want to capture them as part of the backreference):
$pattern = "/^(?:[a-zA-Z0-9?. ]?)+#\[([a-zA-Z0-9]+)\](.+)?$/";
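Running that pattern on the string from the question (a sketch; $str is passed to preg_match() as the subject) puts the bare number in group 1.

```php
<?php
$pattern = "/^(?:[a-zA-Z0-9?. ]?)+#\[([a-zA-Z0-9]+)\](.+)?$/";
$str = "Hey #[14256] hey how are you?";

if (preg_match($pattern, $str, $matches)) {
    echo $matches[1], "\n"; // 14256
}
```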

regex to find all text after delimited string

I have some content that contains a token string in the form
$string_text = '[widget_abc]This is some text. This is some text, etc...';
And I want to pull all the text after the first ']' character
So the returned value I'm looking for in this example is:
This is some text. This is some text, etc...
preg_match("/^.+?\](.+)$/is" , $string_text, $match);
echo trim($match[1]);
Edit
As per author's request - added explanation:
preg_match(param1, param2, param3) is a function that finds the first match of a regular expression in a string
param1 = "/^.+?\](.+)$/is"
The two / characters are the delimiters you put around your regular expression in param1
i at the end means case-insensitive (it doesn't care whether your letters are 'a' or 'A')
s - lets . match newlines too, so the match can span multiple lines
^ - start the check from the beginning of the string
$ - go all the way to the end of the string
. - represents any character
.+ - one or more characters of anything
.+? - one or more characters of anything, but as few as possible
.+?\] - one or more characters of anything until you reach the first ] (there is a backslash before ] because brackets have a special meaning in regular expressions)
(.+)$ - capture everything after that ] and store it as a separate element in the array passed as param3
param2 = the string that you created.
param3 = $match, the array that receives the results.
I tried to simplify the explanations. I might be off, but I think I'm right for the most part.
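To see what the s modifier contributes, here is a sketch with a made-up two-line input: without s, . stops at the newline and the anchored pattern fails; with s, the capture spans both lines.

```php
<?php
$string_text = "[widget_abc]This is some text.\nThis is some text, etc...";

// Without s: . cannot cross the newline, so ^...$ cannot cover the whole string.
$withoutS = preg_match("/^.+?\](.+)$/i", $string_text, $m1);

// With s: . also matches "\n", so group 1 holds everything after the first ].
$withS = preg_match("/^.+?\](.+)$/is", $string_text, $m2);

var_dump($withoutS); // int(0)
echo trim($m2[1]), "\n";
```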
The regex (?<=\]).* will solve this problem if you can guarantee that there are no other square brackets on the line. In PHP the code will be:
if (preg_match('/(?<=\]).*/', $input, $group)) {
$match = $group[0];
}
This will transform [widget_abc]This is some text. This is some text, etc... into This is some text. This is some text, etc.... It matches everything that follows the ].
$output = preg_replace('/^[^\]]*\]/', '', $string_text);
Is there any particular reason why a regex is wanted here?
echo substr(strstr($string_text, ']'), 1);
A regex is definitely overkill for this instance.
Here is a nice one-liner:
list(, $result) = explode(']', $inputText, 2);
It does the job and is way less expensive than using regular expressions.
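For comparison, a sketch running the answers above on the question's sample string: the preg_replace() and lookbehind regexes, the strstr()/substr() combination and the explode() one-liner should all produce the same text.

```php
<?php
$string_text = '[widget_abc]This is some text. This is some text, etc...';

// Regex: delete everything up to and including the first ].
$viaReplace = preg_replace('/^[^\]]*\]/', '', $string_text);

// Lookbehind: match whatever follows a ].
preg_match('/(?<=\]).*/', $string_text, $group);
$viaLookbehind = $group[0];

// strstr() returns the string from the first ] on; substr() drops that ].
$viaStrstr = substr(strstr($string_text, ']'), 1);

// explode() with a limit of 2 splits only at the first ].
list(, $viaExplode) = explode(']', $string_text, 2);

echo $viaExplode, "\n"; // This is some text. This is some text, etc...
```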

Get Everything between two characters

I'm using PHP. I'm trying to get a regex pattern to match everything between value=" and ", i.e. Line 1, Line 2, ... up to Line 4.
value="Line 1
Line 2
Line 3
Line 4"
I've tried /.*?/ but it doesn't seem to work.
I'd appreciate some help.
Thanks.
P.S. I'd just like to add, in response to some comments, that all strings between the first " and last " are acceptable. I'm just trying to find a way to get everything between the very first " and very last " even when there is a " in between. I hope this makes sense. Thanks.
Assuming the desired character is "double quote":
$pat = '/\"([^\"]*?)\"/'; // text between quotes excluding quotes
$value='"Line 1 Line 2 Line 3 Line 4"';
preg_match($pat, $value, $matches);
echo $matches[1]; // $matches[0] is string with the outer quotes
If you just want the answer and don't need a specific regex, then you can use this:
<?php
$str='value="Line 1
Line 2
Line 3
Line 4"';
$need=explode("\"",$str);
var_dump($need[1]);
?>
/.*?/ does not match newline characters. If you want to match them too, you need a regular expression like /"([^"]*)"/.
I agree with Josh K that a regular expression is not required in this case (especially if you know there will not be any quotes apart from the ones delimiting the string). You could adopt his solution as well.
If you must use regex:
if (preg_match('!"([^"]+)"!', $value, $m))
echo $m[1];
You need the s pattern modifier. Something like: /value="(.*)"/s
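A sketch of this answer on the multi-line value from the question: without s, . refuses to cross the line breaks and nothing matches. Because .* is greedy, the capture also runs to the last " in the subject, which is what the question's P.S. asks for (the inner quote in the second sample is made up for illustration).

```php
<?php
$str = "value=\"Line 1\nLine 2\nLine 3\nLine 4\"";

// Without s, . cannot cross the line breaks, so there is no match.
$withoutS = preg_match('/value="(.*)"/', $str);

// With s, . matches "\n" as well and the capture spans all four lines.
preg_match('/value="(.*)"/s', $str, $m);

// Greedy .* runs to the LAST quote, so an inner " is kept in the capture.
preg_match('/value="(.*)"/s', "value=\"Line 1 \"x\" Line 2\"", $m2);

var_dump($withoutS); // int(0)
echo $m[1], "\n";
echo $m2[1], "\n";   // Line 1 "x" Line 2
```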
I'm not a regex guru, but why not just explode it?
// Say $var contains this value="..." string
$arr = explode('value="', $var);
$mid = explode('"', $arr[1]);
$fin = $mid[0]; // Contains what you're looking for.
The specification isn't clear, but you can try something like this:
/value="[^"]*"/
Explanation:
First, value=" is matched literally
Then, match [^"]*, i.e. anything but ", possibly spanning multiple lines
Lastly, match " literally
This does not allow " to appear between the "real" quotes, not even if it's escaped by e.g. preceding with a backslash.
The […] is a character class. Something like [aeiou] matches one of any of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches one of anything but the lowercase vowels.
References
regular-expressions.info/Examples - Programming Language Constructs - Strings
Has variations on different string patterns (e.g. allowing escaped quotes)
Related questions
Difference between .*? and .* for regex
As much as is practical, a negated character class is always a better option than .*?
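A sketch of the negated-class pattern from the answer above, with a capturing group added around [^"]* (that answer's version matches but does not capture). No s modifier is needed, because [^"] also matches newlines, but the match stops at the first closing quote; the second sample string is made up for illustration.

```php
<?php
$str = "value=\"Line 1\nLine 2\nLine 3\nLine 4\"";

// [^"]* is any run of non-quote characters, newlines included, so no s needed.
preg_match('/value="([^"]*)"/', $str, $m);
echo $m[1], "\n";

// The capture stops at the first " after value=, unlike a greedy .* with s.
preg_match('/value="([^"]*)"/', "value=\"a \"b\" c\"", $m2);
var_dump($m2[1]); // string(2) "a "
```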

preg_replace Pattern

I'm not very familiar with preg_replace; in other words, I don't really understand it, so I hope you can help me.
I have a string in a text like this one: [demo category=1] and want to replace it with the content of the category with id=1, e.g. "This is the Content of my first Category"
This is my starting pattern; that's all I have:
'/[demo\s*.*?]/i';
Hope you can help?
Firstly, you need to escape the square brackets as they are special characters in PCREs:
'/\[demo\s*.*?\]/i';
Secondly, it sounds like you want to do something with the digit at the end, so you'll want to capture it using parentheses:
'/\[demo\s*.*?=(\d+)\]/i';
The parentheses will capture \d+ and store it in a backreference. \d+ matches one or more digits.
Finally, it sounds like you need to use preg_replace_callback to perform a special function on the matches in order to get the string you want:
function replaceMyStr($matches)
{
    // $matches[1] contains the captured number
    $strNum = array("1"=>"first", "2"=>"second", "3"=>"third"); // ...etc
    return "This is the Content of my ".$strNum[$matches[1]]." Category.";
}
echo preg_replace_callback('/\[demo\s*.*?=(\d+)\]/i', "replaceMyStr", "[demo category=1]");
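A self-contained sketch of the preg_replace_callback() approach from this answer, using a closure and a hypothetical $categories lookup array in place of wherever the real category contents live; the surrounding text is made up for illustration.

```php
<?php
// Hypothetical stand-in for the real category contents.
$categories = [
    1 => 'This is the Content of my first Category',
    2 => 'This is the Content of my second Category',
];

$text = 'Intro: [demo category=1] / Next: [demo category=2]';

$result = preg_replace_callback(
    '/\[demo\s+category=(\d+)\]/i',
    function ($matches) use ($categories) {
        // $matches[1] holds the captured id as a string, e.g. "1".
        $id = (int) $matches[1];
        // Leave unknown ids untouched by returning the whole match.
        return $categories[$id] ?? $matches[0];
    },
    $text
);

echo $result, "\n";
```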
Further to the above answers, you have two ways to do the actual replacing. Assuming you have 10 category names you want to replace, you can either do something like
for ($i = 1; $i <= $max_category; $i++) {
    $category_name = get_category_name($i);
    // Match only the tag for this id; with (\d+), the first pass would replace every tag.
    $s = preg_replace("/\[demo\s+category=$i\]/i", $category_name, $s);
}
or
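$s = preg_replace_callback("/\[demo\s+category=(\d+)\]/i", function ($m) { return get_category_name($m[1]); }, $s);
In both cases, get_category_name($id) is a function that will get a category name for an id. The callback passed to preg_replace_callback() receives the whole match array, hence the small wrapper that hands it $m[1]. You should test both options to evaluate which is faster for your use case.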
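To compare the two options in this answer on a concrete string, here is a sketch with a hypothetical get_category_name() backed by a small array (the names are made up). The loop version targets one id per pattern; the callback version wraps the helper because the callback receives the match array.

```php
<?php
// Hypothetical lookup; a real version would query wherever categories live.
function get_category_name($id) {
    $names = [1 => 'News', 2 => 'Sports', 3 => 'Weather'];
    return $names[(int) $id] ?? '';
}

$s = '[demo category=2] and [demo category=3]';
$max_category = 3;

// Option 1: one preg_replace() per id, each pattern matching that id only.
$loop = $s;
for ($i = 1; $i <= $max_category; $i++) {
    $loop = preg_replace("/\[demo\s+category=$i\]/i", get_category_name($i), $loop);
}

// Option 2: a single pass; the closure passes the captured id to the helper.
$callback = preg_replace_callback(
    "/\[demo\s+category=(\d+)\]/i",
    function ($m) { return get_category_name($m[1]); },
    $s
);

echo $loop, "\n";     // Sports and Weather
echo $callback, "\n"; // Sports and Weather
```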
The pattern is going to be like this
/\[demo\s+category=(\d+)\]/i
(you need to escape brackets because they're special)
The [ and ] characters have special meaning (they denote character classes, i.e. ranges and collections of characters). You need to escape [ as \[. Escaping ] as well is optional in PCRE, since a ] outside a character class is taken literally, but \] makes the intent clearer. I also suggest you make use of the character class [^]], which matches any character that is not a ] (a ] placed right after [^ is literal).
/\[demo\s+[^]]*\]/i
should work better.
Edit: If you want to extract the name and number, then you can use
/\[demo\s+(\w+)\s*=\s*(\d+)\]/i
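A short sketch applying both of this answer's patterns to the tag from the question.

```php
<?php
$tag = '[demo category=1]';

// [^]]* is any run of characters other than ].
$matched = preg_match('/\[demo\s+[^]]*\]/i', $tag);
var_dump($matched); // int(1)

// Group 1 is the attribute name, group 2 the number.
if (preg_match('/\[demo\s+(\w+)\s*=\s*(\d+)\]/i', $tag, $m)) {
    echo $m[1], ' = ', $m[2], "\n"; // category = 1
}
```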
