PHP Regex extract numbers inside brackets

PHP Regex extract numbers inside brackets - php

I'm currently building a chat system with reply function.
How can I match the numbers inside the '#' symbol and brackets, example: #[123456789]
This one works in JavaScript
/#\[(0-9_)+\]/g
But it doesn't work in PHP as it cannot recognize the /g modifier. So I tried this:
/\#\[[^0-9]\]/
I have the following example code:
$example_message = 'Hi #[123456789] :)';
$msg = preg_replace('/\#\[[^0-9]\]/', '$1', $example_message);
But it doesn't work, it won't capture those numbers inside #[ ]. Any suggestions? Thanks

You have some core problems in your regex, the main one being the ^ that negates your character class. So instead of [^0-9] matching any digit, it matches anything but a digit. Also, the g modifier doesn't exist in PHP (preg_replace() replaces globally and you can use preg_match_all() to match expressions globally).
You'll want to use a regex like /#\[(\d+)\]/ to match (with a group) all of the digits between #[ and ].
To do this globally on a string in PHP, use preg_match_all():
preg_match_all('/#\[(\d+)\]/', 'Hi #[123456789] :)', $matches);
var_dump($matches);
However, your code would be cleaner if you didn't rely on a match group (\d+). Instead you can use "lookarounds" like: (?<=#\[)\d+(?=\]). Also, if you will only have one digit per string, you should use preg_match() not preg_match_all().
Note: I left the example vague and linked to lots of documentation so you can read/learn better. If you have any questions, please ask. Also, if you want a better explanation on the regular expressions used (specifically the second one with lookarounds), let me know and I'll gladly elaborate.

Use the preg_match_all function in PHP if you’d like to produce the behaviour of the g modifier in Javascript. Use the preg_match function otherwise.
preg_match_all("/#\\[([0-9]+)\\]/", $example_message, $matches);
Explanation:
/ opening delimiter
# match the at sign
\\[ match the opening square bracket (metacharacter, so needs to be escaped)
( start capturing
[0-9] match a digit
+ match the previous once or more
) stop capturing
\\] match the closing square bracket (metacharacter, so needs to be escaped)
/ closing delimiter
Now $matches[1] contains all the numbers inside the square brackets.

Related

php preg_match() not working, find a var

header.php file
<?php
echo 'this is example '.$adv['name'].' , this is another.....';
main.php file
<?php
if(preg_match('/$adv[(.*?)]/',file_get_contents('header.php'),$itext)){
echo $itext[1].'';
}
show empty

this regular expression will work
/\$adv\[(.*?)\]/
You need to escape symbols $,[,] using \ before them
since they have special meaning in regular expressions

Here is a more efficient solution:
Pattern (Demo):
/\$adv\[\K[^]]*?(?=\])/
PHP Implementation:
if(preg_match('/\$adv\[\K[^]]*?(?=\])/','this is example $adv[name]',$itext)){
echo $itext[0];
}
Output for $itext:
name
Notice that by using \K to replace the capture group, the targeted string is returned in the "full string" match which effectively reduces the output array by 50%.
You can see the demo link for explanation on individual pieces of the pattern, but basically it matches $adv[ then resets the matching point, matches all characters between the square brackets, then does a positive lookahead for a closing square bracket which will not be included in the returned match.
Regardless of if you want to match different variable names you could use: /\$[^[]*?\[\K[^]]*?(?=\])/. This will accommodate adv or any other substring that follows the dollar sign. By using Negated Character Classes like [^]] instead of . to match unlimited characters, the regex pattern performs more efficiently.
If adv is not a determining component of your input strings, I would use /\$[^[]*?\[\K[^]]*?(?=\])/ because it will be the most efficient.

how can i using Regex for find string covered in [ ]? [duplicate]

Simple regex question. I have a string on the following format:
this is a [sample] string with [some] special words. [another one]
What is the regular expression to extract the words within the square brackets, ie.
sample
some
another one
Note: In my use case, brackets cannot be nested.

You can use the following regex globally:
\[(.*?)\]
Explanation:
\[ : [ is a meta char and needs to be escaped if you want to match it literally.
(.*?) : match everything in a non-greedy way and capture it.
\] : ] is a meta char and needs to be escaped if you want to match it literally.

(?<=\[).+?(?=\])
Will capture content without brackets
(?<=\[) - positive lookbehind for [
.*? - non greedy match for the content
(?=\]) - positive lookahead for ]
EDIT: for nested brackets the below regex should work:
(\[(?:\[??[^\[]*?\]))

This should work out ok:
\[([^]]+)\]

Can brackets be nested?
If not: \[([^]]+)\] matches one item, including square brackets. Backreference \1 will contain the item to be match. If your regex flavor supports lookaround, use
(?<=\[)[^]]+(?=\])
This will only match the item inside brackets.

To match a substring between the first [ and last ], you may use
\[.*\] # Including open/close brackets
\[(.*)\] # Excluding open/close brackets (using a capturing group)
(?<=\[).*(?=\]) # Excluding open/close brackets (using lookarounds)
See a regex demo and a regex demo #2.
Use the following expressions to match strings between the closest square brackets:
Including the brackets:
\[[^][]*] - PCRE, Python re/regex, .NET, Golang, POSIX (grep, sed, bash)
\[[^\][]*] - ECMAScript (JavaScript, C++ std::regex, VBA RegExp)
\[[^\]\[]*] - Java, ICU regex
\[[^\]\[]*\] - Onigmo (Ruby, requires escaping of brackets everywhere)
Excluding the brackets:
(?<=\[)[^][]*(?=]) - PCRE, Python re/regex, .NET (C#, etc.), JGSoft Software
\[([^][]*)] - Bash, Golang - capture the contents between the square brackets with a pair of unescaped parentheses, also see below
\[([^\][]*)] - JavaScript, C++ std::regex, VBA RegExp
(?<=\[)[^\]\[]*(?=]) - Java regex, ICU (R stringr)
(?<=\[)[^\]\[]*(?=\]) - Onigmo (Ruby, requires escaping of brackets everywhere)
NOTE: * matches 0 or more characters, use + to match 1 or more to avoid empty string matches in the resulting list/array.
Whenever both lookaround support is available, the above solutions rely on them to exclude the leading/trailing open/close bracket. Otherwise, rely on capturing groups (links to most common solutions in some languages have been provided).
If you need to match nested parentheses, you may see the solutions in the Regular expression to match balanced parentheses thread and replace the round brackets with the square ones to get the necessary functionality. You should use capturing groups to access the contents with open/close bracket excluded:
\[((?:[^][]++|(?R))*)] - PHP PCRE
\[((?>[^][]+|(?<o>)\[|(?<-o>]))*)] - .NET demo
\[(?:[^\]\[]++|(\g<0>))*\] - Onigmo (Ruby) demo

If you do not want to include the brackets in the match, here's the regex: (?<=\[).*?(?=\])
Let's break it down
The . matches any character except for line terminators. The ?= is a positive lookahead. A positive lookahead finds a string when a certain string comes after it. The ?<= is a positive lookbehind. A positive lookbehind finds a string when a certain string precedes it. To quote this,
Look ahead positive (?=)
Find expression A where expression B follows:
A(?=B)
Look behind positive (?<=)
Find expression A where expression B
precedes:
(?<=B)A
The Alternative
If your regex engine does not support lookaheads and lookbehinds, then you can use the regex \[(.*?)\] to capture the innards of the brackets in a group and then you can manipulate the group as necessary.
How does this regex work?
The parentheses capture the characters in a group. The .*? gets all of the characters between the brackets (except for line terminators, unless you have the s flag enabled) in a way that is not greedy.

Just in case, you might have had unbalanced brackets, you can likely design some expression with recursion similar to,
\[(([^\]\[]+)|(?R))*+\]
which of course, it would relate to the language or RegEx engine that you might be using.
RegEx Demo 1
Other than that,
\[([^\]\[\r\n]*)\]
RegEx Demo 2
or,
(?<=\[)[^\]\[\r\n]*(?=\])
RegEx Demo 3
are good options to explore.
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
RegEx Circuit
jex.im visualizes regular expressions:
Test
const regex = /\[([^\]\[\r\n]*)\]/gm;
const str = `This is a [sample] string with [some] special words. [another one]
This is a [sample string with [some special words. [another one
This is a [sample[sample]] string with [[some][some]] special words. [[another one]]`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
Source
Regular expression to match balanced parentheses

(?<=\[).*?(?=\]) works good as per explanation given above. Here's a Python example:
import re
str = "Pagination.go('formPagination_bottom',2,'Page',true,'1',null,'2013')"
re.search('(?<=\[).*?(?=\])', str).group()
"'formPagination_bottom',2,'Page',true,'1',null,'2013'"

The #Tim Pietzcker's answer here
(?<=\[)[^]]+(?=\])
is almost the one I've been looking for. But there is one issue that some legacy browsers can fail on positive lookbehind.
So I had to made my day by myself :). I manged to write this:
/([^[]+(?=]))/g
Maybe it will help someone.
console.log("this is a [sample] string with [some] special words. [another one]".match(/([^[]+(?=]))/g));

if you want fillter only small alphabet letter between square bracket a-z
(\[[a-z]*\])
if you want small and caps letter a-zA-Z
(\[[a-zA-Z]*\])
if you want small caps and number letter a-zA-Z0-9
(\[[a-zA-Z0-9]*\])
if you want everything between square bracket
if you want text , number and symbols
(\[.*\])

This code will extract the content between square brackets and parentheses
(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))
(?: non capturing group
(?<=\().+?(?=\)) positive lookbehind and lookahead to extract the text between parentheses
| or
(?<=\[).+?(?=\]) positive lookbehind and lookahead to extract the text between square brackets

In R, try:
x <- 'foo[bar]baz'
str_replace(x, ".*?\\[(.*?)\\].*", "\\1")
[1] "bar"

([[][a-z \s]+[]])
Above should work given the following explaination
characters within square brackets[] defines characte class which means pattern should match atleast one charcater mentioned within square brackets
\s specifies a space
 + means atleast one of the character mentioned previously to +.

I needed including newlines and including the brackets
\[[\s\S]+\]

If someone wants to match and select a string containing one or more dots inside square brackets like "[fu.bar]" use the following:
(?<=\[)(\w+\.\w+.*?)(?=\])
Regex Tester

(PHP) How to find words beginning with a pattern and replace all of them?

I have a string. An example might be "Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2"
I want to replace any instance of "/r/x" and "/u/x" with "[/r/x](http://reddit.com/r/x)" and "[/u/x](http://reddit.com/u/x)" basically.
So I'm not sure how to 1) find "/r/" and then expand that to the rest of the word (until there's a space), then 2) take that full "/r/x" and replace with my pattern, and most importantly 3) do this for all "/r/" and "/u/" matches in a single go...
The only way I know to do this would be to write a function to walk the string, character by character, until I found "/", then look for "r" and "/" to follow; then keep going until I found a space. That would give me the beginning and ending characters, so I could do a string replacement; then calculate the new end point, and continue walking the string.
This feels... dumb. I have a feeling there's a relatively simple way to do this, and I just don't know how to google to get all the relevant parts.

A simple preg_replace will do what you want.
Try:
$string = preg_replace('#(/(?:u|r)/[a-zA-Z0-9_-]+)#', '[\1](http://reddit.com\1)', $string);
Here is an example: http://ideone.com/dvz2zB
You should see if you can discover what characters are valid in a Reddit name or in a Reddit username and modify the [a-zA-Z0-9_-] charset accordingly.

You are looking for a regular expression.
A basic pattern starts out as a fixed string. /u/ or /r/ which would match those exactly. This can be simplified to match one or another with /(?:u|r)/ which would match the same as those two patterns. Next you would want to match everything from that point up to a space. You would use a negative character group [^ ] which will match any character that is not a space, and apply a modifier, *, to match as many characters as possible that match that group. /(?:u|r)/[^ ]*
You can take that pattern further and add a lookbehind, (?<= ) to ensure your match is preceded by a space so you're not matching a partial which results in (?<= )/(?:u|r)/[^ ]*. You wrap all of that to make a capturing group ((?<= )/(?:u|r)/[^ ]*). This will capture the contents within the parenthesis to allow for a replacement pattern. You can express your chosen replacement using the \1 reference to the first captured group as [\1](http://reddit.com\1).
In php you would pass the matching pattern, replacement pattern, and subject string to the preg_replace function.

In my opinion regex would be an overkill for such a simple operation. If you just want to replace instance of "/r/x" with "[r/x](http://reddit.com/r/x)" and "/u/x" with "[/u/x](http://reddit.com/u/x)" you should use str_replace although with preg_replace it'll lessen the code.
str_replace("/r/x","[/r/x](http://reddit.com/r/x)","whatever_string");
use regex for intricate search string and replace. you can also use http://www.jslab.dk/tools.regex.php regular expression generator if you have something complex to capture in the string.

Regular expression to find pattern in string in PHP

Suppose I have a string that looks like:
"lets refer to [[merp] [that entry called merp]] and maybe also to that entry called [[blue] [blue]]"
The idea here is to replace a block of [[name][some text]] with some text.
So I'm trying to use regular expressions to find blocks that look like [[name][some text]], but I'm having tremendous difficulty.
Here's what I thought should work (in PHP):
preg_match_all('/\[\[.*\]\[.*\]/', $my_big_string, $matches)
But this just returns a single match, the string from '[[merp' to 'blue]]'. How can I get it to return the two matches [[merp][that entry called merp]] and [[blue][blue]]?

The regex you're looking for is \[\[(.+?)\]\s\[(.+?)\]\] and replace it with $2
The regex pattern matched inside the () braces are captured and can be back-referenced using $1, $2,...
Example on regex101.com

Quantifiers like the * are by default greedy,
which means, that as much as possible is matched to meet conditions. E.g. in your sample a regex like \[.*\] would match everything from the first [ to the last ] in the string. To change the default behaviour and make quantifiers lazy (ungreedy, reluctant):
Use the U (PCRE_UNGREEDY) modifier to make all quantifiers lazy
Put a ? after a specific quantifier. E.g. .*? as few of any characters as possible
1.) Using the U-modifier a pattern could look like:
/\[\[(.*)]\s*\[(.*)]]/Us
Additional used the s (PCRE_DOTALL) modifier to make the . dot also match newlines. And added some \s whitespaces in between ][ which are in your sample string. \s is a shorthand for [ \t\r\n\f].
There are two capturing groups (.*) to be replaced then. Test on regex101.com
2.) Instead using the ? to making each quantifier lazy:
/\[\[(.*?)]\s*\[(.*?)]]/s
Test on regex101.com
3.) Alternative without modifiers, if no square brackets are expected to be inside [...].
/\[\[([^]]*)]\s*\[([^]]*)]]/
Using a ^ negated character class to allow [^]]* any amount of characters, that are NOT ] in between [ and ]. This wouldn't require to rely on greediness. Also no . is used, so no s-modifier is needed.
Test on regex101.com
Replacement for all 3 examples according to your sample: \2 where \1 correspond matches of the first parenthesized group,...

PHP regex lookbehind with wildcard

I have two strings in PHP:
$string = '<a href="http://localhost/image1.jpeg" /></a>';
and
$string2 = '[caption id="attachment_5" align="alignnone" width="483"]<a href="http://localhost/image1.jpeg" /></a>[/caption]';
I'm trying to match strings of the first type. That is strings that are not surrounded by '[caption ... ]' and '[/caption]'. So far, I would like to use something like this:
$pattern = '/(?<!\[caption.*\])(?!\[\/caption\])(<a.*><img.*><\/a>)/';
but PHP matches out the first string as well with this pattern even though it is NOT preceeded by '[caption' and zero or more characters followed by ']'. What gives? Why is this and what's the correct pattern?
Thanks.

Variable length look-behind is not supported in PHP, so this part of your pattern is not valid:
(?<!\[caption.*\])
It should be warning you about this.
In addition, .* always matches the larges possible amount. Thus your pattern may result in a match that overlaps multiple tags. Instead, use [^>] (match anything that is not a closing bracket), because closing brackets should not occur inside the img tag.
To solve the look-behind problem, why not just check for the closing tag only? This should be sufficient (assuming the caption tags are only used in a way similar to what you have shown).
$pattern = '|(<a[^>]*><img[^>]*></a>)(?!\[/caption\])|';
When matching patterns that contain /, use another character as the pattern delimiter to avoid leaning toothpick syndrome. You can use nearly any non-alphanumeric character around the pattern.
Update: the previous regex is based on the example regex you gave, rather than the example data. If you want to match links that don't contain images, do this:
$pattern = '|(<a[^>]*>[^<]*</a>)(?!\[/caption\])|';
Note that this doesn't allow any tags in the middle of the link. If you allow tags (such as by using .*?), a regex could match something starting within the [caption] and ending elsewhere.

I don't see how your regexp could match either string, since you're looking for <a.*><img.*><\/a>, and both anchors don't contain an <img... tag. Also, the two subexpressions looking for and prohibiting the caption-bits look oddly positioned to me. Finally, you need to ensure your tag-matching bits don't act greedy, i.e. don't use .* but [^>]*.
Do you mean something like this?
$pattern = '/(<a[^>]*>(<img[^>]*>)?<\/a>)(?!\[\/caption\])/'
Test it on regex101.
Edit: Removed useless lookahead as per dan1111's suggestion and updated regex101 link.

Lookbehind doesn't allow non fixed length pattern i.e. (*,+,?), I think this /<a.*><\/a>(?!\[\/caption\])/ is enough for your requirement

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

PHP Regex extract numbers inside brackets - php

Related

php preg_match() not working, find a var

how can i using Regex for find string covered in [ ]? [duplicate]

(PHP) How to find words beginning with a pattern and replace all of them?

Regular expression to find pattern in string in PHP

PHP regex lookbehind with wildcard

Categories

Resources