Regex problem (PHP) - php

[quote=Username here]quoted text here[/quote]
Reply text here
I need a regular expression that stores the "Username here", "quoted text here" and "Reply text here" in an array.
This expression needs to support nesting as well. Example:
[quote=Username2 here][quote=Username here]quoted text here[/quote]
Reply text here[/quote]
Reply text here

This regex matches a nested quote block (in group 1), followed by the trailing reply (in group 2):
(\[quote=[^]]*](?:(?R)|.)*\[/quote])(.*)
A little demo:
$text = '[quote=Username2 here][quote=Username here]quoted text[/quote]Reply text[/quote]More text';
preg_match('#(\[quote=[^]]*](?:(?R)|.)*\[/quote])(.*)#is', $text, $match);
print_r($match);
produces:
Array
(
[0] => [quote=Username2 here][quote=Username here]quoted text[/quote]Reply text[/quote]More text
[1] => [quote=Username2 here][quote=Username here]quoted text[/quote]Reply text[/quote]
[2] => More text
)
A little explanation:
( # open group 1
\[quote=[^]]*] # match '[quote= ... ]'
(?:(?R)|.)* # recursively match the entire pattern or any character and repeat it zero or more times
\[/quote] # match '[/quote]'
) # close group 1
( # open group 2
.* # match zero or more trailing chars after the last '[/quote]'
) # close group 2
But using these recursive regex constructs supported by PHP might make one's head spin... I'd opt for a little parser like John Kugelman suggested.
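A minimal sketch of what such a little parser could look like, assuming the markup only uses [quote=Name] and [/quote] tags. The function names tokenize_quotes and parse_quotes are made up for this illustration; they do not come from the answers here or from any library.

```php
<?php
// Hypothetical helper: split the text into tags and plain-text chunks.
function tokenize_quotes(string $text): array
{
    return preg_split(
        '~(\[quote=[^\]]*\]|\[/quote\])~',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

// Hypothetical helper: build a nested tree by recursive descent.
// $i is the current position in $tokens and is shared across recursive calls.
function parse_quotes(array $tokens, int &$i = 0): array
{
    $nodes = [];
    while ($i < count($tokens)) {
        $token = $tokens[$i++];
        if ($token === '[/quote]') {
            break; // closes the block we are currently inside
        }
        if (preg_match('~^\[quote=([^\]]*)\]$~', $token, $m)) {
            // Opening tag: everything up to the matching [/quote] becomes its content.
            $nodes[] = ['user' => $m[1], 'content' => parse_quotes($tokens, $i)];
        } else {
            $nodes[] = $token; // plain text
        }
    }
    return $nodes;
}

$text = '[quote=Username2 here][quote=Username here]quoted text[/quote]Reply text[/quote]More text';
$tree = parse_quotes(tokenize_quotes($text));
print_r($tree);
```

Each quote becomes an array with a 'user' key and a 'content' list that may itself contain further quotes, so the nesting is preserved. The sketch does not validate the input: a stray [/quote] at the top level simply ends parsing early.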

Assuming you do not need the values returned nested or paired with their matching quotes, which a plain (non-recursive) regex cannot do, you can simply split on the parts you do not need:
preg_split('~(\[quote=|\[quote]|]|\[/quote])~', $yourstring);
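To show what the split approach returns, here is a small sketch run against the nested example from the question. It uses ~ as the pattern delimiter so that the / in [/quote] does not need escaping, and adds the PREG_SPLIT_NO_EMPTY flag, which is an assumption of this sketch rather than part of the answer.

```php
<?php
$yourstring = '[quote=Username2 here][quote=Username here]quoted text here[/quote]'
    . 'Reply text here[/quote]Reply text here';

// PREG_SPLIT_NO_EMPTY drops the empty strings produced between adjacent tags.
$parts = preg_split(
    '~(\[quote=|\[quote]|]|\[/quote])~',
    $yourstring,
    -1,
    PREG_SPLIT_NO_EMPTY
);
print_r($parts);
```

The result is a flat list (both usernames, the quoted text, then the two replies), so the information about which reply belongs to which quote is lost. Any literal ] inside the text would also act as a split point.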

Related

get all text between bracket but skip nested bracket

I'm trying to figure out how to get the text between two parentheses without stopping at the first closing )
__('This is a (TEST) all of this i want') i dont want any of this;
My current pattern is __\((.*?)\)
which gives me
__('This is a (TEST)
but I want
__('This is a (TEST) all of this i want')
Thanks
You may use a regex subroutine to match text inside nested parentheses after __:
if (preg_match_all('~__(\(((?:[^()]++|(?1))*)\))~', $s, $matches)) {
print_r($matches[2]);
}
See the regex demo.
Details
__ - a __ substring
(\(((?:[^()]++|(?1))*)\)) - Group 1 (it will be recursed using the (?1) subroutine):
\( - a ( char
((?:[^()]++|(?1))*) - Group 2, capturing 0 or more repetitions of either 1+ chars other than ( and ), or a recursion of the whole Group 1 pattern
\) - a ) char.
See the PHP demo:
$s = "__('This is a (TEST) all of this i want') i dont want any of this; __(extract this)";
if (preg_match_all('~__(\(((?:[^()]++|(?1))*)\))~', $s, $matches)) {
print_r($matches[2]);
}
// => Array ( [0] => 'This is a (TEST) all of this i want' [1] => extract this )
You forgot to escape two parentheses in your regex: __\((.*)\);
Check on regex101.com.
Use the pattern __\((.*)?\).
The \ escapes the parentheses to catch literal parentheses. This then captures all the text inside that set of parentheses.
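A short comparison, using the sample string from the PHP demo above, of how the greedy pattern and the recursive pattern behave once a second __( ) call appears on the same line. The variable names $greedy and $balanced are only illustrative.

```php
<?php
$s = "__('This is a (TEST) all of this i want') i dont want any of this; __(extract this)";

// Greedy .* runs to the LAST ")" on the line, so both calls merge into one match.
preg_match('~__\((.*)\)~', $s, $greedy);

// The recursive pattern stops at the ")" that balances the opening "(".
preg_match('~__(\(((?:[^()]++|(?1))*)\))~', $s, $balanced);

echo $greedy[1], "\n";
echo $balanced[2], "\n";
```

The greedy capture swallows everything up to the final parenthesis, including the unwanted text in between, while the recursive capture ends at the closing quote of the first call. With only one pair of outer parentheses per line the two give the same result.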

Match regex pattern that isn't within a bbcode tag

I am attempting to create a regex pattern that will match words in a string that begin with #
Regex that solves this initial problem is '~(#\w+)~'
A second requirement of the code is that it must also ignore any matches that occur within [quote] and [/quote] tags
A couple of attempts that have failed are:
(?:[0-9]+|~(#\w+)~)(?![0-9a-z]*\[\/[a-z]+\])
/[quote[\s\]][\s\S]*?\/quote](*SKIP)(*F)|~(#\w+)~/i
Example: the following string should have an array output as displayed:
$results = [];
$string = "#friends #john [quote]#and #jane[/quote] #doe";
//run regex match
preg_match_all('regex', $string, $results);
//dump results
var_dump($results[1]);
//results: array consisting of:
[1]=>"#friends"
[2]=>"#john"
[3]=>"#doe"
You may use the following regex (based on another related question):
'~(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s'
See the regex demo. The regex accounts for nested [quote] tags.
Details
(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F) - matches the pattern inside the capturing parentheses, and then (*SKIP)(*F) makes the regex engine discard the matched text and resume searching after it:
\[quote] - a literal [quote] string
(?:(?1)|.)*? - any 0+ (but as few as possible) occurrences of the whole Group 1 pattern ((?1)) or any char (.)
\[/quote] - a literal [/quote] string
| - or
#\w+ - a # followed with 1+ word chars.
PHP demo:
$results = [];
$string = "#friends #john [quote]#and #jane[/quote] #doe";
$rx = '~(\[quote\](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s';
preg_match_all($rx, $string, $results);
print_r($results[0]);
// => Array ( [0] => #friends [1] => #john [2] => #doe )
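To illustrate why the recursion matters, the following sketch compares the pattern above with a simpler non-recursive variant on a string containing nested [quote] tags. The test string and the non-recursive pattern are assumptions made for this comparison.

```php
<?php
$string = "#a [quote]#b [quote]#c[/quote] #d[/quote] #e";

// Recursive version from the answer: the whole outer quote block is skipped.
$recursive = '~(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s';
preg_match_all($recursive, $string, $withRecursion);

// Non-recursive variant: skipping stops at the FIRST [/quote],
// so "#d" (still inside the outer quote) leaks into the results.
$flat = '~\[quote].*?\[/quote](*SKIP)(*F)|#\w+~s';
preg_match_all($flat, $string, $withoutRecursion);

print_r($withRecursion[0]);
print_r($withoutRecursion[0]);
```

With recursion only #a and #e are returned; without it, #d is matched as well because the skipped region ends too early.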

PHP Regex to interpret a string as a command line attributes/options

Let's say I have a string of
"Insert Post -title Some PostTitle -category 2 -date-posted 2013-02:02 10:10:10"
What I've been trying to do is convert this string into actions. The string is very readable, and the goal is to make posting a little easier instead of navigating to new pages every time. I'm okay with how the actions are going to work, but I've had many failed attempts at processing the string the way I want. I simply want the values after the attributes (options) put into an array, or just extracted so that I can deal with them myself.
The string above should give me an array of keys => values, e.g.
$Processed = [
'title'=> 'Some PostTitle',
'category'=> '2',
....
];
Processed data like this is what I'm looking for.
I've been trying to write a regex for this, without success.
For example, this:
/\-(\w*)\=?(.+)?/
That should be close to what I want.
Note the spaces in the title and date values, and that some values can contain dashes as well. I might also add a list of allowed attributes:
$AllowedOptions = ['-title','-category',...];
I'm just not good at this and would appreciate your help!
You can use this lookahead based regex to match your name-value pairs:
/-(\S+)\h+(.*?(?=\h+-|$))/
RegEx Demo
RegEx Breakup:
- # match a literal hyphen
(\S+) # match 1 or more of any non-whitespace char and capture it as group #1
\h+ # match 1 or more of any horizontal whitespace char
( # capture group #2 start
.*? # match 0 or more of any char (non-greedy)
(?=\h+-|$) # lookahead asserting that what follows is 1+ horizontal whitespace and a hyphen, or the end of the line
) # capture group #2 end
PHP Code:
$str = 'Insert Post -title Some PostTitle -category 2 -date-posted 2013-02:02 10:10:10';
if (preg_match_all('/-(\S+)\h+(.*?(?=\h+-|$))/', $str, $m)) {
$output = array_combine ( $m[1], $m[2] );
print_r($output);
}
Output:
Array
(
[title] => Some PostTitle
[category] => 2
[date-posted] => 2013-02:02 10:10:10
)
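The question also mentions values that may contain dashes and a list of allowed options. One possible way to combine the two, sketched here under assumptions, is to build the pattern from the allowed names so that only a known option can end a value. The function name parse_options is hypothetical, and the allowed names are passed without the leading hyphen.

```php
<?php
// Hypothetical helper: extract "-name value" pairs for a fixed set of option names.
function parse_options(string $command, array $allowed): array
{
    if ($allowed === []) {
        return [];
    }

    // Build an alternation of the allowed names, escaped for use inside /.../.
    $quoted = [];
    foreach ($allowed as $name) {
        $quoted[] = preg_quote($name, '/');
    }
    $names = implode('|', $quoted);

    // A value ends only before whitespace + "-" + an allowed name, or at end of line.
    $pattern = sprintf('/-(%1$s)\h+(.*?)(?=\h+-(?:%1$s)\h|$)/', $names);

    if (!preg_match_all($pattern, $command, $m)) {
        return [];
    }
    return array_combine($m[1], $m[2]);
}

$str = 'Insert Post -title Some - PostTitle -category 2 -date-posted 2013-02:02 10:10:10';
$options = parse_options($str, ['title', 'category', 'date-posted']);
print_r($options);
```

Here the title value keeps its " - " because a lone hyphen is not followed by an allowed name. The trade-off is that an unknown option such as -foo is not rejected; it simply becomes part of the preceding value.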

Wordpress get the parameter of the first shortcode in the content

I am writing a script to find the first occurrence of the following shortcode in content and then get the url parameter of the shortcode.
The shortcode looks like this:
[soundcloud url="http://api.soundcloud.com/tracks/106046968"]
and what I have currently done is
$pattern = get_shortcode_regex();
$matches = array();
preg_match("/$pattern/s", get_the_content(), $matches);
print_r($matches);
and the result looks like
Array (
[0] => [soundcloud url="http://api.soundcloud.com/tracks/106046968"]
[1] =>
[2] => soundcloud
[3] => url="http://api.soundcloud.com/tracks/106046968"
[4] =>
[5] =>
[6] =>
)
Here is the string from which I need the shortcode's url parameter:
$html = 'Our good homies DJ Skeet Skeet aka Yung Skeeter & Wax Motif have teamed up to do a colossal 2-track EP and we\'re getting the exclusive sneak-premiere of the EP\'s diabolical techno b-side called "Hush Hush" before its released tomorrow on Dim Mak Records!
[soundcloud url="http://api.soundcloud.com/tracks/104477594"]
Wax Motif have teamed up to do a colossal 2-track EP and we\'re getting the exclusive sneak-premiere of the EP\'s diabolical techno b-side called "Hush Hush" before its released tomorrow on Dim Mak Records!
';
I guess this is not the best way to do it. If someone can guide me on how to do this, that would be great. Basically, I want to extract the first occurrence of the soundcloud url from the content.
So here's what I came up with:
preg_match('~\[soundcloud\s+url\s*=\s*("|\')(?<url>.*?)\1\s*\]~i', $input, $m); // match
print_r($m); // print matches (groups) ...
$url = isset($m['url']) ? $m['url']:''; // if the url doesn't exist then return empty string
echo 'The url is : ' . $url; // Just some output
Let's explain the regex:
~ # set ~ as delimiter
\[soundcloud # match [soundcloud
\s+ # match a whitespace 1 or more times
url # match url
\s* # match a whitespace 0 or more times
= # match =
\s* # match a whitespace 0 or more times
("|\') # match either a double quote or a single quote and put it in group 1
(?<url>.*?) # match everything ungreedy until group 1 is found and put it in a named group "url"
\1 # match what was matched in group 1
\s* # match a whitespace 0 or more times
\] # match ]
~ # delimiter (end expression)
i # set the i modifier, which means match case-insensitive
Online PHP demo
Online regex demo
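As a self-contained sketch, the same regex can be wrapped in a small function and tried on content that holds two shortcodes with different quote styles. The function name first_soundcloud_url and the test content are assumptions; in WordPress the content would come from get_the_content() instead.

```php
<?php
// Hypothetical helper: return the url of the first [soundcloud] shortcode, or ''.
function first_soundcloud_url(string $content): string
{
    // Group 1 captures the opening quote; \1 requires the same quote to close it.
    $pattern = '~\[soundcloud\s+url\s*=\s*("|\')(?<url>.*?)\1\s*\]~i';
    preg_match($pattern, $content, $m);
    return $m['url'] ?? '';
}

$content = "Intro [soundcloud url='http://api.soundcloud.com/tracks/104477594'] middle "
    . '[soundcloud url="http://api.soundcloud.com/tracks/106046968"] end';

echo first_soundcloud_url($content), "\n";
```

Because preg_match stops at the first match, only the first shortcode's url is returned, whichever quote character it uses.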

"Optional" substring matching with regex

I am writing a regular expression in PHP that will need to extract data from strings that look like:
Naujasis Salemas, Šiaurės Dakota
Jungtinės Valstijos (Centras, Šiaurės Dakota)
I would like to extract:
Naujasis Salemas
Centras
For the first case, I have written [^-]*(?=,), which works quite well. I would like to modify the expression so that if there are parentheses ( and ), it searches between those parentheses and then extracts everything before the comma.
Is it possible to do something like this with just 1 expression? If so, how can I make it search within parenthesis if they exist?
A conditional might help you here:
$stra = 'Naujasis Salemas, Šiaurės Dakota';
$strb = 'Jungtinės Valstijos (Centras, Šiaurės Dakota)';
$regex = '
/^ # Anchor at start of string.
(?(?=.*\(.+,.*\)) # Condition to check for: presence of text in parentheses.
.*\(([^,]+) # If condition matches, match inside parentheses to first comma.
| ([^,]+) # Else match start of string to first comma.
)
/x
';
preg_match($regex, $stra, $matches) and print_r($matches);
/*
Array
(
[0] => Naujasis Salemas
[1] =>
[2] => Naujasis Salemas
)
*/
preg_match($regex, $strb, $matches) and print_r($matches);
/*
Array
(
[0] => Jungtinės Valstijos (Centras
[1] => Centras
)
*/
Note that the index in $matches changes slightly above, but you might be able to work around that using named subpatterns.
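Another way to keep the index stable, swapping the conditional for a different PCRE feature, is a branch reset group (?|...). This is a sketch of that alternative and not the approach from the answer above. Both alternatives then write their capture to group 1.

```php
<?php
$stra = 'Naujasis Salemas, Šiaurės Dakota';
$strb = 'Jungtinės Valstijos (Centras, Šiaurės Dakota)';

// (?| ... ) is a branch reset group: each alternative numbers its capture
// groups from the same starting point, so both branches fill group 1.
// First branch: parentheses with a comma are present, capture up to the comma.
// Second branch: otherwise capture from the start of the string up to the comma.
$regex = '/^(?|(?=.*\(.+,.*\)).*\(([^,]+)|([^,]+))/';

preg_match($regex, $stra, $ma);
preg_match($regex, $strb, $mb);

echo $ma[1], "\n";
echo $mb[1], "\n";
```

With this variant the wanted text is always in index 1, so the calling code does not need to check which branch matched.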
I think this one could do it:
[^-(]+(?=,)
This is the same regex as yours, but it doesn't allow a parenthesis in the matched string. It will still match on the first subject, and on the second it will match just after the opening parenthesis.
Try it here: http://ideone.com/Crhzz
You could use
[^(),]+(?=,)
That would match any text except commas or parentheses, followed by a comma.
