php: brackets/contents to an array? - php

If I have a string like this:
$str = '[tr]Kapadokya[/tr][en]Cappadocia[/en][de]Test[/de]';
I want to turn it into this array:
$array = array(
'tr' => 'Kapadokya',
'en' => 'Cappadocia',
'de' => 'Test');
How do I do this?

With a few assumptions about the actual syntax of your BBCode-ish string, the following (PCRE) regular expression might suffice.
<?php
$str = '[tr]Kapadokya[/tr][en]Cappadocia[/en][de]Test[/de]';
$pattern = '!
\[
([^\]]+)
\]
(.+)
\[
/
\\1
\]
!x';
/* alternative, probably better expression (see comments)
$pattern = '!
\[ (?# pattern start with a literal [ )
([^\]]+) (?# is followed by one or more characters other than ] - those characters are grouped as subcapture #1, see below )
\] (?# is followed by one literal ] )
( (?# capture all following characters )
[^[]+ (?# as long as no literal [ is encountered - there must be at least one such character )
)
\[ (?# pattern ends with a literal [ and )
/ (?# literal / )
\1 (?# the same characters as between the opening [...] - that is subcapture #1 )
\] (?# and finally a literal ] )
!x'; // the x modifier allows us to make the pattern easier to read because literal white spaces are ignored
*/
preg_match_all($pattern, $str, $matches);
var_export($matches);
prints
array (
0 =>
array (
0 => '[tr]Kapadokya[/tr]',
1 => '[en]Cappadocia[/en]',
2 => '[de]Test[/de]',
),
1 =>
array (
0 => 'tr',
1 => 'en',
2 => 'de',
),
2 =>
array (
0 => 'Kapadokya',
1 => 'Cappadocia',
2 => 'Test',
),
)
see also: http://docs.php.net/pcre
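To get from $matches to the associative array asked for in the question, the two capture columns can be paired up. This is a minimal sketch, assuming tag names are unique within the string (array_combine keeps only the last value for a repeated key) and using a compact one-line form of the alternative pattern:

```php
<?php
$str = '[tr]Kapadokya[/tr][en]Cappadocia[/en][de]Test[/de]';

// Compact form of the alternative pattern: [tag], content without "[", [/tag].
$pattern = '!\[([^\]]+)\]([^\[]+)\[/\1\]!';

preg_match_all($pattern, $str, $matches);

// $matches[1] holds the tag names, $matches[2] the contents.
$array = array_combine($matches[1], $matches[2]);

var_export($array);
```

Here $array ends up with the keys tr, en and de mapped to their respective contents.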

Related

regexp monetary strings with decimals and thousands separator

https://www.tehplayground.com/KWmxySzbC9VoDvP9
Why is the first string matched?
$list = [
'3928.3939392', // Should not be matched
'4.239,99',
'39',
'3929',
'2993.39',
'393993.999'
];
foreach($list as $str){
preg_match('/^(?<![\d.,])-?\d{1,3}(?:[,. ]?\d{3})*(?:[^.,%]|[.,]\d{1,2})-?(?![\d.,%]|(?: %))$/', $str, $matches);
print_r($matches);
}
output
Array
(
[0] => 3928.3939392
)
Array
(
[0] => 4.239,99
)
Array
(
[0] => 39
)
Array
(
[0] => 3929
)
Array
(
[0] => 2993.39
)
Array
(
)
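The first string matches because the separator in [,. ]? is optional and [^.,%] accepts any character, digits included: the engine can read the string as 3, 928, .393, 939 and a final 2. You seem to want to match the numbers as standalone strings, so you do not need the lookarounds; anchors are enough.
You may use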
^-?(?:\d{1,3}(?:[,. ]\d{3})*|\d*)(?:[.,]\d{1,2})?$
Details
^ - start of string
-? - an optional -
(?: - start of a non-capturing alternation group:
\d{1,3}(?:[,. ]\d{3})* - 1 to 3 digits, followed by 0+ sequences of a separator (a comma, . or space) and then 3 digits
| - or
\d* - 0+ digits
) - end of the group
(?:[.,]\d{1,2})? - an optional sequence of . or , followed by 1 or 2 digits
$ - end of string.
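The suggested pattern can be checked against the list from the question. This is a small sketch; the variable name $matched is only for illustration:

```php
<?php
$pattern = '/^-?(?:\d{1,3}(?:[,. ]\d{3})*|\d*)(?:[.,]\d{1,2})?$/';

$list = [
    '3928.3939392', // should not be matched
    '4.239,99',
    '39',
    '3929',
    '2993.39',
    '393993.999',
];

// Keep only the strings the pattern accepts.
$matched = array_values(array_filter($list, function ($str) use ($pattern) {
    return preg_match($pattern, $str) === 1;
}));

print_r($matched);
```

Only 4.239,99, 39, 3929 and 2993.39 are kept. Because every part of the pattern is optional, it also accepts the empty string and a lone -, so an empty input may need a separate check if that matters.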

preg_match_all split conditional expression

I have data in this format:
Randomtext1(random2, random4) Randomtext2 (ran dom) Randomtext3 Randomtext4 (random5,random7,random8) Randomtext5 (Randomtext4 (random5,random7,random8), random10) Randomtext11()
with this:
preg_match_all("/\b\w+\b(?:\s*\(.*?\)|)/",$text,$matches);
I obtain:
0 => 'Randomtext1(random2, random4)',
1 => 'Randomtext2 (ran dom)',
2 => 'Randomtext3',
3 => 'Randomtext4 (random5,random7,random8)',
4 => 'Randomtext5 (Randomtext4 (random5,random7,random8)',
5 => 'random10',
6 => 'Randomtext11()',
but I want
0 => 'Randomtext1(random2, random4)',
1 => 'Randomtext2 (ran dom)',
2 => 'Randomtext3',
3 => 'Randomtext4 (random5,random7,random8)'
4 => 'Randomtext5 (Randomtext4 (random5,random7,random8), random10)'
5 => 'Randomtext11()'
Any ideas?
You need a recursive pattern to handle nested parentheses:
if ( preg_match_all('~\w+(?:\s*(\([^()]*+(?:(?1)[^()]*)*+\)))?~', $text, $matches) )
print_r($matches[0]);
details:
~ # delimiter
\w+
(?:
\s*
( # capture group 1
\(
[^()]*+ # all that isn't a round bracket
# (possessive quantifier *+ to prevent too many backtracking
# steps in case of badly formatted string)
(?:
(?1) # recursion in the capture group 1
[^()]*
)*+
\)
) # close the capture group 1
)? # to make the group optional (instead of "|)")
~
Note that you don't need to add word-boundaries around \w+
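For reference, the recursive pattern can be run against the sample data from the question as a self-contained script. The string is only split over several lines for readability:

```php
<?php
$text = 'Randomtext1(random2, random4) Randomtext2 (ran dom) Randomtext3 '
      . 'Randomtext4 (random5,random7,random8) '
      . 'Randomtext5 (Randomtext4 (random5,random7,random8), random10) Randomtext11()';

// \w+ optionally followed by a balanced (...) group; (?1) recurses into group 1.
$pattern = '~\w+(?:\s*(\([^()]*+(?:(?1)[^()]*)*+\)))?~';

preg_match_all($pattern, $text, $matches);

print_r($matches[0]);
```

This yields six entries, with the nested Randomtext5 (...) item kept in one piece.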

Capture all occurrences of repeated formatted substrings

I've a string that follows this pattern [:it]Stringa in italiano[:en]String in english.
I'm trying to use preg_match_all() to capture the locales and the associated strings, ie:
[1] => 'it',
[2] => 'en',
...
[1] => 'Stringa in italiano',
[2] => 'String in english'
The regex that I'm using, "/\[:(\w+)](.+?)(?=\[:\w+])/" (https://regex101.com/r/eZ1gT7/400), returns only the first group of data. What am I doing wrong?
The final formatted segment will not satisfy your lookahead, because no further [:xx] tag follows it. You need to let the lookahead also match the position at the end of the string by adding an alternation. A pipe (|) means "or". A dollar symbol ($) means "end of string".
I am using negated character classes to match between the literal square brackets. If your \w is sufficient for your project, feel free to keep that portion as you originally posted.
Code:
$string = '[:it]Stringa in italiano[:en]String in english';
preg_match_all('~\[:([^]]+)](.+?)(?=$|\[:[^]]+])~', $string, $m);
var_export($m);
Output:
array (
0 =>
array (
0 => '[:it]Stringa in italiano',
1 => '[:en]String in english',
),
1 =>
array (
0 => 'it',
1 => 'en',
),
2 =>
array (
0 => 'Stringa in italiano',
1 => 'String in english',
),
)
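The effect of the extra $ alternative can be seen by counting matches for both lookaheads. This is a sketch; the variable names are illustrative, and array_combine is one possible way to pair each locale with its string:

```php
<?php
$string = '[:it]Stringa in italiano[:en]String in english';

// Original lookahead: requires another "[:xx]" after every segment.
$original = '/\[:(\w+)](.+?)(?=\[:\w+])/';
// Fixed lookahead: also accepts the end of the string.
$fixed = '~\[:([^]]+)](.+?)(?=$|\[:[^]]+])~';

$countOriginal = preg_match_all($original, $string, $m1);
$countFixed = preg_match_all($fixed, $string, $m2);

// Pair each locale with its string.
$byLocale = array_combine($m2[1], $m2[2]);

var_export([$countOriginal, $countFixed, $byLocale]);
```

The original pattern finds one segment and the fixed one finds two.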

Extracting some content from given string

This is the content piece:
This is content that is a sample.
[md] Special Content Piece [/md]
This is some more content.
What I want is a preg_match_all expression such that it can fetch and give me the following from the above content:
[md] Special Content Piece [/md]
I have tried this:
$pattern ="/\[^[a-zA-Z][0-9\-\_\](.*?)\[\/^[a-zA-Z][0-9\-\_]\]/";
preg_match_all($pattern, $content, $matches);
But it gives a blank array. Could someone help?
$pattern = "/\[md\](.*?)\[\/md\]/";
or, more generally:
$pattern = "/\[[a-zA-Z0-9\-\_]+\](.*?)\[\/[a-zA-Z0-9\-\_]+\]/";
or even better
$pattern = "/\[\w+\](.*?)\[\/\w+\]/";
and to match the start tag with the end tag:
$pattern = '/\[(\w+)\](.*?)\[\/\1\]/'; // single quotes: in a double-quoted PHP string, \1 would be read as an octal escape
(Just note that the "tag" name is then returned in the match array.)
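A short, self-contained run of the backreference variant on the sample content. The pattern is written in single quotes here because PHP would interpret \1 inside double quotes as an octal character escape before the regex engine sees it. The sketch assumes a tag never spans several lines, since . does not match a newline without the s modifier:

```php
<?php
$content = "This is content that is a sample.\n"
         . "[md] Special Content Piece [/md]\n"
         . "This is some more content.";

// Group 1 is the tag name, group 2 the text between the tags.
$pattern = '/\[(\w+)\](.*?)\[\/\1\]/';

preg_match_all($pattern, $content, $matches);

print_r($matches);
```

$matches[0] holds the full tagged piece, $matches[1] the tag name and $matches[2] the inner text.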
You can use this:
$pattern = '~\[([^]]++)]\K[^[]++(?=\[/\1])~';
explanation:
~ #delimiter of the pattern
\[ #literal opening square bracket (must be escaped)
( #open the capture group 1
[^]]++ #all characters that are not ] one or more times
) #close the capture group 1
] #literal closing square bracket (no need to escape)
\K #reset all the match before
[^[]++ #all characters that are not [ one or more times
(?= #open a lookahead assertion (this doesn't consume characters)
\[/ #literal opening square bracket and slash
\1 #back reference to the group 1
] #literal closing square bracket
) #close the lookahead
~
Advantages of this pattern:
The result is the whole match, because \K resets everything matched before it, and because the lookahead assertion after the content you are looking for doesn't consume characters and so is not part of the match.
The character classes are negated, so they are shorter to write and permissive (you don't have to care which characters appear inside).
The pattern checks that the opening and closing tags are the same, using a capture group and a backreference.
Limits:
This expression doesn't deal with nested structures (which you didn't ask for). If you need that, please edit your question.
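As a quick illustration of what the \K variant returns, here it is applied to the sample content from the question (a sketch for non-nested tags only):

```php
<?php
$content = "This is content that is a sample.\n"
         . "[md] Special Content Piece [/md]\n"
         . "This is some more content.";

// \K drops the opening tag from the match; the lookahead keeps the closing tag out of it.
$pattern = '~\[([^]]++)]\K[^[]++(?=\[/\1])~';

preg_match_all($pattern, $content, $matches);

var_export($matches[0]);
```

The whole match is just the text between the tags, while the tag name is still available in $matches[1].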
For nested structures you can use:
(?=(\[([^]]++)](?<content>(?>[^][]++|(?1))*)\[/\2]))
If attributes are allowed in your bbcode:
(?=(\[([^]\s]++)[^]]*+](?<content>(?>[^][]++|(?1))*)\[/\2]))
If self-closing bbcode tags are allowed:
(?=((?:\[([^][]++)](?<content>(?>[^][]++|(?1))*)\[/\2])|\[[^/][^]]*+]))
Notes:
A lookahead means, in other words, "followed by".
I use possessive quantifiers (++) instead of simple greedy quantifiers (+) to tell the regex engine that it doesn't need to backtrack (a performance gain), and atomic groups (i.e. (?>..)) for the same reason.
In the patterns for nested structures, slashes are not escaped; to use them you must choose a delimiter that is not a slash (~, #, `).
The patterns for nested structures use recursion (i.e. (?1)); the PCRE documentation on recursive patterns has more information about this feature.
Update:
If you're likely to be working with nested "tags", I'd probably go for something like this:
$pattern = '/(\[\s*([^\]]++)\s*\])(?=(.*?)(\[\s*\/\s*\2\s*\]))/';
Which, as you can probably tell, is not unlike what CasimiretHippolyte suggested (only his regex, AFAICT, won't capture outer tags in a scenario like the following):
This is content that is a sample.
[md] Special Content [foo]Piece[/foo] [/md]
This is some more content.
Whereas, with this expression, $matches looks like:
array (
0 =>
array (
0 => '[md]',
1 => '[foo]',
),
1 =>
array (
0 => '[md]',
1 => '[foo]',
),
2 =>
array (
0 => 'md',
1 => 'foo',
),
3 =>
array (
0 => ' Special Content [foo]Piece[/foo] ',
1 => 'Piece',
),
4 =>
array (
0 => '[/md]',
1 => '[/foo]',
),
)
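Since the lookahead leaves the content unconsumed, inner tags are found as well. A small sketch of how the result could be turned into a tag => content map, assuming each tag name occurs only once ($contentByTag is an illustrative name):

```php
<?php
$content = "This is content that is a sample.\n"
         . "[md] Special Content [foo]Piece[/foo] [/md]\n"
         . "This is some more content.";

$pattern = '/(\[\s*([^\]]++)\s*\])(?=(.*?)(\[\s*\/\s*\2\s*\]))/';

preg_match_all($pattern, $content, $matches);

// Group 2 is the tag name, group 3 the text up to the matching closing tag.
$contentByTag = array_combine($matches[2], $matches[3]);

var_export($contentByTag);
```

The outer md entry still contains the raw [foo]...[/foo] markup, so nested content would need a second pass.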
A rather simple pattern to match all substrings looking like this [foo]sometext[/foo]
$pattern = '/(\[[^\/\]]+\])([^\]]+)(\[\s*\/\s*[^\]]+\])/';
if (preg_match_all($pattern, $content, $matches))
{
echo '<pre>';
print_r($matches);
echo '</pre>';
}
Output:
array (
0 =>
array (
0 => '[md] Special Content Piece [/md]',
),
1 =>
array (
0 => '[md]',
),
2 =>
array (
0 => ' Special Content Piece ',
),
3 =>
array (
0 => '[/md]',
),
)
How this pattern works: it's divided into three groups.
The first: (\[[^\/\]]+\]) matches an opening [ and a closing ], with everything in between that is neither a closing bracket nor a forward slash.
The second: ([^\]]+) matches every char after the first group that is not ]
The third: (\[\s*\/\s*[^\]]+\]) matches an opening [, followed by zero or more spaces, a forward slash, again followed by zero or more spaces, then one or more chars that aren't ], and a closing ]
If you want to match a specific end-tag, but keeping the same three groups (with a fourth), use this (slightly more complex) expression:
$pattern = '/(\[\s*([^\]]+?)\s*\])(.+?)(\[\s*\/\s*\2\s*\])/';
This'll return:
array (
0 =>
array (
0 => '[md] Special Content Piece [/md]',
),
1 =>
array (
0 => '[md]',
),
2 =>
array (
0 => 'md',
),
3 =>
array (
0 => ' Special Content Piece ',
),
4 =>
array (
0 => '[/md]',
),
)
Note that group 2 (the one we used in the expression as \2) is the "tagname" itself.

Regexp tip request

I have a string like
"first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth"
I want to explode it into an array:
Array (
0 => "first",
1 => "second[,b]",
2 => "third[a,b[1,2,3]]",
3 => "fourth[a[1,2]]",
4 => "sixth"
)
I tried to remove brackets:
preg_replace("/\[ ( (?>[^\[\]]+) | (?R) )* \]/xis",
"",
"first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth"
);
But I got stuck on the next step.
PHP's regex flavor supports recursive patterns, so something like this would work:
$text = "first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth";
preg_match_all('/[^,\[\]]+(\[([^\[\]]|(?1))*])?/', $text, $matches);
print_r($matches[0]);
which will print:
Array
(
[0] => first
[1] => second[,b]
[2] => third[a,b[1,2,3]]
[3] => fourth[a[1,2]]
[4] => sixth
)
The key here is not to split, but match.
Whether you want to add such a cryptic regex to your code base, is up to you :)
EDIT
I just realized that my suggestion above will not match entries starting with [. To do that, do it like this:
$text = "first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth,[s,[,e,[,v,],e,],n]";
preg_match_all("/
( # start match group 1
[^,\[\]] # any char other than a comma or square bracket
| # OR
\[ # an opening square bracket
( # start match group 2
[^\[\]] # any char other than a square bracket
| # OR
(?R) # recursively match the entire pattern
)* # end match group 2, and repeat it zero or more times
] # a closing square bracket
)+ # end match group 1, and repeat it once or more times
/x",
$text,
$matches
);
print_r($matches[0]);
which prints:
Array
(
[0] => first
[1] => second[,b]
[2] => third[a,b[1,2,3]]
[3] => fourth[a[1,2]]
[4] => sixth
[5] => [s,[,e,[,v,],e,],n]
)
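If the pattern ends up in a code base, wrapping it in a function keeps the cryptic part in one place. This is a sketch under some assumptions: splitTopLevel is a hypothetical name, and the pattern is a compact form of the commented one above, with non-capturing groups swapped in for the capture groups:

```php
<?php
// Hypothetical helper: returns the comma-separated entries of $text,
// treating commas inside (possibly nested) square brackets as part of an entry.
function splitTopLevel($text)
{
    // Either a char that is not a comma or bracket, or a balanced [...] block;
    // (?R) recurses into the whole pattern to handle nesting.
    $pattern = '/(?:[^,\[\]]|\[(?:[^\[\]]|(?R))*])+/';
    preg_match_all($pattern, $text, $matches);

    return $matches[0];
}

$parts = splitTopLevel('first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth,[s,[,e,[,v,],e,],n]');

print_r($parts);
```

Unlike explode, an empty entry between two consecutive top-level commas produces no element, because the pattern needs at least one character to match.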
