PHP / Regex : match json inside json - php

Just a quick regex question...hopefully
I have a string that looks something like this:
$string = 'some text [ something {"index":"{"index2":"value2"}"}] [something2 {"here to be":"more specific"}]';
I want to be able to get the value:
{"index":"{"index2":"value2"}"}
But all my attempts at matching (or replacing) keep giving me:
{"index":"{"index2":"value2"}
preg_replace('/\[(.*?)({.*?[^}]})*?\]/is', "", $string);
Here I'm matching the whole square bracket area, but hopefully you can see what I'm trying to do.
The negation of the "do not match }" doesn't seem to be doing anything. Maybe I just need an OR in there or something.
Well, thanks if you have time to answer.
The $string could contain multiple instances of the {} so a greedy regex won't work....that I know of.

A plain regular expression can't count the opening braces and pair them with the corresponding closing braces (PCRE's recursive patterns are the exception), so a simple for loop is the more straightforward tool for that. You can, however, get the complete string from the first opening brace to the last closing one with a greedy expression like: ({.*}). Note that simple string functions are usually faster than regular expressions, so prefer them where they are sufficient.
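The loop-based approach could look roughly like the sketch below. The function name extractBraceGroups is hypothetical, and the sketch assumes that braces inside the text are always meant to nest (it does not try to understand JSON quoting).

```php
<?php
// Hypothetical helper: returns every top-level {...} group found in $text.
// Assumption: every "{" and "}" counts toward nesting, quoted or not.
function extractBraceGroups(string $text): array
{
    $groups = [];
    $depth = 0;   // how many braces are currently open
    $start = 0;   // offset of the brace that opened the current group
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        if ($text[$i] === '{') {
            if ($depth === 0) {
                $start = $i;
            }
            $depth++;
        } elseif ($text[$i] === '}' && $depth > 0) {
            $depth--;
            if ($depth === 0) {
                // The outermost brace just closed: keep the whole group.
                $groups[] = substr($text, $start, $i - $start + 1);
            }
        }
    }
    return $groups;
}

$string = 'some text [ something {"index":"{"index2":"value2"}"}] [something2 {"here to be":"more specific"}]';
$groups = extractBraceGroups($string);
print_r($groups);
```

Because the depth counter only emits a group when it returns to zero, multiple {} groups in the same string are returned separately rather than merged, which is the case a greedy regex gets wrong.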

Related

Extract text between brackets tags in php using Regex

I have the following content in a string (query from the DB), example:
$fulltext = "Thank you so much, {gallery}art-by-stephen{/gallery}. As you know I fell in love with it from the moment I saw it and I couldn’t wait to have it in my home!";
So I only want to extract what it is between the {gallery} tags, I'm doing the following but it does not work:
$regexPatternGallery= '{gallery}([^"]*){/gallery}';
preg_match($regexPatternGallery, $fulltext, $matchesGallery);
if (!empty($matchesGallery[1])) {
echo ('<p>matchesGallery: '.$matchesGallery[1].'</p>');
}
Any suggestions?
Try this:
$regexPatternGallery= '/\{gallery\}(.*)\{\/gallery\}/';
You need to escape / and { with a \ before it. And you were missing the start and end / of the pattern.
http://www.phpliveregex.com/p/fn1
Similar to Andreas's answer, but differs in using ([^"]*?)
$regexPatternGallery= '/\{gallery\}([^"]*?)\{\/gallery\}/';
Don't forget to put a delimiter such as / at the beginning and the end of the regex string. PHP's preg functions require one, unlike the regex APIs of many other programming languages.
{, } and / are characters that can be read as regex syntax (or as the delimiter), so escape them with \, as in \{.
Use ? to make the quantifier non-greedy. This avoids over-matching when facing this kind of string: "blabla {gallery}you should only get this{/gallery} but you also got this instead.{/gallery} Rarely happens but be careful anyway".
Try this RegEx:
\{gallery\}(.*?)\{\/gallery\}
The problem with your RegEx was that you did not escape the / in the closing {/gallery} tag. You also need to escape { and }.
You should use .*? for a lazy match, otherwise if there are 2 tags in one string, it will combine them. I.e. {gallery}by-joe{/gallery} and {gallery}by-tim{/gallery} would end up as:
by-joe{/gallery} and {gallery}by-tim
However, using a lazy match, you would get 2 results:
by-joe
by-tim
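A minimal sketch of the lazy match in PHP, using the two-tag example from this answer as the input string (the variable names are only illustrative):

```php
<?php
// Input taken from the two-tag example above.
$fulltext = '{gallery}by-joe{/gallery} and {gallery}by-tim{/gallery}';

// .*? is lazy, so each match stops at the nearest closing tag.
preg_match_all('/\{gallery\}(.*?)\{\/gallery\}/', $fulltext, $matches);

// $matches[1] holds the text captured between each pair of tags.
print_r($matches[1]);
```

preg_match_all is used here instead of preg_match so that every tag pair in the string is collected, not only the first.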
Live Demo on Regex101

PHP Regex: match text urls until space or end of string

This is the text sample:
$text = "asd dasjfd fdsfsd http://11111.com/asdasd/?s=423%423%2F gfsdf http://22222.com/asdasd/?s=423%423%2F
asdfggasd http://3333333.com/asdasd/?s=423%423%2F";
This is my regex pattern:
preg_match_all( "#http:\/\/(.*?)[\s|\n]#is", $text, $m );
That matches the first two urls, but how do I match the last one? I tried adding [\s|\n|$] but that will also only match the first two urls.
There is no need to match \n separately (\s already covers line breaks); instead use $ (which will match at the end of the string).
Edit:
I'd love to hear why my initial idea doesn't work, so in case you know it, let me know. I'd guess because [] tries to match one character, while end of line isn't one? :)
This one will work:
preg_match_all('#http://(\S+)#is', $text, $m);
Note that you don't have to escape the / due to them not being the delimiting character, but you'd have to escape the \ as you're using double quotes (so the string is parsed). Instead I used single quotes for this.
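As a quick check, here is the pattern applied to the sample text from the question; this is only a sketch, and the sample is written with an explicit \n where the original string wraps onto a second line:

```php
<?php
// Sample text from the question, with the line break written as \n.
$text = "asd dasjfd fdsfsd http://11111.com/asdasd/?s=423%423%2F gfsdf http://22222.com/asdasd/?s=423%423%2F\nasdfggasd http://3333333.com/asdasd/?s=423%423%2F";

// \S+ consumes non-whitespace characters, so a URL ends at a space,
// a line break, or the end of the string without any special casing.
preg_match_all('#http://(\S+)#is', $text, $m);

print_r($m[0]); // full URLs
print_r($m[1]); // URLs without the http:// prefix
```

Matching what a URL may contain (\S+) rather than what must follow it sidesteps the end-of-string problem entirely.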
I'm not familiar with PHP, so I don't have the exact syntax, but maybe this will give you something to try. [] denotes a character class, so |$ inside it will literally look for a | or a $. I think what you need is a lookahead, so something like this:
#http://(.*?)(?=\s|$)#
I apologize if this is way off, but maybe it will give you another angle to try.
See What is the best regular expression to check if a string is a valid URL?
It has some very long regular expressions that will match all urls.

Regex to allow all characters except repeats of a particular given character

I've been fumbling with this for a bit and thought I'd put it up to the regex experts:
I want to match strings like this:
abc[abcde]fff
abcffasd
so I want to allow single brackets (e.g. [ or ]). However, I don't want to allow double brackets in sequence (e.g. [[ or ]]).
This means this string shouldn't pass the regex:
abc[abcde]fff[[gg]]
My best guess so far is based on an example I found, something like:
(?>[a-zA-Z\[\]']+)(?!\[\[)
However, this doesn't work (it matches even when double brackets are present), presumably because the brackets are contained in the first part as well.
You want something like:
^(?:\[?[^\[]|\[$)*$
At each character, the pattern accepts an opening bracket followed by another character, or the end of the string.
Or a little more neatly, using a negative lookahead:
^(?:(?!\[\[).)*$
Here, the pattern will only match characters as long as it doesn't see two [[ ahead.
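The lookahead above only rejects [[. Assuming that ]] should be rejected as well, as the question describes, the same idea can be extended with an alternation inside the lookahead. The sample strings below are the question's, plus one extra ]] case added for illustration:

```php
<?php
// Assumption: both "[[" and "]]" must be rejected.
$pattern = '/^(?:(?!\[\[|\]\]).)*$/';

$samples = ['abc[abcde]fff', 'abcffasd', 'abc[abcde]fff[[gg]]', 'abc]]def'];
$results = [];
foreach ($samples as $sample) {
    // preg_match returns 1 when the whole string passes, 0 otherwise.
    $results[$sample] = preg_match($pattern, $sample);
    echo $sample, ' => ', $results[$sample], "\n";
}
```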
Not to be deterred!
^(?:(?:[a-z]+)|(?:\](?!\]))|(?:\[(?!\[)))+$
I removed the "only two or more" rule and the redundant single-character character classes. This seems to pass all the test cases I can think of: any string of characters containing only single [ or ].
Let me know if it works for you!
I'm not sure I can answer this, but I'll post what I have as I'm going through it.
First, I have this, which seems to match without the brackets. This is any letter not followed by 2 or more of itself.
^(?:([a-z])(?!\1{2,}))+$
We can add the brackets into the character class and it will start matching brackets; but, obviously it will also allow them to follow the same rules as the letters (two together is valid). How do we separate the bracket behavior from the letter behavior?
^(?:([a-z\[\]])(?!\1{2,}))+$
This feels dirty, but seems to work. Looking at the other answer, I like that a lot better. Now to figure out why I didn't think of it.
^(?:(?:([a-z])(?!\1{2,}))|(?:[\]](?![\]]))|(?:[\[](?![\[])))+$
Also, for some reason I thought it was 1-2 of each character but only one of [ and ] so this is all worthless anyway :).
You can try this negative lookahead:
$arr = array('abc[abcde]fff', 'abcffasd', 'abc[abcde]fff[[gg]]');
foreach ($arr as $str) {
echo $str,' => ';
$ret = preg_match('/^(?!.*?(\[\[)).+$/', $str, $m);
echo "$ret\n";
}
OUTPUT
abc[abcde]fff => 1
abcffasd => 1
abc[abcde]fff[[gg]] => 0
This regex should allow all letters and brackets except two consecutive brackets (i.e. [], [[ or ]])
([a-zA-Z\[\]][a-zA-Z])+
EDIT: Sorry, this won't work for strings with odd length

Parse for square brackets with regular expressions

I've always had a difficult time with regular expressions. I've searched for help with this, but I can't quite find what I'm looking for.
I have blocks of text that follow this pattern:
[php]
... any type of code sample here
[/php]
I need to:
check for the square brackets, which can contain any one of 20-30 programming language names (php, ruby, etc.).
grab all code in between the opening and closing tags.
I have worked out the following regular expression:
#\[([a-z]+)\]([^\[/]*)\[/([a-z]+)\]#i
Which matches everything pretty well. However, it breaks when the code sample contains square brackets. How do I modify it so that any character between those opening/closing braces will be matched for later use?
This is the regex you want. It also requires the closing tag to match the opening one, so a php tag will only be ended by a php tag.
/\[(\w+)\](.*?)\[\/\1\]/s
Or if you wanted to explicitly match the tags you could use...
$langs = array('php', 'python', ...);
$langs = implode('|', array_map('preg_quote', $langs));
preg_match_all('/\[(' . $langs . ')\](.*?)\[\/\1\]/s', $str, $matches);
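A runnable sketch of the explicit-list variant follows. The sample text and the three-language list are assumptions made for illustration; the code samples contain square brackets on purpose to show that the lazy .*? with the s modifier handles them.

```php
<?php
// Hypothetical sample text; note the square brackets inside the code bodies.
$str = "[php]\n\$a = \$b[0];\n[/php] some text [ruby]\nputs x[1]\n[/ruby]";

// Assumed short list standing in for the 20-30 names in the question.
$langs = implode('|', array_map('preg_quote', ['php', 'ruby', 'python']));

// \1 forces the closing tag to repeat whatever language group 1 captured;
// the s modifier lets . match line breaks inside the code sample.
preg_match_all('/\[(' . $langs . ')\](.*?)\[\/\1\]/s', $str, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    echo $match[1], ': ', trim($match[2]), "\n";
}
```

PREG_SET_ORDER groups the results per match, so each $match carries its own language name and code body together.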
The following will work:
\[([a-z]+)\].*\[/\1\]
If you want to remove the greediness, you can do:
\[([a-z]+)\].*?\[/\1\]
All you have to do is to check that both the closing and opening tags have the same text (in this case, that both are the same programming language), and you do that with \1, telling it to match the previously matched Group number 1: ([a-z]+)
Why don't you use something like below:
\[php\].*?\[/php\]
I don't understand why you want to use [a-z]+ for the tags, there should be php or a limited amount of other tags. Just keep it simple.
Actually you can use:
\[(php)\].*?\[/(\1)\]
so that you can match the opening and closing tags. Otherwise you will be matching random opening and closing. Add others like, I don't know, js etc as php|js etc.
Use a backreference to refer to a match already made in the regular expression:
\[(\w+)\].*?\[/\1\]

regex to remove all whitespaces except between brackets

I've been wrestling with an issue I was hoping to solve with regex.
Let's say I have a string that can contain any alphanumeric with the possibility of a substring within being surrounded by square brackets. These substrings could appear anywhere in the string like this. There can also be any number of bracket-ed substrings.
Examples:
aaa[bb b]
aaa[bbb]ccc[d dd]
[aaa]bbb[c cc]
You can see that there are whitespaces in some of the bracketed substrings, that's fine. My main issue right now is when I encounter spaces outside of the brackets like this:
a aa[bb b]
Now I want to preserve the spaces inside the brackets but remove them everywhere else.
This gets a little more tricky for strings like:
a aa[bb b]c cc[d dd]e ee[f ff]
Here I would want the return to be:
aaa[bb b]ccc[d dd]eee[f ff]
I spent some time now reading through different reg ex pages regarding lookarounds, negative assertions, etc. and it's making my head spin.
NOTE: for anyone visiting this, I was not looking for any solution involving nested brackets. If that was the case I'd probably do it programmatically like some of the comments mentioned below.
This regex should do the trick:
[ ](?=[^\]]*?(?:\[|$))
Just replace the space that was matched with "".
Basically, all it's doing is making sure that the space you are going to remove is followed by a "[" (or by the end of the string) before any "]" appears.
That should work as long as you don't have nested square brackets, e.g.:
a a[b [c c]b]
Because in that case, the space after the first "b" will be removed and it will become:
aa[b[c c]b]
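In PHP, the replacement could be sketched like this, using the multi-bracket example from the question as input (no nested brackets assumed):

```php
<?php
// Example input from the question.
$string = 'a aa[bb b]c cc[d dd]e ee[f ff]';

// Remove a space only if a "[" or the end of the string
// comes before the next "]".
$result = preg_replace('/[ ](?=[^\]]*?(?:\[|$))/', '', $string);

echo $result, "\n";
```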
This doesn't sound like something you really want regex for. It's very easy to parse directly by reading through. In PHP, assuming the input is in $str:
$inside_brackets = false;
$result = '';
for ($i = 0; $i < strlen($str); $i++) {
    if ($str[$i] === '[') {
        $inside_brackets = true;
    } elseif ($str[$i] === ']') {
        $inside_brackets = false;
    }
    // Keep every character except whitespace found outside brackets.
    if ($inside_brackets || !ctype_space($str[$i])) {
        $result .= $str[$i];
    }
}
Anything involving regex is going to involve a lot of lookbehind stuff, which will be repeated over and over, and it'll be much slower and less comprehensible.
To make this work for nested brackets, simply change inside_brackets to a counter, starting at zero, incrementing on open brackets, and decrementing on close brackets.
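A sketch of the counter variant described here. The function name stripOutsideBrackets and the sample input are made up for illustration:

```php
<?php
// Hypothetical helper: strips whitespace outside (possibly nested) brackets.
function stripOutsideBrackets(string $str): string
{
    $depth = 0;    // number of currently open brackets
    $result = '';
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        $char = $str[$i];
        if ($char === '[') {
            $depth++;
        } elseif ($char === ']' && $depth > 0) {
            $depth--;
        }
        // Whitespace is dropped only while no bracket is open.
        if ($depth > 0 || !ctype_space($char)) {
            $result .= $char;
        }
    }
    return $result;
}

$nested = stripOutsideBrackets('a a[b [c c]b] d d');
echo $nested, "\n";
```

The guard on $depth > 0 keeps a stray closing bracket from driving the counter negative; how unbalanced input should really be treated is a separate decision.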
This works for me:
(\[.+?\])|\s
Then you simply pass in a replacement value of $1 when you call the replace function. The idea is to look for the patterns inside the brackets first and make sure they're untouched. And then every space outside the brackets gets replaced with nothing.
Note that I tested this with Regex Hero (a .NET regex tester), and not in PHP. So I'm not 100% sure this will work for you.
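For what it's worth, the same idea can be tried in PHP with preg_replace. This is a sketch that relies on PHP substituting an empty string for $1 when the capture group did not take part in the match:

```php
<?php
// Example input from the question.
$string = 'a aa[bb b]c cc[d dd]e ee[f ff]';

// Bracketed chunks are captured and written back unchanged via $1;
// any other whitespace matches the right-hand branch, where $1 is empty.
$result = preg_replace('/(\[.+?\])|\s/', '$1', $string);

echo $result, "\n";
```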
That was an interesting one. Sounded simple at first, then seemed rather difficult. And then the solution I finally arrived at was indeed simple. I was surprised the solution didn't require a lookaround of any sort. And it should be faster than any method that uses a lookaround.
How to do this depends on what should be done with:
a b [ c [ d [ e ] f ] g
That is ambiguous; possible answers are at least:
ab[ c [ d [ e ] f ]g
ab[ c [ d [ e ]f]g
error out; the brackets don't match!
For the first two cases, you can use regexps. For the third case, you'd be much better off with a (small) parser.
For either case one or two, split the string on the first [. Strip spaces from everything before [ (that's obviously outside of the brackets). Next, look for .*\] (case 1) or .*?\] (case 2) and move that over to your output. Repeat until you're out of input.
Resurrecting this question because it had a simple solution that wasn't mentioned.
\[[^]]*\](*SKIP)(*F)|\s+
The left side of the alternation matches complete sets of brackets then deliberately fails. The right side matches the spaces, and we know they are the right spaces because if they were within brackets they would have been skipped by the expression on the left.
See the matches in this demo
This means you can just do
$replace = preg_replace("~\[[^]]*\](*SKIP)(*F)|\s+~","",$string);
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
The following will match start-of-line or end-of-bracket (which must come before any space you want to match) followed by anything that isn't start-of-bracket or a space, followed by some space.
/((^|\])[^ \[]*) +/
Replacing each match with $1 will remove the first block of spaces from each non-bracketed sequence. You will have to repeat the replacement until nothing changes to remove all spaces.
Example:
abcd efg [hij klm]nop qrst u
abcdefg [hij klm]nopqrst u
abcdefg[hij klm]nopqrstu
done
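The repeated replacement can be sketched in PHP with a loop that stops once a pass makes no substitutions. The $steps array is only there to record the intermediate strings for illustration:

```php
<?php
// Example input from this answer.
$str = 'abcd efg [hij klm]nop qrst u';
$steps = [];

do {
    // $count receives the number of replacements made in this pass.
    $str = preg_replace('/((^|\])[^ \[]*) +/', '$1', $str, -1, $count);
    if ($count > 0) {
        $steps[] = $str;
    }
} while ($count > 0);

print_r($steps);
```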
