regular expression end tag = start tag - php

Take a look at this regular expression:
(?:\(?")(.+)(?:"\)?)
This regex would match e.g
"a"
("a")
but also
"a)
How can I require that the opening delimiter (here " or (") is closed by its matching counterpart (" or ") respectively)? There must be a simpler solution than this, right?
"(.+)"|(?:\(")(.+)(?:"\))

I don't think there's a good way to do this specifically with regex, so you are stuck doing something like this:
/(?:
"(.+)"
|
\(" (.+) "\)
)/x

how about:
(\(?)(")(.+)\2\1
explanation:
(?-imsx:(\(?)(")(.+)\2\1)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
\(? '(' (optional (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
.+ any character except \n (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
\2 what was matched by capture \2
----------------------------------------------------------------------
\1 what was matched by capture \1
----------------------------------------------------------------------
) end of grouping
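One caveat with the backreference pattern above: \1 re-matches the literal text that group 1 captured, so after an opening "(" it expects another "(" at the end, not ")". A minimal sketch of an alternative, assuming PCRE conditional subpatterns (which PHP's preg_* functions support); the helper name isBalancedQuoted is hypothetical:

```php
<?php
// Hypothetical helper: true if $s is "..." or ("..."), with the closing
// parenthesis required exactly when an opening one was present.
function isBalancedQuoted(string $s): bool
{
    // (\()?     optionally capture an opening parenthesis as group 1
    // "(.+)"    the quoted payload (group 2)
    // (?(1)\))  conditional: only if group 1 matched, require ")"
    return preg_match('/^(\()?"(.+)"(?(1)\))$/', $s) === 1;
}

var_dump(isBalancedQuoted('"a"'));   // bool(true)
var_dump(isBalancedQuoted('("a")')); // bool(true)
var_dump(isBalancedQuoted('"a)'));   // bool(false)
var_dump(isBalancedQuoted('("a"(')); // bool(false)
```

The ^ and $ anchors are an addition for whole-string validation; drop them to search inside longer text.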

You can use backreferences. They are not special to PHP: PCRE, the engine behind the preg_* functions, and most other regex flavours support them:
preg_match('/<([^>]+)>(.+)<\/\1>/', $subject, $m) (the \1 refers to whatever the first group captured; use single quotes, because in a double-quoted PHP string "\1" is an octal escape)
This uses the first capture as the condition for the closing tag. It matches <a>something</a> but not <h2>something</a>.
However, in your case you would need to turn the "(" captured by the first group into a ")", which a backreference cannot do.
Update: a workaround is to replace ( and ) with <BRACE> and <END_BRACE> first. Then you can match using /<([^>]+)>(.+)<END_\1>/. Do this for every bracket pair you need: (), [], {}, <> and so on.
"(a) is as nice as [f]" becomes "<BRACE>a<END_BRACE> is as nice as <BRACKET>f<END_BRACKET>", and the regex captures both pairs if you use preg_match_all:
$returnValue = preg_match_all('/<([^>]+)>(.+)<END_\\1>/', '<BRACE>a<END_BRACE> is as nice as <BRACKET>f<END_BRACKET>', $matches);
leads to
array (
0 =>
array (
0 => '<BRACE>a<END_BRACE>',
1 => '<BRACKET>f<END_BRACKET>',
),
1 =>
array (
0 => 'BRACE',
1 => 'BRACKET',
),
2 =>
array (
0 => 'a',
1 => 'f',
),
)
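The substitution step itself is not shown above. A minimal sketch, assuming the input contains no literal < or > and only the two bracket pairs listed; the helper name tagBrackets is hypothetical:

```php
<?php
// Hypothetical helper: rewrite bracket characters as paired tags so that a
// backreference can tie each closing tag to its opening tag.
function tagBrackets(string $text): string
{
    return str_replace(
        ['(', ')', '[', ']'],
        ['<BRACE>', '<END_BRACE>', '<BRACKET>', '<END_BRACKET>'],
        $text
    );
}

$tagged = tagBrackets('(a) is as nice as [f]');
preg_match_all('/<([^>]+)>(.+)<END_\1>/', $tagged, $matches);

print_r($matches[1]); // tag names: BRACE, BRACKET
print_r($matches[2]); // contents: a, f
```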

Related

PHP - Regex match curly brackets within other regex expression

I am trying to figure out how to match other parts of the stuff I need but can't seem to get it to work.
This is what I have so far:
preg_match_all("/^(.*?)(?:.\(([\d]+?)[\/I^\(]*?\))(?:.\((.*?)\))?/m",$data,$r, PREG_SET_ORDER);
Example text:
INPUT - Each line represents a line inside a text file.
-------------------------------------------------------------------------------------
"!?Text" (1234) 1234-4321
"#1 Text" (1234) 1234-????
#2 Text (1234) {Some text (#1.1)} 1234
Text (1234) 1234
Some Other Text: More Text here 1234-4321 (1234) (V) 1234
What I want to do:
I also want to match the text inside curly brackets, and the text inside any parentheses within those curly brackets.
I can't get it to work because the curly brackets, and the parentheses inside them, are not present on every line.
The first (1234) is a year and I only want to match it once. In the last example line, however, my pattern also matches (V), which I don't want.
Desirable output:
Array
(
[0] => "!?Text" (1234)
[1] => "!?Text"
[2] => 1234
)
Array
(
[0] => "#1 Text" (1234)
[1] => "#1 Text"
[2] => 1234
)
Array
(
[0] => "#2 Text" (1234)
[1] => "#2 Text"
[2] => 1234
[3] => Some text (#1.1) // Matches things within curly brackets if there are any.
[4] => Some text // Extracts text before brackets
[5] => #1.1 // Extracts text within brackets (if any because brackets may not be within curly brackets.)
)
Array
(
[0] => Text (1234)
[1] => Text
[2] => 1234
)
Array // (My current regular expression gives me a 4th match with value 'V', which it shouldn't do)
(
[0] => Some Other Text: More Text here 1234-4321 (1234) (V)
[1] => Some Other Text: More Text here 1234-4321
[2] => 1234
)
What about using:
^((.*?) *\((\d+)\))(?: *\{((.*?) *\((.+?)\)) *\})?
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more
times (matching the least amount
possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
* ' ' (0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
\( '('
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
\) ')'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
* ' ' (0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
\{ '{'
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
( group and capture to \5:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more
times (matching the least amount
possible))
--------------------------------------------------------------------------------
) end of \5
--------------------------------------------------------------------------------
* ' ' (0 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
\( '('
--------------------------------------------------------------------------------
( group and capture to \6:
--------------------------------------------------------------------------------
.+? any character except \n (1 or more
times (matching the least amount
possible))
--------------------------------------------------------------------------------
) end of \6
--------------------------------------------------------------------------------
\) ')'
--------------------------------------------------------------------------------
) end of \4
--------------------------------------------------------------------------------
* ' ' (0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
\} '}'
--------------------------------------------------------------------------------
)? end of grouping
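A short sketch of how this pattern could be run over the file contents. The /m modifier and PREG_SET_ORDER are carried over from the question's code; the variable names and the shortened sample data are only for illustration:

```php
<?php
$data = <<<'TXT'
"!?Text" (1234) 1234-4321
#2 Text (1234) {Some text (#1.1)} 1234
Some Other Text: More Text here 1234-4321 (1234) (V) 1234
TXT;

// The /m flag lets ^ match at the start of every line.
$pattern = '/^((.*?) *\((\d+)\))(?: *\{((.*?) *\((.+?)\)) *\})?/m';
preg_match_all($pattern, $data, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    // $row[2] is the title and $row[3] the year; $row[4] to $row[6] are
    // only filled in when a {...} block followed the year on that line.
    echo $row[2], ' | ', $row[3], ' | ', $row[4] ?? '(no braces)', PHP_EOL;
}
```

Because the optional group only accepts a { after the year, the trailing (V) on the last line is never captured.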

Regular expression match, extracting only wanted segments of string

I am trying to extract three segments from a string. As I am not particularly good with regular expressions, I think what I have done could probably be done better.
I would like to extract the three ANYTHING_HERE parts of the following string:
SOMETEXT: ANYTHING_HERE (Old=ANYTHING_HERE,
New=ANYTHING_HERE)
Some examples could be:
ABC: Some_Field (Old=,New=123)
ABC: Some_Field (Old=ABCde,New=1234)
ABC: Some_Field (Old=Hello World,New=Bye Bye World)
So the above would return the following matches:
$matches[0] = 'Some_Field';
$matches[1] = '';
$matches[2] = '123';
So far I have the following code:
preg_match_all('/^([a-z]*\:(\s?)+)(.+)(\s?)+\(old=(.+)\,(\s?)+new=(.+)\)/i',$string,$matches);
The issue with the above is that it returns a match for each separate segment of the string. I do not know how to ensure the string is the correct format using a regular expression without catching and storing the match if that makes sense?
So my question, if not already clear, is: how can I retrieve just the segments that I want from the above string?
You don't need preg_match_all. You can use this preg_match call:
$s = 'SOMETEXT: ANYTHING_HERE (Old=ANYTHING_HERE1, New=ANYTHING_HERE2)';
if (preg_match('/[^:]*:\s*(\w*)\s*\(Old=(\w*),\s*New=(\w*)/i', $s, $arr))
print_r($arr);
OUTPUT:
Array
(
[0] => SOMETEXT: ANYTHING_HERE (Old=ANYTHING_HERE1, New=ANYTHING_HERE2
[1] => ANYTHING_HERE
[2] => ANYTHING_HERE1
[3] => ANYTHING_HERE2
)
if(preg_match_all('/([a-z]*)\:\s*.+\(Old=(.+),\s*New=(.+)\)/i',$string,$matches)) {
print_r($matches);
}
Example:
$string = 'ABC: Some_Field (Old=Hello World,New=Bye Bye World)';
Will match:
Array
(
[0] => Array
(
[0] => ABC: Some_Field (Old=Hello World,New=Bye Bye World)
)
[1] => Array
(
[0] => ABC
)
[2] => Array
(
[0] => Hello World
)
[3] => Array
(
[0] => Bye Bye World
)
)
The problem is that you're using more parentheses than you need, and thus capturing more segments of the input than you want.
For example, each (\s?)+ segment should just be \s*
The regex that you're looking for is:
[^:]+:\s*(.+)\s*\(old=(.*)\s*,\s*new=(.*)\)
In PHP:
preg_match_all('/[^:]+:\s*(.+)\s*\(old=(.*)\s*,\s*new=(.*)\)/i',$string,$matches);
A useful tool can be found here: http://www.myregextester.com/index.php
This tool offers an "Explain" checkbox (as well as a "PHP" checkbox and "i" flag checkbox which you'll want to select) which provides a full explanation of the regex as well. For posterity, I've included the explanation below as well:
NODE EXPLANATION
----------------------------------------------------------------------
(?i-msx: group, but do not capture (case-insensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
[^:]+ any character except: ':' (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
.+ any character except \n (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
\( '('
----------------------------------------------------------------------
old= 'old='
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
, ','
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
new= 'new='
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
\) ')'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
What about something simpler like ^_^
[:=]\s*([\w\s]*)
:\s*([^(\s]+)\s*\(Old=([^,]*),New=([^)]*)
Let me know if you would like an explanation.
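As a minimal sketch, the negated-character-class pattern can be wrapped so that only the three wanted segments come back. It assumes the field name contains no whitespace, the old value contains no comma, and the new value contains no closing parenthesis; the helper name parseChange is hypothetical, and optional whitespace before New= and the closing \) are additions:

```php
<?php
// Hypothetical helper: returns [field, old, new], or null if the line does
// not have the expected "LABEL: FIELD (Old=...,New=...)" shape.
function parseChange(string $line): ?array
{
    $pattern = '/:\s*([^(\s]+)\s*\(Old=([^,]*),\s*New=([^)]*)\)/i';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }
    return array_slice($m, 1); // drop $m[0], the whole match
}

print_r(parseChange('ABC: Some_Field (Old=,New=123)'));
print_r(parseChange('ABC: Some_Field (Old=Hello World,New=Bye Bye World)'));
```

Only the parts wrapped in capturing parentheses are returned; everything else in the pattern validates the format without being stored.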

get substrings from string with parentheses, brackets and hyphen

Regex is not my strongest suit and I'm having a bit of trouble with this situation.
I have the following string:
locale (district - town) [parish]
I need to extract the following information:
1 - locale
2 - district
3 - town
And I have these solutions:
1 - locale
preg_match("/([^(]*)\s/", $input_line, $output_array);
2 - district
preg_match("/.*\(([^-]*)\s/", $input_line, $output_array);
3 - town
preg_match("/.*\-\s([^)]*)/", $input_line, $output_array);
And these seem to work fine.
However, the string may be presented like any of these:
localeA(localeB) (district - town) [parish]
locale (district - townA(townB)) [parish]
locale (district - townA-townB) [parish]
Locale can also include parentheses of its own.
Town can include parentheses and/or a hyphen of its own.
Which makes it difficult to extract the right information. In the 3 scenarios above I would have to extract:
localeA(localeB) + district + town
locale + district + townA(townB)
locale + district + townA-townB
I find it hard to deal with all these scenarios. Can you help me out?
Thanks in advance
If locale, district and town contain no spaces:
preg_match("/^\s*(\S+)\s*\((\S+)\s*-\s*(\S+)\)/", $input_line, $output_array);
explanation:
The regular expression:
(?-imsx:^\s*(\S+)\s*\((\S+)\s*-\s*(\S+)\))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
\S+ non-whitespace (all but \n, \r, \t, \f,
and " ") (1 or more times (matching the
most amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
\( '('
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
\S+ non-whitespace (all but \n, \r, \t, \f,
and " ") (1 or more times (matching the
most amount possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
- '-'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
\S+ non-whitespace (all but \n, \r, \t, \f,
and " ") (1 or more times (matching the
most amount possible))
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
\) ')'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
Not sure what exactly your rules and edge cases are, but this works for the examples provided
preg_match('#^(.+?) \((.+?) - (.+?)\) \[(.+)\]$#',$str,$matches);
Gives these results (when run for each example string in $str):
Array
(
[0] => locale (district - town) [parish]
[1] => locale
[2] => district
[3] => town
[4] => parish
)
Array
(
[0] => localeA(localeB) (district - town) [parish]
[1] => localeA(localeB)
[2] => district
[3] => town
[4] => parish
)
Array
(
[0] => locale (district - townA(townB)) [parish]
[1] => locale
[2] => district
[3] => townA(townB)
[4] => parish
)
Array
(
[0] => locale (district - townA-townB) [parish]
[1] => locale
[2] => district
[3] => townA-townB
[4] => parish
)
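The same pattern can also be written with named groups, which may be easier to read than numeric indexes. A sketch only; the helper name parsePlace and the group names are hypothetical, and it relies on the same single-space layout as the examples:

```php
<?php
// Hypothetical helper wrapping the pattern above with named groups.
function parsePlace(string $line): ?array
{
    $pattern = '#^(?<locale>.+?) \((?<district>.+?) - (?<town>.+?)\) \[(?<parish>.+)\]$#';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // line does not follow the expected layout
    }
    return [
        'locale'   => $m['locale'],
        'district' => $m['district'],
        'town'     => $m['town'],
        'parish'   => $m['parish'],
    ];
}

print_r(parsePlace('localeA(localeB) (district - town) [parish]'));
print_r(parsePlace('locale (district - townA(townB)) [parish]'));
print_r(parsePlace('locale (district - townA-townB) [parish]'));
```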

get substring between 2 characters in php

I'm using a mentioning system like on Twitter and Instagram, where you simply put #johndoe.
What I'm trying to do is extract the name between "#" and any of these characters: ? , ] : or a space.
As an example, here's my string:
hey #johnDoe check out this event, be sure to bring #janeDoe:,#johnnyappleSeed?, #johnCitizen] , and #fredNerk
How can I get an array of janeDoe, johnnyappleSeed, johnCitizen, fredNerk without the characters ? , ] : attached to them?
I know I have to use a variation of preg_match, but I don't have a strong understanding of it.
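This is what you've asked for: /\#(.*?)\s/
This is what you really want: /#(\w+)/ (a # followed by word characters; the capture stops at the first character that is not a letter, digit, or underscore)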
Put either one into preg_match_all() and evaluate the results array.
preg_match_all("/\#(.*?)\s/", $string, $result_array);
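A minimal sketch of the results array in practice, assuming a mention is a # followed only by letters, digits, or underscores (\w). That is an assumption about the data, not something the question states:

```php
<?php
$string = 'hey #johnDoe check out this event, be sure to bring #janeDoe:,#johnnyappleSeed?, #johnCitizen] , and #fredNerk';

// Match the allowed characters instead of listing every terminator.
preg_match_all('/#(\w+)/', $string, $result_array);

// $result_array[0] holds the full matches (with #), $result_array[1] the names.
$names = $result_array[1];
print_r($names);
```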
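$check_hash = preg_match_all("/#[a-zA-Z0-9]*/", $string_to_match_against, $matches); // PHP has no "g" modifier; preg_match_all is already global
You could then do something like
foreach ($matches[0] as $images) { // $matches[0] holds the full matches
    echo $images."<br />";
}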
UPDATE: Just realized you were looking to remove the invalid characters. Updated script should do it.
How about:
$str = 'hey #johnDoe check out this event, be sure to bring #janeDoe:,#johnnyappleSeed?, #johnCitizen] , and #fredNerk';
preg_match_all('/#(.*?)(?:[?, \]: ]|$)/', $str, $m);
print_r($m);
output:
Array
(
[0] => Array
(
[0] => #johnDoe
[1] => #janeDoe:
[2] => #johnnyappleSeed?
[3] => #johnCitizen]
[4] => #fredNerk
)
[1] => Array
(
[0] => johnDoe
[1] => janeDoe
[2] => johnnyappleSeed
[3] => johnCitizen
[4] => fredNerk
)
)
explanation:
The regular expression:
(?-imsx:#(.*?)(?:[?, \]: ]|$))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
# '#'
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[?, \]: ] any character of: '?', ',', ' ', '\]',
':', ' '
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
$ before an optional \n, and the end of
the string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------

Little help with regex

how can I match these:
(1, 'asd', 'asd2')
but not match this:
(1, '(data)', 0)
I want to match the ( and ), but not match ( and ) inside ( and ).
Actually these are queries and I want to split them via preg_split.
/[\(*\)]+/
splits them, but also splits ( and ) inside them, how can I fix this?
Example:
The data is:
(1, 'user1', 1, 0, 0, 0)(2, 'user(2)', 1, 0, 0, 1)
I want to split them as:
Array(
0 => (1, 'user1', 1, 0, 0, 0)
1 => (2, 'user(2)', 1, 0, 0, 1)
);
Instead, it is split as:
Array(
0 => (1, 'user1', 1, 0, 0, 0)
1 => (2, 'user
2 => 2
3 => ', 1, 0, 0, 1)
);
A regex for this would be a little nasty. Instead, you can iterate over the entire string and decide where to split:
If it's a ), split there. (I'm assuming the brackets are balanced in the string and can't be nested)
If it's a ', ignore any ) until a closing ' (If it can be escaped, you can look at the previous characters for an odd number of \).
I think this is a more straight-forward solution than a regex.
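The scan described above could be sketched as follows. It assumes the parentheses are balanced and not nested outside quotes, and that a quote inside a string is escaped with a backslash; rather than looking back for an odd number of backslashes, it skips the character after each backslash. The function name splitTuples is hypothetical:

```php
<?php
// Hypothetical helper: split "(...)(...)" into tuples, ignoring any
// parentheses that appear inside single-quoted strings.
function splitTuples(string $source): array
{
    $tuples = [];
    $current = '';
    $inQuote = false;
    $length = strlen($source);

    for ($i = 0; $i < $length; $i++) {
        $char = $source[$i];
        $current .= $char;

        if ($inQuote && $char === '\\' && $i + 1 < $length) {
            // Copy the escaped character verbatim so \' does not end the string.
            $current .= $source[++$i];
        } elseif ($char === "'") {
            $inQuote = !$inQuote;
        } elseif ($char === ')' && !$inQuote) {
            // A closing parenthesis outside quotes ends the current tuple.
            $tuples[] = trim($current);
            $current = '';
        }
    }

    return $tuples; // any trailing text without a closing ")" is dropped
}

print_r(splitTuples("(1, 'user1', 1, 0, 0, 0)(2, 'user(2)', 1, 0, 0, 1)"));
```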
You can't use preg_split for that (as you don't match borders, but lengthier patterns). But it might be possible with a preg_match_all:
preg_match_all(':\( ((?R) | .)*? \):x', $source, $matches);
print_r($matches[0]);
Instead of the recursive (?R) version, you could also write the pattern for a single level of inner parentheses. But that doesn't actually look much simpler:
:\( ( [^()]* | \( [^()]* \) )+ \):x
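A sketch that runs both ideas on the sample data from the question. The single-level pattern here is a slightly rewritten variant (each inner alternative consumes at least one character), not the exact pattern above:

```php
<?php
$source = "(1, 'user1', 1, 0, 0, 0)(2, 'user(2)', 1, 0, 0, 1)";

// Recursive version: (?R) re-enters the whole pattern for an inner (...).
preg_match_all(':\( ((?R) | .)*? \):x', $source, $recursive);

// Single-level variant: plain characters, or one inner (...) without
// further nesting.
preg_match_all('/\((?:[^()]|\([^()]*\))*\)/', $source, $singleLevel);

print_r($recursive[0]);
print_r($singleLevel[0]);
```

Both treat parentheses purely structurally, so an unbalanced parenthesis inside a quoted string would still confuse them.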
Your grammar appears to be
list: '(' num ( ',' term )(s?) ')'
term: num | str
num: /[0-9]+/
str: /'[^']*'/
So the pattern is
/ \G \s* \( \s* [0-9]+ (?: \s* , \s* (?: [0-9]+ | '[^']*' ) )* \s* \) /x
Well, that's just for matching. Extraction is trickier if PHP works like Perl. If you want to do it with a regex match, you have to do it in two passes.
First you extract the list:
/ \G \s* \( \s* ( [0-9]+ (?: \s* , \s* (?: [0-9]+ | '[^']*' ) )* ) \s* \) /x
Then you extract the terms from the list:
/ \G \s* ( [0-9]+ | '[^']*' ) (?: \s* , )? /x
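The two passes could look like this in PHP. A sketch under the grammar stated above (numbers and single-quoted strings without embedded quotes); the variable names are only for illustration:

```php
<?php
$data = "(1, 'user1', 1, 0, 0, 0)(2, 'user(2)', 1, 0, 0, 1)";

// Pass 1: capture the inside of each list. \G anchors every match to the
// end of the previous one, so scanning stops at the first malformed list.
$listPattern = '/ \G \s* \( \s* ( [0-9]+ (?: \s* , \s* (?: [0-9]+ | \'[^\']*\' ) )* ) \s* \) /x';

// Pass 2: pull the individual terms out of one captured list.
$termPattern = '/ \G \s* ( [0-9]+ | \'[^\']*\' ) (?: \s* , )? /x';

preg_match_all($listPattern, $data, $lists);

$rows = [];
foreach ($lists[1] as $list) {
    preg_match_all($termPattern, $list, $terms);
    $rows[] = $terms[1];
}

print_r($rows);
```

String terms keep their surrounding quotes, so 'user(2)' comes back with the quotes included.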
