Regex to find information in PHP function


I'm working on a regex, but I can't get it to work.
I'm scanning .php documents with PHP, looking for $__this('[TEXT]') or $__this("[TEXT]").
Can somebody help me with a regex that finds either form in a string and gives me [TEXT]?
UPDATE (with answer, thanks to @Explosion Pills):
$string = '$__this("Foo Bar<br>HelloHello")';
preg_match('/\$__this\(([\'"])(.*?)\1\)/xi', $string, $matches);
print_r($matches);

preg_match('/
\$__this # just $__this. $ is meta character and must be escaped
\( # open paren also must be escaped
([\'"]) # open quote (capture for later use). \' is needed in string
(\[ # start capture. open bracket must also be escaped
.*? # Ungreedily capture whatever is between the quotes
\]) # close the open bracket and end capture
\1 # close the quote (captured earlier)
\) # close the parentheses
/xi' # ignore whitespace in pattern, allow comments, case insensitive
, $document, $matches);
The captured text will be in $matches[2]. preg_match stops after the first match in the subject string; if you need every occurrence, use preg_match_all.
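As a minimal sketch of the preg_match_all variant (the sample input in $document is hypothetical, and the pattern is the one from the update without the x and i flags, which it does not need):

```php
<?php
// Hypothetical sample input; a nowdoc keeps $ and quotes literal.
$document = <<<'SRC'
echo $__this('Hello');
echo $__this("Foo Bar<br>HelloHello");
SRC;

// ([\'"]) captures the opening quote, \1 requires the same quote to close.
$pattern = '/\$__this\(([\'"])(.*?)\1\)/';

// preg_match_all collects every occurrence; group 2 holds the text.
preg_match_all($pattern, $document, $matches);

print_r($matches[2]);
```

Here $matches[2] should contain one entry per call found in the input.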

How about:
preg_match('/\$__this(?:\((\'|")(.+?)\1\))/', $string, $matches);
The captured text is in $matches[2].
Explanation:
(?-imsx:\$__this(?:\((\'|")(.+?)\1\)))
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
\$                       '$'
----------------------------------------------------------------------
__this                   '__this'
----------------------------------------------------------------------
(?:                      group, but do not capture:
----------------------------------------------------------------------
\(                       '('
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
\'                       '''
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
(                        group and capture to \2:
----------------------------------------------------------------------
.+?                      any character except \n (1 or more
                         times (matching the least amount
                         possible))
----------------------------------------------------------------------
)                        end of \2
----------------------------------------------------------------------
\1                       what was matched by capture \1
----------------------------------------------------------------------
\)                       ')'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

Here's a solution that will also catch strings with quotes and apostrophes in them.
$txt = "
blah blah blah
blah \$__this('abc') blah
blah \$__this('a\"b\"c') blah balah \$__this('a\"b\"c\'')
\$__this(\"123\");\$__this(\"1'23\") \$__this(\"1'23\\\"\")
";
$matches = array();
preg_match_all('/(?:\$__this\()(?:[\'"])(.*?[^\\\])(?:[\'"])(?:\))/im', $txt, $matches);
print_r($matches[1]);
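Another way to tolerate backslash-escaped quotes is to require the closing quote to be the same as the opening one and to skip over any backslash-escaped character in between. This is only a sketch under assumptions: the sample input is hypothetical, and the captured text keeps its backslashes.

```php
<?php
// Hypothetical sample input with backslash-escaped quotes inside the arguments.
$txt = <<<'SRC'
blah $__this('it\'s') blah $__this("say \"hi\"") blah $__this("1'23")
SRC;

// (["\'])              opening quote, captured as \1
// (?:\\\\.|(?!\1).)*   a backslash plus any character, or any character
//                      that is not the opening quote
// \1                   the matching closing quote
$pattern = '/\$__this\(\s*(["\'])((?:\\\\.|(?!\1).)*)\1\s*\)/s';

preg_match_all($pattern, $txt, $matches);

// Group 2 holds the raw text between the quotes, escapes included.
print_r($matches[2]);
```

If the unescaped text is needed, something like stripslashes() could be applied to each captured value afterwards.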

Related

Can a regex search inside an already matched pattern?

Suppose there is WordPress shortcode content like the following:
Some content here
[shortcode_1 attr1="val1" attr2="val2"]
[shortcode_2 attr3="val3" attr4="val4"]
Some text
[/shortcode_2]
[/shortcode_1]
Some more content here
Suppose I match the shortcode pattern so that I get [shortcode_1]...[/shortcode_1]. Can I also get [shortcode_2]...[/shortcode_2] using the same regex pattern in the same run, or do I have to run it again on the output of the first run?

Description
You could create two capture groups: one for the entire match and a second for the nested match. This approach has its limitations and can get hung up on complex edge cases. Because the sample spans several lines, the dot must be allowed to match newlines (the s flag).
(\[shortcode_1\s[^\]]*].*?(\[shortcode_2\s.*?\[\/shortcode_2\]).*?\[\/shortcode_1\])
Examples
Live Demo
https://regex101.com/r/bQ0vV2/1
Sample Text
[shortcode_1 attr1="val1" attr2="val2"]
[shortcode_2 attr3="val3" attr4="val4"]
Some text
[/shortcode_2]
[/shortcode_1]
Sample Matches
Capture group 1 gets the shortcode_1
Capture group 2 gets the shortcode_2
1. [0-139] `[shortcode_1 attr1="val1" attr2="val2"]
[shortcode_2 attr3="val3" attr4="val4"]
Some text
[/shortcode_2]
[/shortcode_1]`
2. [45-123] `[shortcode_2 attr3="val3" attr4="val4"]
Some text
[/shortcode_2]`
Explanation
NODE                     EXPLANATION
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
\[                       '['
----------------------------------------------------------------------
shortcode_1              'shortcode_1'
----------------------------------------------------------------------
\s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
[^\]]*                   any character except: '\]' (0 or more
                         times (matching the most amount
                         possible))
----------------------------------------------------------------------
]                        ']'
----------------------------------------------------------------------
.*?                      any character (0 or more times (matching
                         the least amount possible))
----------------------------------------------------------------------
(                        group and capture to \2:
----------------------------------------------------------------------
\[                       '['
----------------------------------------------------------------------
shortcode_2              'shortcode_2'
----------------------------------------------------------------------
\s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
.*?                      any character (0 or more times
                         (matching the least amount possible))
----------------------------------------------------------------------
\[                       '['
----------------------------------------------------------------------
\/                       '/'
----------------------------------------------------------------------
shortcode_2              'shortcode_2'
----------------------------------------------------------------------
\]                       ']'
----------------------------------------------------------------------
)                        end of \2
----------------------------------------------------------------------
.*?                      any character (0 or more times (matching
                         the least amount possible))
----------------------------------------------------------------------
\[                       '['
----------------------------------------------------------------------
\/                       '/'
----------------------------------------------------------------------
shortcode_1              'shortcode_1'
----------------------------------------------------------------------
\]                       ']'
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
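In PHP, the same pattern might be applied as in the sketch below. The ~ delimiter and the variable names are my own choices; the s modifier is what lets . match the newlines between the tags.

```php
<?php
// Sample content from the question.
$content = <<<'TXT'
Some content here
[shortcode_1 attr1="val1" attr2="val2"]
[shortcode_2 attr3="val3" attr4="val4"]
Some text
[/shortcode_2]
[/shortcode_1]
Some more content here
TXT;

// Group 1 is the outer shortcode, group 2 the nested one.
// The s modifier lets . match newlines.
$pattern = '~(\[shortcode_1\s[^\]]*].*?(\[shortcode_2\s.*?\[/shortcode_2\]).*?\[/shortcode_1\])~s';

if (preg_match($pattern, $content, $m)) {
    echo "Outer:\n", $m[1], "\n\n";
    echo "Inner:\n", $m[2], "\n";
}
```

Both shortcodes come out of a single run, but only for this fixed two-level nesting; arbitrary nesting depth would need a second pass or a recursive pattern.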

preg_match_all for a string

I'm looking to get just America/Chicago from this string:
Local Time Zone (America/Chicago (CDT) offset -18000 (Daylight))
and have it work for other time zones, like America/Los_Angeles, America/New_York and so on. I'm not very good with preg_match_all. If someone can also point me to a good tutorial on how to learn it properly, that would help, because this is the third time I've needed to use it.

Here is a solution using a regular expression:
Code
$in = "Local Time Zone (America/Chicago (CDT) offset -18000 (Daylight))";
preg_match_all('/\(([A-Za-z0-9_\W]+?)\\s/', $in, $out);
echo "<pre>";
print_r($out[1][0]);
Output
America/Chicago

I'd use the regex: \((\S+)
preg_match_all('/\((\S+)/', $in, $out);
explanation:
The regular expression:
(?-imsx:\((\S+))
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
\(                       '('
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
\S+                      non-whitespace (all but \n, \r, \t, \f,
                         and " ") (1 or more times (matching the
                         most amount possible))
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
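Since only the first parenthesised token is wanted, plain preg_match may be enough. The sketch below (variable names are mine) also shows what preg_match_all would collect for comparison.

```php
<?php
$in = 'Local Time Zone (America/Chicago (CDT) offset -18000 (Daylight))';

// preg_match stops at the first "(" followed by non-whitespace characters.
$timezone = preg_match('/\((\S+)/', $in, $m) ? $m[1] : null;
echo $timezone, PHP_EOL;

// preg_match_all also picks up the later parenthesised tokens,
// including their trailing ")" characters.
preg_match_all('/\((\S+)/', $in, $all);
print_r($all[1]);
```

With preg_match_all the time zone is still the first entry, $all[1][0]; the remaining entries are the other parenthesised tokens.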

Regex to match uri after mod_rewrite

I'm looking for a PHP PCRE regex to match URIs that are rewritten with Apache's mod_rewrite module. The URIs look like this:
/param1/param2/param3/param4
The rules for the URI:
must contain at least one /;
the params must only allow letters, numbers, - and _;
there must be zero or more instances of the first two rules;

/^\/[a-zA-Z0-9_\-\/]+$/
I am assuming that it must start with a /, and that something like /param1/param2/param3/param4* should not match.

How about:
if (preg_match("~^(?:/[\w-]+)+/?$~", $string)) {
# do stuff
}
Explanation:
The regular expression:
(?-imsx:^(?:/[\w-]+)+/?$)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
^                        the beginning of the string
----------------------------------------------------------------------
(?:                      group, but do not capture (1 or more times
                         (matching the most amount possible)):
----------------------------------------------------------------------
/                        '/'
----------------------------------------------------------------------
[\w-]+                   any character of: word characters (a-z,
                         A-Z, 0-9, _), '-' (1 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
)+                       end of grouping
----------------------------------------------------------------------
/?                       '/' (optional (matching the most amount
                         possible))
----------------------------------------------------------------------
$                        before an optional \n, and the end of the
                         string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
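A quick way to check the pattern against a few URIs is sketched below. The helper name isRewrittenUri and the sample URIs are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function isRewrittenUri(string $uri): bool
{
    // One or more "/segment" parts, then an optional trailing slash.
    return preg_match('~^(?:/[\w-]+)+/?$~', $uri) === 1;
}

$samples = [
    '/param1/param2/param3/param4',
    '/param1/',
    '/a-b_c/1',
    'param1/param2',
    '/param1/param2*',
    '/',
];

foreach ($samples as $uri) {
    printf("%-32s %s\n", $uri, isRewrittenUri($uri) ? 'match' : 'no match');
}
```

As the table notes, $ also matches before a trailing newline; if that matters, the D modifier or \z could be used instead.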

How do you create a string to match a regex?

I need to write formatting documentation. I know the regexes that are used to format the text, but I don't know how to produce an example string for a given regex.
This one should be an internal link:
'{\[((?:\#|/)[^ ]*) ([^]]*)\]}'
Can anyone create an example that would match this, and maybe explain how they got it? I got stuck at '?'.
I have never used this metacharacter at the beginning of a group; usually I use it to mark that a literal may be absent or appear exactly once.
Thanks

(?:...) has the same grouping effect as (...), but without "capturing" the contents of the group; see http://php.net/manual/en/regexp.reference.subpatterns.php.
So, (?:\#|/) means "either # or /".
I'm guessing you know that [^ ]* means "zero or more characters that aren't spaces", and that [^]]* means "zero or more characters that aren't right square brackets".
Putting it together, one possible string is this:
'{[/abcd asdfasefasdc]}'
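A candidate string can be verified by running it through preg_match. In the sketch below the pattern is passed to PHP exactly as quoted in the question, so { and } act as the delimiters; the third sample is a hypothetical non-matching string added for contrast.

```php
<?php
// Pattern as quoted in the question; { and } are the PCRE delimiters here.
$pattern = '{\[((?:\#|/)[^ ]*) ([^]]*)\]}';

$samples = [
    '[/abcd asdfasefasdc]',
    '[#NOSPACE NOBRACKET]',
    '[abcd efgh]',          // no leading # or /, so it should not match
];

$results = [];
foreach ($samples as $sample) {
    if (preg_match($pattern, $sample, $m)) {
        $results[$sample] = [$m[1], $m[2]];
        printf("%s => target '%s', label '%s'\n", $sample, $m[1], $m[2]);
    } else {
        $results[$sample] = null;
        printf("%s => no match\n", $sample);
    }
}
```

Because the pattern is unanchored, the string '{[/abcd asdfasefasdc]}' from above matches as well; the braces are simply ignored.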

See Open source RegexBuddy alternatives and Online regex testing for some helpful tools. It's easiest to have a regex explained by them first. I used YAPE here:
NODE                     EXPLANATION
----------------------------------------------------------------------
\[                       '['
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
(?:                      group, but do not capture:
----------------------------------------------------------------------
\#                       '#'
----------------------------------------------------------------------
|                        OR
----------------------------------------------------------------------
/                        '/'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
[^ ]*                    any character except: ' ' (0 or more
                         times (matching the most amount
                         possible))
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
                         ' '
----------------------------------------------------------------------
(                        group and capture to \2:
----------------------------------------------------------------------
[^]]*                    any character except: ']' (0 or more
                         times (matching the most amount
                         possible))
----------------------------------------------------------------------
)                        end of \2
----------------------------------------------------------------------
\]                       ']'
----------------------------------------------------------------------
This is under the presumption that { and } in your example are the regex delimiters.
You can just read through the list of explanations and come up with a possible source string such as:
[#NOSPACE NOBRACKET]

I think this is a good post to help design regexes. While it's fairly easy to write a
general regex to match a string, sometimes it's helpful to look at it in reverse after
it's designed. Sometimes it is necessary to see what bizarre things will match.
When mixing a lot of metacharacters as literals, it's fairly important to format
the regex for ease of reading and to avoid errors.
Here are some samples in Perl, which was easier (for me) to prototype in.
my @samps = (
'{[/abcd asdfasefasdc]}',
'{[# ]}',
'{[# /# \/]}',
'{[/# {[
| /# {[#\/} ]}',
);
for (@samps) {
if (m~{\[([#/][^ ]*) ([^]]*)\]}~)
{
print "Found: '$&'\ngrp1 = '$1'\ngrp2 = '$2'\n===========\n\n";
}
}
__END__
Expanded
\{\[
(
[#/][^ ]*
)
[ ]
(
[^\]]*
)
\]\}
Output
Found: '{[/abcd asdfasefasdc]}'
grp1 = '/abcd'
grp2 = 'asdfasefasdc'
===========
Found: '{[# ]}'
grp1 = '#'
grp2 = ''
===========
Found: '{[# /# \/]}'
grp1 = '#'
grp2 = '/# \/'
===========
Found: '{[/# {[
| /# {[#\/} ]}'
grp1 = '/# {[
|'
grp2 = '/# {[#\/} '
===========

How does this regex divide text into sentences?

I know this regex divides a text into sentences. Can someone help me understand how?
/(?<!\..)([\?\!\.])\s(?!.\.)/

You can use YAPE::Regex::Explain to decipher Perl regular expressions:
use strict;
use warnings;
use YAPE::Regex::Explain;
my $re = qr/(?<!\..)([\?\!\.])\s(?!.\.)/;
print YAPE::Regex::Explain->new($re)->explain();
__END__
The regular expression:
(?-imsx:(?<!\..)([\?\!\.])\s(?!.\.))
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
(?<!                     look behind to see if there is not:
----------------------------------------------------------------------
\.                       '.'
----------------------------------------------------------------------
.                        any character except \n
----------------------------------------------------------------------
)                        end of look-behind
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
[\?\!\.]                 any character of: '\?', '\!', '\.'
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
\s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
(?!                      look ahead to see if there is not:
----------------------------------------------------------------------
.                        any character except \n
----------------------------------------------------------------------
\.                       '.'
----------------------------------------------------------------------
)                        end of look-ahead
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

There is also the Regular Expression Analyzer, which does much the same as what toolic already suggested, but is completely web-based.

(? # Find a group (don't capture)
< # before the following regular expression
! # that does not match
\. # a literal "."
. # followed by 1 character
) # (End look-behind group)
( # Start a group (capture it to $1)
[\?\!\.] # Containing any one of the characters in the following set "?!."
) # End group $1
\s # followed by a whitespace character " ", \t, etc.
(? # Followed by a group (don't capture)
# after the preceding regular expression
! # that does not have
. # 1 character
\. # followed by a literal "."
) # (End look-ahead group)

The first part (?<!\..) is a negative look-behind. It specifies a pattern which invalidates the match. In this case it's looking for two characters: a period followed by any one character.
The second part is a standard capturing group, which could be better expressed as ([?!.]) (you don't need the escapes inside the class brackets); it matches a sentence-ending punctuation character.
The next part is a single whitespace character: \s
And the last part is a negative look-ahead: (?!.\.). Again it is guarding against the case of a single character followed by a period.
This should work relatively well, but I don't think I would recommend it. I don't see what the coder was getting at by making sure only that a period wasn't the second most recent character, or the second one to come.
If you are looking to split on terminal punctuation, why not guard against the same class being two back or two ahead? Instead it relies on periods not being there. Thus a more regular expression would be:
/(?<![?!.].)([?!.])\s(?!.[?!.])/

Portions:
([\?\!\.])\s: split at an ending character (., ! or ?) that is followed by a whitespace character (space, tab, newline)
(?<!\..): the two characters before this ending character must not be a . followed by any character
(?!.\.): after the whitespace character, any character directly followed by a . isn't allowed
Those look-ahead (?!...) and look-behind (?<!...) assertions mainly seem to prevent splitting on (whitespace-separated?) abbreviations (q. e. d. etc.).
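To see the splitting in action, the pattern can be handed to preg_split, as in this sketch. The sample text is hypothetical, and the character class is written without the unnecessary escapes.

```php
<?php
// Hypothetical sample text containing an abbreviation.
$text = 'Use a tool, e.g. grep. It works! Really? Yes.';
$pattern = '/(?<!\..)([?!.])\s(?!.\.)/';

// Without flags, the matched punctuation and whitespace are dropped.
$plain = preg_split($pattern, $text);

// PREG_SPLIT_DELIM_CAPTURE also returns capture group 1 (the punctuation).
$withPunctuation = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($plain);
print_r($withPunctuation);
```

The period after "e.g" is not a split point because the look-behind sees ".g" in front of it, while the period after "grep" is. The final "Yes." keeps its period because no whitespace follows it.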
