Regex may or may not contain a character - PHP

I have some regex I am using in PHP:
preg_match_all('/D3m57D3m58(.+)D3m59/Uis', $content, $m)
This works for most inputs, but in some cases D3m57 and D3m58 are separated by a newline, so there is no match.
How can I get this to match when there is a newline between them, and still match when there is not?
I can't alter the string it is matching against.

Add an optional newline between them using the ? quantifier, which matches the preceding token either 0 or 1 times:
preg_match_all('/D3m57\n?D3m58(.+)D3m59/Uis', $content, $m)
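
If the input might use Windows line endings, \n? alone would not cover \r\n. A minimal sketch, assuming PCRE's \R escape (which matches any single line-break sequence) is acceptable here; the helper name and sample strings are made up for illustration:

```php
<?php
// Sketch: the D3m57/D3m58/D3m59 markers come from the question; the helper
// name and the sample strings are hypothetical.
function captureBetweenMarkers(string $content): array
{
    // \R? allows zero or one line break of any style (\n, \r\n or \r).
    preg_match_all('/D3m57\R?D3m58(.+)D3m59/Uis', $content, $m);
    return $m[1];
}

var_dump(captureBetweenMarkers('D3m57D3m58helloD3m59'));      // no line break
var_dump(captureBetweenMarkers("D3m57\nD3m58helloD3m59"));    // Unix line break
var_dump(captureBetweenMarkers("D3m57\r\nD3m58helloD3m59"));  // Windows line break
```

Each call should return a one-element array containing hello, since the U modifier keeps (.+) as short as possible.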

Related

Trying to get a regexp to find all hyperlinks with arguments that don't contain a word

I'm using the following regexp with PHP's preg_replace:
$f[] = '/href\=\"([a-zA-Z\_]*?).php\?(.*?)\"/';
I want to update this to match all hyperlinks ending in .php with arguments (as it does now), but exclude any links that have the word "phpinfo" in the link.
I've tried:
$f[]='/href\=\"([a-zA-Z\_]*?).php\?(.*?!phpinfo)\"/';
But I fear I am doing it all wrong, it is not working - I have not been able to find a similar example that I am able to adapt to get this working.

Use a regex with a negative lookahead:
$f[] = '/\bhref="([a-zA-Z\_]*?)\.php\?((?:(?!phpinfo|").)*)"/';
The tricky part is (?:(?!phpinfo|").)*, which matches any character, zero or more times, as long as that character is not a double quote and does not begin the substring phpinfo. The lookahead is checked before each character is consumed, so a p is matched only when it is not followed by hpinfo.
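
A short sketch of how that lookahead behaves on sample input. The pattern is the one from this answer, written here with the dot before php escaped; the helper name and the sample links are hypothetical:

```php
<?php
// Pattern from the answer (dot before "php" escaped); sample links are made up.
$pattern = '/\bhref="([a-zA-Z_]*?)\.php\?((?:(?!phpinfo|").)*)"/';

function matchesLink(string $pattern, string $html): bool
{
    return preg_match($pattern, $html) === 1;
}

var_dump(matchesLink($pattern, '<a href="index.php?page=2">'));       // bool(true)
var_dump(matchesLink($pattern, '<a href="index.php?show=phpinfo">')); // bool(false)

// The lookahead only guards the query string, not the file name:
var_dump(matchesLink($pattern, '<a href="phpinfo.php?x=1">'));        // bool(true)
```

If links such as phpinfo.php?x=1 should be rejected as well, the lookahead would have to be applied before the file name too.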

RegExp - Finding words not starting with a certain char

I'm trying to extract words (read: functions) from a string with RegExp and pass them to a PHP function.
The following works pretty well already:
$func = preg_replace("/(\b.+\b)/Ue", 'extract_functions(\'\\1\')', $oneliner);
While it extracts existing functions from the string it also extracts variables with the same name, but without the starting $ char.
So if the string contains an existing function named get_function it also extracts a variable named $get_function but without the starting $, so I can't be sure whether I have a function or variable extracted.
My idea was to exclude words starting with $ but that doesn't seem to work:
$func = preg_replace("/[(\b[^\$].+\b)/Ue", 'extract_functions(\'\\1\')', $oneliner);
I'm out of ideas...
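
You can use a negative lookbehind to make sure that no $ precedes your function/variable name. The pattern is single-quoted because inside double quotes PHP turns \$ into a bare $, which the regex engine would read as an end-of-line anchor:
$func = preg_replace('/(?<!\$)(\b.+\b)/Ue', 'extract_functions(\'\\1\')', $oneliner);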
By the way, [(\b[^\$] is malformed. It is a single character class containing (, \b (a backspace inside a class), [, ^ and $, so it matches any one of those characters instead of excluding $.
/[^$](\b.+\b)/ would have been a little closer, but it cannot match a word at the very beginning of the string, because [^$] has to consume a character.
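
Note that the /e modifier was deprecated in PHP 5.5 and removed in PHP 7, so on current PHP versions the same idea needs preg_replace_callback. A minimal sketch, assuming a simplified identifier pattern instead of (\b.+\b); mark_name is a hypothetical stand-in for extract_functions:

```php
<?php
// Hypothetical stand-in for the question's extract_functions(); it only
// upper-cases the matched name so the effect is visible.
function mark_name(array $match): string
{
    return strtoupper($match[0]);
}

// Single quotes keep \$ intact, so the lookbehind tests for a literal $.
// [A-Za-z_]\w* is an assumed, simplified identifier pattern.
$pattern = '/(?<!\$)\b[A-Za-z_]\w*\b/';

$oneliner = '$total = add($a, $b);';
echo preg_replace_callback($pattern, 'mark_name', $oneliner), "\n";
// prints: $total = ADD($a, $b);
```

Only add is passed to the callback; $total, $a and $b are skipped because each is preceded by $.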

Thanks to Jeff, here's the solution that works for me:
$filecontent = file_get_contents($file); // Parsing the file's contents into a string
$re = '/(?<!\$)(\b\S+?\b)(?=\()/'; // The pattern
preg_match_all($re, $filecontent, $out, PREG_PATTERN_ORDER);
print_r($out[0]);
I'm using a negative lookbehind as suggested by Jeff as well as a positive lookahead checking for a ( after each word, but without making the ( part of the match.
I went for the ( part as that defines a PHP function as far as I'm concerned.
I'm open for improvements! :-) Thanks to Jeff!
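
For illustration, here is the same pattern applied to an inline string rather than a file. The wrapper function and the sample source line are hypothetical:

```php
<?php
// Pattern taken from the post above; the sample source line is made up.
function find_called_names(string $source): array
{
    $re = '/(?<!\$)(\b\S+?\b)(?=\()/';
    preg_match_all($re, $source, $out, PREG_PATTERN_ORDER);
    return $out[0];
}

print_r(find_called_names('$get_function = get_function($arg); $n = strlen($s);'));
// finds get_function and strlen, but not $get_function, $arg, $n or $s
```

Because \S+? accepts any non-space characters, names after -> or :: (method calls) would be picked up too; whether that is desirable depends on what counts as a function for the task at hand.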

Regex pattern to match any character except the last one

I am trying to match a string using two different patterns to work together.
My source string is something like this:
Text, white-spaces, new lines and more text then ^^^^<customtag>
I need to get a group (the second one) that would capture one caret or none then a formatted HTML-like tag. So the first group would capture anything else.
It means that the string above should output this:
(Group 1)Text, white-spaces, new lines and more text then ^^^
(Group 2)^<customtag>
The source string may contain no carets, one caret, or up to two thousand of them.
I need a good pattern that matches all those carets except the last one.
The code below is what I tried.
preg_match_all('/([\s\S]*\^*)(\^?<\w+>)$/', $string, $matches);
Please note: I used [\s\S] instead of the dot to match any character as well as white-spaces and new lines too.

You can use the following regex:
(?s)(.*)((\^|(?<!\^))<[^>]+>)
PHP code:
preg_match_all('/(?s)(.*)((\^|(?<!\^))<[^>]+>)/', $string, $matches);

You can also use this:
preg_match_all('/(.*)((\^<[^>]*>)|([^\^]<[^>]*>))$/', $string, $matches);
See it working here: http://regexr.com?383g9
In this other link it is working fine: http://regex101.com/r/eQ3vV7
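
As a quick check of the first answer's pattern, the sketch below runs it against a string with several carets and one with none. The helper name and the sample strings are hypothetical:

```php
<?php
// Pattern from the first answer above; helper and sample strings are made up.
function split_tag(string $string): ?array
{
    if (preg_match('/(?s)(.*)((\^|(?<!\^))<[^>]+>)/', $string, $m) !== 1) {
        return null;
    }
    // $m[1]: everything before the tag part; $m[2]: optional caret plus tag.
    return [$m[1], $m[2]];
}

var_dump(split_tag("Some text\nmore text ^^^^<customtag>"));
var_dump(split_tag('No carets here <customtag>'));
```

In the first call the greedy (.*) backs off until exactly one caret is left for the second group, so the groups should be "Some text\nmore text ^^^" and "^<customtag>". In the second call the lookbehind alternative matches without consuming anything, which leaves "No carets here " and "<customtag>".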

php PCRE regex to get only the file name that terminates in .txt

I am trying to write a PCRE regex in PHP, specifically for use with preg_replace, that matches the characters making up a text (.txt) file name; from this I will derive the directory of the file.
My initial approach was to anchor on the terminating .txt string and match every preceding character except / or \, so I ended up with something like:
'/[^\\\\/]*\.txt$/'
but this didn't seem to work at all. I assumed it might be interpreting the negation in De Morgan's form, i.e.:
(A+B)' <=> A'B'
but after attempting this test:
'/[^\\\\]\|[^/]*\.txt$/'
I came to the same result, which made me think that I shouldn't escape the or operator(|), but this also failed to match. Anyone know what I'm doing wrong?
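
The following regular expression should work for getting the file name of .txt files. It uses # as the delimiter; with / as the delimiter, the unescaped / inside the character class ends the pattern early, which is why the original attempts fail.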
$regex = "#.*[\\\\/](.*?\.txt)$#";
How it works:
.* is greedy and thus forces the match to start as far to the right as possible.
[\\\\/] ensures that there is a \ or / in front of the file name.
(.*?\.txt) uses non-greedy matching to keep the file name as short as possible, followed by .txt, and captures it into group 1.
$ forces the match to end at the end of the string.
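
The breakdown above can be tried out as follows. This is a sketch using the same regex in a single-quoted string; the helper name and the sample paths are hypothetical:

```php
<?php
// Regex from the answer, single-quoted; helper name and paths are made up.
function txt_file_name(string $path): ?string
{
    if (preg_match('#.*[\\\\/](.*?\.txt)$#', $path, $m) !== 1) {
        return null; // no separator before the name, or not a .txt file
    }
    return $m[1];
}

var_dump(txt_file_name('/var/log/app/notes.txt')); // string(9) "notes.txt"
var_dump(txt_file_name('C:\\data\\report.txt'));   // string(10) "report.txt"
var_dump(txt_file_name('notes.txt'));              // NULL
```

The last call shows one consequence of requiring [\\\\/]: a bare file name with no directory part does not match.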
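
Try this pattern: '/\b(?P<files>[\w.-]+\.txt)\b/mi'. The hyphen is placed last in the character class so that it is read as a literal hyphen rather than as a range operator.
$PATTERN = '/\b(?P<files>[\w.-]+\.txt)\b/mi';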
$subject = 'foo.bar.txt plop foo.bar.txtbaz foo.txt';
preg_match_all($PATTERN, $subject, $matches);
var_dump($matches["files"]);

Regular expression problems

I can't get preg_match to find a word anywhere in the string.
I have this:
$bad_words = "/(\bsuck\b)|(\bsucks\b)|(\bporn\b)|";
$text = "sucky";
if(preg_match($bad_term_filter, trim($feedback_review_comment)) != 0 )
I need this to return true, but it only returns true if it's an exact match, for example:
$text = "suck";
which returns true.

\b is the word boundary anchor. It looks like you're trying to find whether some word occurs anywhere, regardless of word boundaries, so I think the pattern you want is simply:
suck|porn
You also do not want the last empty alternative, because it matches everything (every string contains the empty string). There is no need to look for sucks explicitly, because it already contains suck.
References
regular-expressions.info/Anchors and Character Classes, and Optional
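
A small sketch of the suggested pattern in use. The word list comes from the question; the function name, the sample inputs and the case-insensitive i modifier are assumptions added for illustration:

```php
<?php
// Word list from the question; function name, inputs and the /i flag are
// assumptions made for this example.
function contains_bad_word(string $text): bool
{
    // No \b anchors, so "suck" is also found inside "sucky" or "sucks".
    return preg_match('/suck|porn/i', trim($text)) === 1;
}

var_dump(contains_bad_word('sucky'));     // bool(true)
var_dump(contains_bad_word('nice work')); // bool(false)

// A trailing empty alternative matches the empty string, hence everything:
var_dump(preg_match('/suck|porn|/', 'nice work')); // int(1)
```

Dropping the word boundaries also means the filter fires on harmless words that merely contain one of the terms, so the word list may need tuning.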
