Simple RegEx PHP - php

I have a string and I need to see if it contains the following "_archived".
I was using the following:
preg_match('(.*)_archived$',$string);
but get:
Warning: preg_match() [function.preg-match]: Unknown modifier '_' in /home/storrec/classes/class.main.php on line 70
I am new to Regular Expressions so this is probably very easy.
Or should I be using something a lot simpler like
strstr($string, "_archived");
Thanks in advance

strstr($string, "_archived");
Is going to be way easier for the problem you describe.
As is often quoted
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski

strstr is enough in this case, but to solve your problem, you need to add delimiters to your regex. A delimiter is a special character that starts and ends the regex, like so:
preg_match('/_archived/',$string);
The delimiter can be a lot of different characters, but usual choices are /, # and !. From the PHP manual:
Any character can be used for delimiter as long as it's not alphanumeric, backslash (\), or the null byte. If the delimiter character has to be used in the expression itself, it needs to be escaped by backslash. Since PHP 4.0.4, you can also use Perl-style (), {}, [], and <> matching delimiters.
Read all about PHP regular expression syntax here.
You can see some examples of valid (and invalid) patterns in the PHP manual here.

You just need some delimiters, e.g. enclose the pattern with /
preg_match('/_archived$/',$string);
Perl regexes let you use any delimiter, which is handy if your regex uses / a lot. I often find myself using braces for example:
preg_match('{_archived$}',$string);
Also, note that you don't need the (.*) bit as you aren't capturing the bit before "_archived", you're just testing to see if the string ends with it (that $ symbol on the end matches the end of the string)
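A minimal sketch of the delimited pattern in use, assuming a hypothetical helper name (endsWithArchived) that is not part of the original question:

```php
<?php
// Hypothetical helper, named only for this illustration.
function endsWithArchived(string $string): bool
{
    // The '/' characters delimit the pattern; '$' anchors the match at the
    // end of the string (or just before a single trailing newline).
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/_archived$/', $string) === 1;
}

var_dump(endsWithArchived('report_archived'));    // bool(true)
var_dump(endsWithArchived('report_archived_v2')); // bool(false)
```

Comparing against 1 keeps an error result (false) from being mistaken for "no match".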

If all you're looking for is if a string contains a string, then by all means use the simple version. But you can also simply do:
preg_match('/_archived/', $string);

Try:
preg_match('/(.*)_archived$/',$string);
If you are only checking if the string exists in $string, strstr should be enough for you though.

strstr() or strpos() would be better for finding something like this, I would think. The error you're getting is due to not having any delimiters around your regular expression. Try using
"/(.*)_archived$/"
or
"#(.*)_archived$#".

... is probably the best way to implement this. You're right that a regex is overkill.

When you have a specific string you're looking for, using regular expressions is a bit of overkill. You'll be fine using one of PHP's standard string search functions here.
The strstr function will work, but conventional PHP Wisdom (read: myth, legend, superstition, the manual) says that using the strpos function will yield better performance. i.e., something like
if(strpos($string, '_archived') !== false) {
}
You'll want to use the strict inequality operator (!==) here, as strpos returns 0 when it finds the needle at the start of its haystack, and 0 compares loosely equal to false. See the manual for more information on this.
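To make the point about the comparison operator concrete, here is a small sketch. The helper name containsArchived and the sample string are assumptions for illustration only:

```php
<?php
// Hypothetical helper, named only for this illustration.
function containsArchived(string $haystack): bool
{
    // strpos() returns the integer offset of the first match,
    // or false when the needle is absent.
    return strpos($haystack, '_archived') !== false;
}

$atStart = '_archived_report';

var_dump(strpos($atStart, '_archived'));          // int(0)
var_dump(strpos($atStart, '_archived') != false); // bool(false): loose comparison treats 0 as false
var_dump(containsArchived($atStart));             // bool(true)
```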
As to your problem, PHP's regular expression engine expects you to enclose your regular expression string with a set of delimiters, which can be one of a number of different characters ( "/" or "|" or "(" or "{" or "#", or ...). The PHP engine thinks you want a regular expression of
.*
with a set of pattern modifiers that are
_archived$
So, in the future, when you use a regular expression, try something like
// equivalent
preg_match('/(.*)_archived$/i',$string);
preg_match('{(.*)_archived$}i',$string);
preg_match('#(.*)_archived$#i',$string);
The "/", "#", and "{}" characters are your delimiters. They are not used in the match; they're used to tell the engine "anything in between these characters is my regex". The "i" at the end is a pattern modifier that says to be case insensitive. It's not necessary, but I include it here so you can see what a pattern modifier looks like, and to help you understand why you need delimiters in the first place.
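As a rough check of the explanation above, the sketch below runs the three delimiter styles against one sample string, then the undelimited pattern from the question. The sample strings and variable names are assumptions for illustration:

```php
<?php
$patterns = [
    '/(.*)_archived$/i',
    '{(.*)_archived$}i',
    '#(.*)_archived$#i',
];

$results = [];
foreach ($patterns as $pattern) {
    // The "i" modifier lets "_archived" match "_ARCHIVED";
    // group 1 captures everything before the suffix.
    $results[] = preg_match($pattern, 'Report_ARCHIVED', $m) === 1 ? $m[1] : null;
}
print_r($results); // each entry is "Report"

// Without delimiters, "(" and ")" are taken as the delimiter pair and
// "_archived$" as modifiers, so preg_match() warns and returns false.
// The @ only silences the warning for this demonstration.
$broken = @preg_match('(.*)_archived$', 'Report_archived');
var_dump($broken); // bool(false)
```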

You don't have to remember about delimiters if you are using T-Regx
pattern('_archived$')->matches($string)

Related

simple regex to validate url returns always false with preg_match

I know there are a lot of topics out there showing regular expressions to validate URLs. There is also the FILTER_VALIDATE_URL filter; I know about that too.
I'd like to understand what's wrong with my regular expression.
My regular expression should match URLs with http:// or https:// in front. After that can come any character, one or more times. It should end with a dot followed by a string of 2 to 5 characters a-z.
$s = preg_match('^(http|https)://.+(\.[a-z]{2,5})$', $url);
I tried this regular expression on http://regexpal.com/. It matches correctly there, but my preg_match call always returns false. Can anyone explain what is incorrect about this regular expression?
Thank You Very Much
In PHP, you're required to use delimiters in your regular expression syntax. A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character. Most people use / as a delimiter, but since this appears in your URL you can use another character, such as # to avoid escaping:
'#^(http|https)://.+(\.[a-z]{2,5})$#'
Side note: (http|https) will capture, as it is wrapped in parentheses. You don't really need this, and it's simpler to just write https?, where the ? makes the s an optional character in the expression.
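Putting both suggestions together, a sketch might look like the following. The function name looksLikeHttpUrl is hypothetical, and the pattern is only the asker's rule with # delimiters and https?, not a complete URL validator:

```php
<?php
// Hypothetical helper, named only for this illustration.
function looksLikeHttpUrl(string $url): bool
{
    // '#' delimiters avoid having to escape the slashes in "://".
    return preg_match('#^https?://.+\.[a-z]{2,5}$#', $url) === 1;
}

var_dump(looksLikeHttpUrl('https://example.com')); // bool(true)
var_dump(looksLikeHttpUrl('http://example.org'));  // bool(true)
var_dump(looksLikeHttpUrl('ftp://example.com'));   // bool(false)
var_dump(looksLikeHttpUrl('http://example'));      // bool(false)
```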

Regular expression with exclamation marks on both sides ('!\d!')

I've seen the regular expression '!\d!' inside the PHP preg_match function. What the heck is this?
From the PHP PCRE docs:
When using the PCRE functions, it is required that the pattern is enclosed by delimiters. A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character.
In this case, it's simply using ! as the delimiter. Often it's used if you want to use the normal delimiter within the regex itself without having to escape it. Not really necessary in this case since the rest of the regex is simply \d, but it comes in handy for things like checking that a path contains more than three directory levels. You can use either of:
/\/.*\/.*\/.*\/ blah blah blah /
or:
!/.*/.*/.*/ blah blah blah !
Now they haven't been tested thoroughly, and may not work entirely as advertised, but you should get the general idea re the minimal escaping required.
Another example (from the page linked to above) is checking if a string starts with the http:// marker. Either of these two:
/^http:\/\//
!^http://!
would suffice, but the second is easier to understand.
! is used as the delimiter; \d matches a single digit.
It is the same as /[0-9]/
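A quick way to convince yourself that the delimiter choice does not change what is matched. The sample inputs and variable names are assumptions, and the path patterns are the ones above without the "blah blah blah" placeholder:

```php
<?php
$samples = ['abc', 'a1c', '42'];

$withBang = [];
$withSlash = [];
foreach ($samples as $s) {
    $withBang[]  = preg_match('!\d!', $s);    // "!" as delimiter
    $withSlash[] = preg_match('/[0-9]/', $s); // "/" as delimiter
}
print_r($withBang);  // 0, 1, 1
print_r($withSlash); // 0, 1, 1

// The path example: the same pattern, with and without escaping.
$escaped   = preg_match('/\/.*\/.*\/.*\//', '/a/b/c/d');
$unescaped = preg_match('!/.*/.*/.*/!', '/a/b/c/d');
var_dump($escaped, $unescaped); // int(1) int(1)
```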

recursive regular expression to process nested strings enclosed by {| and |}

In a project I have a text with patterns like that:
{| text {| text |} text |}
more text
I want to get the first part with brackets. For this I use preg_match recursively. The following code works fine already:
preg_match('/\{((?>[^\{\}]+)|(?R))*\}/x',$text,$matches);
But if I add the symbol "|", I got an empty result and I don't know why:
preg_match('/\{\|((?>[^\{\}]+)|(?R))*\|\}/x',$text,$matches);
I can't use the first solution because in the text something like { text } can also exist. Can somebody tell me what I do wrong here? Thx
Try this:
'/(?s)\{\|(?:(?:(?!\{\||\|\}).)++|(?R))*\|\}/'
In your original regex you use the character class [^{}] to match anything except a delimiter. That's fine when the delimiters are only one character, but yours are two characters. To not-match a multi-character sequence you need something like this:
(?:(?!\{\||\|\}).)++
The dot matches any character (including newlines, thanks to the (?s)), but only after the lookahead has determined that it's not part of a {| or |} sequence. I also dropped your atomic group ((?>...)) and replaced it with a possessive quantifier (++) to reduce clutter. But you should definitely use one or the other in that part of the regex to prevent catastrophic backtracking.
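For reference, a small sketch running the suggested pattern against the sample text from the question. The variable names and the second sample containing a plain { text } are assumptions for illustration:

```php
<?php
$pattern = '/(?s)\{\|(?:(?:(?!\{\||\|\}).)++|(?R))*\|\}/';

// Sample input from the question.
$text = "{| text {| text |} text |}\nmore text";
preg_match($pattern, $text, $matches);
var_dump($matches[0]); // string(26) "{| text {| text |} text |}"

// A lone "{ ... }" is not a delimiter pair, so it is consumed as ordinary text.
$mixed = "before {| a { b } c |} after";
preg_match($pattern, $mixed, $mixedMatches);
var_dump($mixedMatches[0]); // string(15) "{| a { b } c |}"
```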
You've got a few suggestions for working regular expressions, but if you're wondering why your original regexp failed, read on. The problem lies when it comes time to match a closing "|}" tag. The (?>[^{}]+) (or [^{}]++) sub expression will match the "|", causing the |} sub expression to fail. With no backtracking in the sub expression, there's no way to recover from the failed match.
See PHP - help with my REGEX-based recursive function
To adapt it to your use
preg_match_all('/\{\|(?:^(\{\||\|\})|(?R))*\|\}/', $text, $matches);

Can you rely on the order that regular expression syntax is interpreted?

(The background for this question is that I thought it would be fun to write something that parses wiki creole markup. Anyway the problem that I think I have a solution to is differentiating between // in a url and as opening/closing syntax for italic text)
My question is slightly compound so I've tried to break it up under the headings
If there is a substring (S1) that can contain any one of a series of substrings separated by |, does the regular expression interpreter simply match the first substring within 'S1' and then move on to the regular expression after 'S1'? Or will it in some instances try to find the best/greediest match?
Here is an example to try and make my question more clear:
String to search within: String
Regex: /(?:(Str|Strin).*)/ (the 'S1' in my question refers to the non-capturing substring)
I think that the matches from the above should be:
$0 will be String
$1 will be Str and not Strin
Will this always happen, or are there instances (e.g. maybe 'S1' being matched greedily using *) where another matching substring will be used, i.e. Strin in my example?
If the above is correct, then can I/should I rely on this behaviour?
Real world example
/^\/\/(\b((https?|ftp):\/\/|mailto:)([^\s~]*?(?:~(.|$))?)+?(?=\/\/|\s|$)|~(.|$)|[^/]|\/([^/]|$))*\/\//
Should correctly match:
//Some text including a http//:url//
With $1 == Some text including a http//:url
Note: I've tried to make this relatively language agnostic but I will be using php
PHP uses the PCRE regex engine. By default, and the way PHP uses it, PCRE is a backtracking engine: it returns the leftmost match and, within an alternation, tries the alternatives from left to right and keeps the first one that lets the overall pattern succeed (it is not a leftmost-longest engine). So yes, you can rely on the order in which PHP interprets a regex.
The other mode, provided by the pcre_dfa_exec() function, evaluates all possible matches and returns the longest possible match.
In PHP, using preg extension, you can choose between greedy and non greedy operators (usually appending '?' to them).
By the way, in the example you gave, if you want Strin to match, you must invert your cases : /(?:(Strin|Str).*)/. I think, you should put the most generic expression at the end of the Regex.
FYI, with preg engine,
alternation operator is neither greedy nor lazy but ordered
Mastering regular expressions, J. Friedl, p175
If you want a greedy engine, you must use a POSIX-compliant engine (ereg, but it was deprecated in PHP 5.3 and removed in PHP 7).
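The ordered behaviour of alternation can be checked with a short sketch. The third pattern is an added illustration, not from the question, showing that a later alternative is used only when the earlier one makes the rest of the pattern fail:

```php
<?php
// "Str" is listed first and the rest of the pattern can still succeed,
// so the engine never tries "Strin".
preg_match('/(?:(Str|Strin).*)/', 'String', $first);
var_dump($first[0], $first[1]); // "String", "Str"

// Reversing the order makes "Strin" the first alternative tried.
preg_match('/(?:(Strin|Str).*)/', 'String', $second);
var_dump($second[1]); // "Strin"

// Here "Str" is tried first, but the following "g" fails against "i",
// so the engine backtracks into the alternation and takes "Strin".
preg_match('/(Str|Strin)g/', 'String', $third);
var_dump($third[1]); // "Strin"
```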

Simple preg_replace

I can't figure out preg_replace at all; it just looks like Chinese to me. Anyway, I just need to remove "&page-X" from a string if it's there.
X being a number of course, if anyone has a link to a useful preg_replace tutorial for beginners that would also be handy!
Actually the basic syntax for regular expressions, as supported by preg_replace and friends, is pretty easy to learn. Think of it as a string describing a pattern with certain characters having special meaning.
In your very simple case, a possible pattern is:
&page-\d+
With \d meaning a digit (numeric characters 0-9) and + meaning: Repeat the expression right before + (here: \d) one or more times. All other characters just represent themselves.
Therefore, the pattern above matches any of the following strings:
&page-0
&page-665
&page-1234567890
Since the preg functions use a Perl-compatible syntax and regular expressions are denoted between slashes (/) in Perl, you have to surround the pattern in slashes:
$after = preg_replace('/&page-\d+/', '', $before);
Actually, you can use other characters as well:
$after = preg_replace('#&page-\d+#', '', $before);
For a full reference of supported syntax, see the PHP manual.
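A short usage sketch with made-up input strings (the query strings below are assumptions, not from the question):

```php
<?php
$before = 'index.php?cat=5&page-12&sort=asc';

// Remove "&page-" followed by one or more digits.
$after = preg_replace('/&page-\d+/', '', $before);
var_dump($after); // string(24) "index.php?cat=5&sort=asc"

// A string without the marker comes back unchanged.
$unchanged = preg_replace('#&page-\d+#', '', 'index.php?cat=5');
var_dump($unchanged); // string(15) "index.php?cat=5"
```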
preg_replace uses Perl-Compatible Regular Expression for the search pattern. Try this pattern:
preg_replace('/&page-\d+/', '', $str)
See the pattern syntax for more information.
$outputstring = preg_replace('/&page-\d+/', "", $inputstring);
preg_replace()
preg_replace('/&page-\d+/', '', $string)
Useful information:
Using Regular Expressions with PHP
http://articles.sitepoint.com/article/regular-expressions-php
