Regex in PHP (preg_match) pattern - php

I created a regex pattern that works in Dreamweaver's regex Find, but when dropped into the pattern of preg_match, it fails. What am I breaking in the PHP (5.1.6) regex rules that otherwise works in Dreamweaver's interpretation? Here's the PHP:
preg_match("/(\{a\})([a-zA-Z0-9{} .])+(\{/a\})/i", "{a}{900678}{abcde}{0}{0}{0}{/a}");
Returns false currently. How can I modify the pattern so that it matches any string of the form {a}anything goes in the middle{/a}? I realize that the above regex will not match 'anything' in the middle, but I simplified the expression for debugging.

The slash in the /a part is being interpreted as the end delimiter of the expression. You should probably use another delimiter for the whole pattern, e.g.:
preg_match("~(\{a\})([a-zA-Z0-9{} .])+(\{/a\})~i",
"{a}{900678}{abcde}{0}{0}{0}{/a}");
See it in action.
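A minimal sketch of the two usual fixes, using the subject string from the question. The variable names are only for illustration, and the @ is there just to hide the warning PHP raises for the broken pattern.

```php
<?php
// Subject string taken from the question.
$subject = '{a}{900678}{abcde}{0}{0}{0}{/a}';

// Original pattern: the "/" inside "{/a}" ends the pattern early, so the
// "a" that follows is read as an (unknown) modifier and the call fails.
$broken = @preg_match('/(\{a\})([a-zA-Z0-9{} .])+(\{/a\})/i', $subject);

// Fix 1: pick a delimiter that does not occur in the pattern.
$withTilde = preg_match('~(\{a\})([a-zA-Z0-9{} .])+(\{/a\})~i', $subject);

// Fix 2: keep "/" as the delimiter and escape the inner slash.
$escaped = preg_match('/(\{a\})([a-zA-Z0-9{} .])+(\{\/a\})/i', $subject);

var_dump($broken, $withTilde, $escaped);
```

The first call should report false and the other two should report 1.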

Related

(PHP) How to find words beginning with a pattern and replace all of them?

I have a string. An example might be "Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2"
I want to replace any instance of "/r/x" and "/u/x" with "[/r/x](http://reddit.com/r/x)" and "[/u/x](http://reddit.com/u/x)" basically.
So I'm not sure how to 1) find "/r/" and then expand that to the rest of the word (until there's a space), then 2) take that full "/r/x" and replace with my pattern, and most importantly 3) do this for all "/r/" and "/u/" matches in a single go...
The only way I know to do this would be to write a function to walk the string, character by character, until I found "/", then look for "r" and "/" to follow; then keep going until I found a space. That would give me the beginning and ending characters, so I could do a string replacement; then calculate the new end point, and continue walking the string.
This feels... dumb. I have a feeling there's a relatively simple way to do this, and I just don't know how to google to get all the relevant parts.
A simple preg_replace will do what you want.
Try:
$string = preg_replace('#(/(?:u|r)/[a-zA-Z0-9_-]+)#', '[\1](http://reddit.com\1)', $string);
Here is an example: http://ideone.com/dvz2zB
You should check which characters are valid in a subreddit name or a Reddit username and adjust the [a-zA-Z0-9_-] character class accordingly.
You are looking for a regular expression.
A basic pattern starts out as a fixed string: /u/ or /r/, which match those strings exactly. The two can be combined into a single pattern with an alternation, /(?:u|r)/, which matches either one. Next you want to match everything from that point up to a space. Use a negated character class, [^ ], which matches any character that is not a space, and apply the quantifier * to match as many such characters as possible. The result is /(?:u|r)/[^ ]*
You can take that pattern further and add a lookbehind, (?<= ), to ensure your match is preceded by a space so you're not matching part of a longer word, which gives (?<= )/(?:u|r)/[^ ]*. Note that this lookbehind also rejects a match at the very start of the string. Wrap all of that in parentheses to make a capturing group: ((?<= )/(?:u|r)/[^ ]*). This captures the matched text so it can be used in a replacement pattern. You can express your chosen replacement using \1, the reference to the first captured group, as [\1](http://reddit.com\1).
In PHP, you would pass the matching pattern, the replacement pattern, and the subject string to the preg_replace function.
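Put together, that could look like the sketch below. The # delimiter and the variable names are my own choices, not something the question prescribes; the sample string is the one from the question.

```php
<?php
// Sample string from the question.
$string = 'Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2';

// "#" is used as the delimiter so the slashes need no escaping.
// The group captures "/u/..." or "/r/..." up to the next space,
// but only when the match is preceded by a space.
$pattern = '#((?<= )/(?:u|r)/[^ ]*)#';

// \1 refers to the first captured group.
$replacement = '[\1](http://reddit.com\1)';

// preg_replace handles every match in one call.
$linked = preg_replace($pattern, $replacement, $string);
echo $linked, PHP_EOL;
```

Because [^ ]* stops only at a space, trailing punctuation such as a comma directly after a name would end up inside the link, so a stricter character class may be preferable for real input.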
In my opinion, regex would be overkill for such a simple operation. If you just want to replace instances of "/r/x" with "[/r/x](http://reddit.com/r/x)" and "/u/x" with "[/u/x](http://reddit.com/u/x)", you can use str_replace, although preg_replace will need less code.
str_replace("/r/x","[/r/x](http://reddit.com/r/x)","whatever_string");
Use regex for more intricate search-and-replace operations. You can also use the regular expression generator at http://www.jslab.dk/tools.regex.php if you have something complex to capture in the string.

PHP preg_replace Regex Issue

I am fairly new to regex, but I've written a match string. I think it's pretty close, but it isn't working. I need to find URLs that match a certain pattern in a longer string.
Here are a couple of examples of URLs:
http://static.squarespace.com/static/j433gj93943tj9043/23rf9g4390930/4343t49t4/4g93g4390g49u0/image.png
http://static.squarespace.com/static/yy9ii93i9034/g43g34/j6j66767j6gdrdg/g4g34g34h/something.png
Here is my regex:
#^http://static.squarespace.com(a-zA-Z0-9-./)(png|jpg)$#
Both of those URLs should be matched, but they aren't... preg_match is returning FALSE.
You need to define your character class with square brackets and allow it to repeat with a plus.
^http://static.squarespace.com([a-zA-Z0-9-./]+)(png|jpg)$
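As a rough check, the corrected pattern can be run against the two URLs from the question. In this sketch I have added # delimiters, escaped the dots in the host name, and moved the hyphen to the end of the character class; those tweaks are my own assumptions about what is wanted, not part of the answer above.

```php
<?php
// The two example URLs from the question.
$urls = [
    'http://static.squarespace.com/static/j433gj93943tj9043/23rf9g4390930/4343t49t4/4g93g4390g49u0/image.png',
    'http://static.squarespace.com/static/yy9ii93i9034/g43g34/j6j66767j6gdrdg/g4g34g34h/something.png',
];

// Square brackets define the character class and "+" lets it repeat.
// "#" delimiters avoid having to escape the slashes in the URL.
$pattern = '#^http://static\.squarespace\.com([a-zA-Z0-9./-]+)(png|jpg)$#';

$results = [];
foreach ($urls as $url) {
    // preg_match returns 1 on a match, 0 on no match, false on error.
    $results[] = preg_match($pattern, $url);
}
var_dump($results);
```

Since the pattern is anchored with ^ and $, it tests a whole string; to pull such URLs out of a longer text the anchors would have to go.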

simple regex to validate url returns always false with preg_match

I know there are a lot of topics out there that show regular expressions to validate URLs. There is also the FILTER_VALIDATE_URL filter; I know about that too.
I'd like to know what's wrong with my regular expression so that I can understand the problem.
My regular expression should match URLs with http:// or https:// in front. After that it can be any character, one or more. It should end with a dot followed by a string of 2 to 5 characters a-z.
$s = preg_match('^(http|https)://.+(\.[a-z]{2,5})$', $url);
I tried this regular expression on http://regexpal.com/. It matches correctly there, but my preg_match call always gives me false. Can anyone explain what's incorrect about this regular expression?
Thank You Very Much
In PHP, you're required to use delimiters in your regular expression syntax. A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character. Most people use / as a delimiter, but since this appears in your URL you can use another character, such as # to avoid escaping:
'#^(http|https)://.+(\.[a-z]{2,5})$#'
Side note: (http|https) will capture because it is wrapped in parentheses. You don't really need that, and it's simpler to just write https?, where the ? makes the s an optional character in the expression.
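For illustration, here is a small sketch that combines the # delimiters with the https? shorthand. The test URLs are made up for the example, and I dropped the capturing group around the final part since nothing here reads it.

```php
<?php
// "#" as delimiter, "https?" instead of "(http|https)".
$pattern = '#^https?://.+\.[a-z]{2,5}$#';

// Hypothetical inputs, chosen only to exercise the pattern.
$candidates = [
    'http://example.com',
    'https://example.org',
    'ftp://example.com',   // wrong scheme
    'http://example',      // no dot followed by 2 to 5 letters at the end
];

$results = [];
foreach ($candidates as $url) {
    $results[$url] = preg_match($pattern, $url);
}
var_dump($results);
```

This only checks the loose shape described in the question; FILTER_VALIDATE_URL remains the sturdier option for real validation.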

recursive regular expression to process nested strings enclosed by {| and |}

In a project I have a text with patterns like that:
{| text {| text |} text |}
more text
I want to get the first part with brackets. For this I use preg_match recursively. The following code works fine already:
preg_match('/\{((?>[^\{\}]+)|(?R))*\}/x',$text,$matches);
But if I add the symbol "|", I get an empty result and I don't know why:
preg_match('/\{\|((?>[^\{\}]+)|(?R))*\|\}/x',$text,$matches);
I can't use the first solution because something like { text } can also exist in the text. Can somebody tell me what I am doing wrong here? Thx
Try this:
'/(?s)\{\|(?:(?:(?!\{\||\|\}).)++|(?R))*\|\}/'
In your original regex you use the character class [^{}] to match anything except a delimiter. That's fine when the delimiters are only one character, but yours are two characters. To not-match a multi-character sequence you need something like this:
(?:(?!\{\||\|\}).)++
The dot matches any character (including newlines, thanks to the (?s)), but only after the lookahead has determined that it's not part of a {| or |} sequence. I also dropped your atomic group ((?>...)) and replaced it with a possessive quantifier (++) to reduce clutter. But you should definitely use one or the other in that part of the regex to prevent catastrophic backtracking.
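A quick way to try the suggested pattern is sketched below. The first sample text is the one from the question; the second is an assumed extra case with plain { } braces inside the block.

```php
<?php
// Pattern from the answer: a tempered dot for the body, (?R) for nesting.
$pattern = '/(?s)\{\|(?:(?:(?!\{\||\|\}).)++|(?R))*\|\}/';

// Sample text from the question: a nested block followed by more text.
$text = "{| text {| text |} text |}\nmore text";
preg_match($pattern, $text, $matches);
echo $matches[0], PHP_EOL;

// Assumed extra case: single braces inside the block are ordinary text.
$withBraces = '{| a { b } c |} tail';
preg_match($pattern, $withBraces, $braceMatches);
echo $braceMatches[0], PHP_EOL;
```

In both cases $matches[0] should hold the whole outer {| ... |} block and nothing after it.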
You've got a few suggestions for working regular expressions, but if you're wondering why your original regexp failed, read on. The problem lies when it comes time to match a closing "|}" tag. The (?>[^{}]+) (or [^{}]++) sub expression will match the "|", causing the |} sub expression to fail. With no backtracking in the sub expression, there's no way to recover from the failed match.
See PHP - help with my REGEX-based recursive function
To adapt it to your use:
preg_match_all('/\{\|(?:(?!\{\||\|\}).|(?R))*\|\}/s', $text, $matches);

Can you rely on the order that regular expression syntax is interpreted?

(The background for this question is that I thought it would be fun to write something that parses wiki creole markup. Anyway the problem that I think I have a solution to is differentiating between // in a url and as opening/closing syntax for italic text)
My question is slightly compound so I've tried to break it up under the headings
If there is a substring (S1) that can contain any one of a series of substrings separated by |, does the regular expression interpreter simply match the first substring within 'S1' and then move on to the regular expression after 'S1'? Or will it in some instances try to find the best/greediest match?
Here is an example to try and make my question more clear:
String to search within: String
Regex: /(?:(Str|Strin).*)/ (the 'S1' in my question refers to the non-capturing substring)
I think that the matches from the above should be:
$0 will be String
$1 will be Str and not Strin
Will this always happen, or are there instances (e.g. maybe 'S1' being matched greedily using *) where another matching substring will be used, i.e. Strin in my example?
If the above is correct, then can I/should I rely on this behaviour?
Real world example
/^\/\/(\b((https?|ftp):\/\/|mailto:)([^\s~]*?(?:~(.|$))?)+?(?=\/\/|\s|$)|~(.|$)|[^/]|\/([^/]|$))*\/\//
Should correctly match:
//Some text including a http//:url//
With $1 == Some text including a http//:url
Note: I've tried to make this relatively language agnostic but I will be using php
PHP uses the PCRE regex engine. By default, and the way PHP uses it, PCRE is a backtracking, Perl-compatible matcher: it returns the first match it finds, evaluating the regex, including its alternatives, from left to right. So yes, you can rely on the order in which PHP interprets a regex.
The other mode, provided by the pcre_dfa_exec() function, evaluates all possible matches and returns the longest possible match.
In PHP, using the preg extension, you can choose between greedy and non-greedy quantifiers (usually by appending '?' to them).
By the way, in the example you gave, if you want Strin to match, you must swap the alternatives: /(?:(Strin|Str).*)/. I think you should put the longer, more specific alternative first and the most generic one at the end.
FYI, with preg engine,
alternation operator is neither greedy nor lazy but ordered
Mastering regular expressions, J. Friedl, p175
If you want alternation that picks the longest match, you must use a POSIX-compliant engine (ereg, but it's deprecated).
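The ordered behaviour is easy to check with the example from the question. This is only a small sketch; the variable names are mine.

```php
<?php
// Alternatives are tried left to right; the first one that lets the
// overall pattern succeed wins, even if a longer one would also fit.
preg_match('/(?:(Str|Strin).*)/', 'String', $shortFirst);

// Swapping the alternatives puts the longer one first.
preg_match('/(?:(Strin|Str).*)/', 'String', $longFirst);

var_dump($shortFirst, $longFirst);
```

With the short alternative first, group 1 should be Str; with the order swapped it should be Strin. The full match is String in both cases.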
