PHP preg_replace Regex Issue - php

I am fairly new to regex, but I've written a match string. I think it's pretty close, but it isn't working. I need to find URLs that match a certain pattern in a longer string.
Here are a couple of examples of URLs:
http://static.squarespace.com/static/j433gj93943tj9043/23rf9g4390930/4343t49t4/4g93g4390g49u0/image.png
http://static.squarespace.com/static/yy9ii93i9034/g43g34/j6j66767j6gdrdg/g4g34g34h/something.png
Here is my regex:
#^http://static.squarespace.com(a-zA-Z0-9-./)(png|jpg)$#
Both of those URLs should be matched, but they aren't: preg_match is returning FALSE.

You need to put your character class in square brackets and allow it to repeat with a plus:
^http://static.squarespace.com([a-zA-Z0-9-./]+)(png|jpg)$
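As a quick check, here is a minimal sketch that runs the corrected pattern against the two URLs from the question. The # delimiters are taken from the question. Escaping the dots in the host name is an extra tightening that is not part of the answer, and $results is only an illustrative variable name.

```php
<?php
// Corrected pattern: a bracketed character class followed by "+".
// Escaping the dots in the host name is an assumption added here so that
// "." only matches a literal dot.
$pattern = '#^http://static\.squarespace\.com([a-zA-Z0-9-./]+)(png|jpg)$#';

$urls = [
    'http://static.squarespace.com/static/j433gj93943tj9043/23rf9g4390930/4343t49t4/4g93g4390g49u0/image.png',
    'http://static.squarespace.com/static/yy9ii93i9034/g43g34/j6j66767j6gdrdg/g4g34g34h/something.png',
];

$results = [];
foreach ($urls as $url) {
    // preg_match returns 1 on a match, 0 on no match, false on error.
    $results[] = preg_match($pattern, $url);
}

var_dump($results);
```

Both entries should come back as 1. The pattern from the question, without the brackets, looks for the literal text "a-zA-Z0-9-" and therefore finds nothing.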

Related

Match string not followed by another string

I'm trying to match a string. For example, of the following two strings, the first should match and the second should not.
/users/akinuri/
/users/akinuri/asd/
I've tried a negative lookahead, but it didn't work. Obviously I'm doing something wrong.
if (preg_match("/\/users\/.*?\/(?!.*\/)/", "/users/akinuri/asd/")) {
    echo "match";
}
I'm experimenting (trying to create a route system). Right now, I'm just trying to determine if the requested uri is valid. If a visitor requests the second string, it should return false and I'll return a not found page. For example, test this on SO. The second url returns a not found page.
https://stackoverflow.com/users/2202732/akinuri/
https://stackoverflow.com/users/2202732/akinuri/asdasd
So, can I do this check using regex? If so, how? What am I doing wrong?
Or should I split the text and then do further checks? This seems a bit redundant.
You don't need a lookahead for this. You can use a negated character class [^/]+ to match 1+ of any character except /. You also need to use anchors in regex to make sure you match complete input.
For PHP code, you can use this regex:
'~^/users/[^/]+/?$~'
Note that /?$ makes trailing slash an optional match.
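A minimal sketch of how that pattern could be used for the route check. The function name isUserProfilePath is hypothetical; only the pattern comes from the answer.

```php
<?php
// Hypothetical helper; the pattern is the one from the answer.
function isUserProfilePath(string $path): bool
{
    // ^ and $ anchor the whole input, [^/]+ is exactly one path segment,
    // and /? allows an optional trailing slash.
    return preg_match('~^/users/[^/]+/?$~', $path) === 1;
}

var_dump(isUserProfilePath('/users/akinuri/'));     // bool(true)
var_dump(isUserProfilePath('/users/akinuri'));      // bool(true)
var_dump(isUserProfilePath('/users/akinuri/asd/')); // bool(false)
```

Because [^/]+ cannot cross a slash, any extra path segment makes the anchored match fail, which is what the lookahead was trying to achieve.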

(PHP) How to find words beginning with a pattern and replace all of them?

I have a string. An example might be "Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2"
I want to replace any instance of "/r/x" and "/u/x" with "[/r/x](http://reddit.com/r/x)" and "[/u/x](http://reddit.com/u/x)" basically.
So I'm not sure how to 1) find "/r/" and then expand that to the rest of the word (until there's a space), then 2) take that full "/r/x" and replace with my pattern, and most importantly 3) do this for all "/r/" and "/u/" matches in a single go...
The only way I know to do this would be to write a function to walk the string, character by character, until I found "/", then look for "r" and "/" to follow; then keep going until I found a space. That would give me the beginning and ending characters, so I could do a string replacement; then calculate the new end point, and continue walking the string.
This feels... dumb. I have a feeling there's a relatively simple way to do this, and I just don't know how to google to get all the relevant parts.
A simple preg_replace will do what you want.
Try:
$string = preg_replace('#(/(?:u|r)/[a-zA-Z0-9_-]+)#', '[\1](http://reddit.com\1)', $string);
Here is an example: http://ideone.com/dvz2zB
You should find out which characters are valid in a subreddit name and in a Reddit username, and adjust the [a-zA-Z0-9_-] character class accordingly.
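For reference, a short sketch that applies the call above to the example string from the question. The variable name $linked is illustrative.

```php
<?php
$string = 'Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2';

// \1 in the replacement refers to the captured "/u/name" or "/r/name".
$linked = preg_replace(
    '#(/(?:u|r)/[a-zA-Z0-9_-]+)#',
    '[\1](http://reddit.com\1)',
    $string
);

echo $linked, PHP_EOL;
```

preg_replace replaces every match in one call, so all three mentions are turned into links in a single pass.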
You are looking for a regular expression.
A basic pattern starts out as a fixed string: /u/ or /r/, each of which matches exactly that text. The two can be combined into /(?:u|r)/, which matches either one. Next, you want to match everything from that point up to a space. Use a negated character class, [^ ], which matches any character that is not a space, and apply the quantifier * to match as many such characters as possible: /(?:u|r)/[^ ]*
You can take the pattern further and add a lookbehind, (?<= ), to ensure the match is preceded by a space, so that you don't match part of a longer token. That gives (?<= )/(?:u|r)/[^ ]*. Wrap all of that in parentheses to make a capturing group: ((?<= )/(?:u|r)/[^ ]*). This captures the text inside the parentheses for use in the replacement. You can then write the replacement using \1, the reference to the first captured group: [\1](http://reddit.com\1).
In PHP, you would pass the matching pattern, the replacement pattern, and the subject string to the preg_replace function.
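Putting those steps together, a minimal sketch might look like this. The helper name linkRedditMentions and the # delimiters are assumptions made for illustration.

```php
<?php
// Hypothetical helper; pattern and replacement follow the steps above.
function linkRedditMentions(string $text): string
{
    // (?<= ) requires a space just before the match; [^ ]* runs to the next space.
    return preg_replace('#((?<= )/(?:u|r)/[^ ]*)#', '[\1](http://reddit.com\1)', $text);
}

$out = linkRedditMentions('Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2');
echo $out, PHP_EOL;

// "/r/x" here is not preceded by a space, so the string is left untouched.
$untouched = linkRedditMentions('http://example.com/r/x');
echo $untouched, PHP_EOL;
```

Two consequences of this design are worth keeping in mind: a mention at the very start of the string has no preceding space and is skipped, and [^ ]* will also swallow trailing punctuation such as a comma.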
In my opinion, regex is overkill for such a simple operation. If you only need to replace the literal string "/r/x" with "[/r/x](http://reddit.com/r/x)" and "/u/x" with "[/u/x](http://reddit.com/u/x)", you can use str_replace, although preg_replace will need less code.
str_replace("/r/x","[/r/x](http://reddit.com/r/x)","whatever_string");
Use regex for intricate search-and-replace operations. You can also use the regular expression generator at http://www.jslab.dk/tools.regex.php if you have something complex to capture in the string.

How to exclude a word or string from an URL - Regex

I'm using the following Regex to match all types of URL in PHP (It works very well):
$reg_exUrl = "%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";
But now I want to exclude YouTube, youtu.be and Vimeo URLs.
After some research I tried something like this, but it is not working:
$reg_exUrl = "%\b(([\w-]+://?|www[.])(?!youtube|youtu|vimeo)[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";
I want to do this because I have another regex that matches YouTube URLs and returns an iframe, and the two regexes are interfering with each other.
Any help would be gratefully appreciated, thanks.
socodLib, to exclude something from a string, place yourself at the beginning of the string by anchoring with a ^ (or use another anchor) and use a negative lookahead to assert that the string doesn't contain a word, like so:
^(?!.*?(?:youtube|some other bad word|some\.string\.with\.dots))
Before we make the regex look too complex by combining it with yours, let's see what we would do if you wanted to match some word characters \w+ but not youtube or google. You would write:
^(?!.*?(?:youtube|google))\w+
As you can see, after the assertion (where we say what we don't want), we say what we do want by using \w+.
In your case, let's add a negative lookahead to your initial regex (which I have not tuned):
$reg_exUrl = "%(?i)\b(?!.*?(?:youtu\.?be|vimeo))(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";
I took the liberty of making the regex case insensitive with (?i). You could also have added i to your s modifier at the end. The youtu\.?be expression allows for an optional dot.
I am certain you can apply this recipe to your expression and other regexes in the future.
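A small sketch of how the combined pattern behaves when each URL is tested on its own. The sample URLs are made up for illustration, and the pattern is written as a single-quoted string so the backslashes reach PCRE unchanged.

```php
<?php
// Pattern from the answer above.
$reg_exUrl = '%(?i)\b(?!.*?(?:youtu\.?be|vimeo))(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s';

// Hypothetical sample URLs, each tested separately.
$samples = [
    'http://example.com/page',
    'http://www.youtube.com/watch?v=abc',
    'http://youtu.be/xyz',
    'https://vimeo.com/12345',
];

$results = [];
foreach ($samples as $url) {
    // 1 means the URL is accepted, 0 means it is excluded.
    $results[$url] = preg_match($reg_exUrl, $url);
}

print_r($results);
```

One caveat with this approach: because the lookahead uses .*? together with the s modifier, it scans the rest of the subject, so a YouTube or Vimeo link later in a longer text would also suppress matches for unrelated URLs that come before it. Testing URLs one at a time, as here, avoids that.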

PHP regex lookbehind with wildcard

I have two strings in PHP:
$string = '<a href="http://localhost/image1.jpeg" /></a>';
and
$string2 = '[caption id="attachment_5" align="alignnone" width="483"]<a href="http://localhost/image1.jpeg" /></a>[/caption]';
I'm trying to match strings of the first type. That is, strings that are not surrounded by '[caption ... ]' and '[/caption]'. So far, I would like to use something like this:
$pattern = '/(?<!\[caption.*\])(?!\[\/caption\])(<a.*><img.*><\/a>)/';
but PHP rejects the first string as well with this pattern, even though it is NOT preceded by '[caption' and zero or more characters followed by ']'. What gives? Why is this and what's the correct pattern?
Thanks.
Variable length look-behind is not supported in PHP, so this part of your pattern is not valid:
(?<!\[caption.*\])
It should be warning you about this.
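A minimal sketch of what that looks like at runtime, using the pattern and first string from the question. The @ operator is used only to silence the compilation warning so the return value can be inspected.

```php
<?php
// Pattern from the question, unchanged: the lookbehind contains ".*",
// which has no fixed (or bounded) length.
$pattern = '/(?<!\[caption.*\])(?!\[\/caption\])(<a.*><img.*><\/a>)/';
$string  = '<a href="http://localhost/image1.jpeg" /></a>';

// preg_match returns false when the pattern fails to compile.
$result = @preg_match($pattern, $string);

var_dump($result);
```

A return value of false (rather than 0) signals an error in the pattern itself, not merely the absence of a match, so it is worth checking with === false.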
In addition, .* always matches the largest possible amount. Thus your pattern may result in a match that overlaps multiple tags. Instead, use [^>]* (match anything that is not a closing bracket), because closing brackets should not occur inside the img tag.
To solve the look-behind problem, why not just check for the closing tag only? This should be sufficient (assuming the caption tags are only used in a way similar to what you have shown).
$pattern = '|(<a[^>]*><img[^>]*></a>)(?!\[/caption\])|';
When matching patterns that contain /, use another character as the pattern delimiter to avoid leaning toothpick syndrome. You can use nearly any non-alphanumeric character around the pattern.
Update: the previous regex is based on the example regex you gave, rather than the example data. If you want to match links that don't contain images, do this:
$pattern = '|(<a[^>]*>[^<]*</a>)(?!\[/caption\])|';
Note that this doesn't allow any tags in the middle of the link. If you allow tags (such as by using .*?), a regex could match something starting within the [caption] and ending elsewhere.
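A short sketch that runs the second pattern against the two strings from the question. The variable names $plain and $captioned are illustrative.

```php
<?php
// Pattern from the update above: a link with no nested tags that is not
// immediately followed by [/caption].
$pattern = '|(<a[^>]*>[^<]*</a>)(?!\[/caption\])|';

$string  = '<a href="http://localhost/image1.jpeg" /></a>';
$string2 = '[caption id="attachment_5" align="alignnone" width="483"]<a href="http://localhost/image1.jpeg" /></a>[/caption]';

$plain     = preg_match($pattern, $string);   // link outside a caption
$captioned = preg_match($pattern, $string2);  // link inside a caption

var_dump($plain, $captioned);
```

The first call should report a match and the second should not, because the only candidate link in $string2 is directly followed by [/caption].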
I don't see how your regexp could match either string, since you're looking for <a.*><img.*><\/a>, and neither anchor contains an <img... tag. Also, the two subexpressions looking for and prohibiting the caption bits look oddly positioned to me. Finally, you need to ensure your tag-matching bits aren't greedy, i.e. don't use .* but [^>]*.
Do you mean something like this?
$pattern = '/(<a[^>]*>(<img[^>]*>)?<\/a>)(?!\[\/caption\])/';
Test it on regex101.
Edit: Removed useless lookahead as per dan1111's suggestion and updated regex101 link.
Lookbehind doesn't allow patterns without a fixed length, i.e. ones using quantifiers such as *, + or ?. I think /<a.*><\/a>(?!\[\/caption\])/ is enough for your requirement.

Regex in PHP (preg_match) pattern

I created a regex pattern that works in Dreamweaver's regex Find, but when dropped into the pattern of preg_match, it fails. What am I breaking in the PHP (5.1.6) regex rules that otherwise works in Dreamweaver's interpretation? Here's the PHP:
preg_match("/(\{a\})([a-zA-Z0-9{} .])+(\{/a\})/i", "{a}{900678}{abcde}{0}{0}{0}{/a}");
It currently returns false. How can I modify the pattern so that it matches strings of the form {a}anything goes in the middle{/a}? I realize that the above regex will not match 'anything' in the middle, but I simplified the expression for debugging.
The slash in the /a part is being interpreted as the end delimiter of the expression. You should probably use another delimiter for the whole pattern, e.g.:
preg_match("~(\{a\})([a-zA-Z0-9{} .])+(\{/a\})~i",
"{a}{900678}{abcde}{0}{0}{0}{/a}");
See it in action.
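A minimal sketch comparing the original delimiter with two possible fixes. The variable names are illustrative, and @ is used only to hide the warning raised by the broken pattern.

```php
<?php
$subject = '{a}{900678}{abcde}{0}{0}{0}{/a}';

// "/" as delimiter: the "/" inside "{/a}" ends the pattern early and the
// remainder is read as modifiers, so compilation fails.
$broken = @preg_match('/(\{a\})([a-zA-Z0-9{} .])+(\{/a\})/i', $subject);

// Fix 1: use a delimiter that does not occur in the pattern.
$fixed = preg_match('~(\{a\})([a-zA-Z0-9{} .])+(\{/a\})~i', $subject);

// Fix 2: keep "/" as the delimiter and escape the inner slash.
$escaped = preg_match('/(\{a\})([a-zA-Z0-9{} .])+(\{\/a\})/i', $subject);

var_dump($broken, $fixed, $escaped);
```

Either fix works; switching delimiters tends to be easier to read when the pattern contains several slashes.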
