Match string not followed by another string


I'm trying to match a string. For example, in the following strings, first should match, the second should not.
/users/akinuri/
/users/akinuri/asd/
I've tried a negative lookahead, but it didn't work. Obviously I'm doing something wrong.
if (preg_match("/\/users\/.*?\/(?!.*\/)/", "/users/akinuri/asd/")) {
echo "match";
}
I'm experimenting (trying to create a route system). Right now, I'm just trying to determine if the requested uri is valid. If a visitor requests the second string, it should return false and I'll return a not found page. For example, test this on SO. The second url returns a not found page.
https://stackoverflow.com/users/2202732/akinuri/
https://stackoverflow.com/users/2202732/akinuri/asdasd
So, can I do this check using regex? If so, how? What am I doing wrong?
Or should I split the text and then do further checks? This seems a bit redundant.

You don't need a lookahead for this. You can use a negated character class [^/]+ to match 1+ of any character except /. You also need to use anchors in regex to make sure you match complete input.
For PHP code, you can use this regex:
'~^/users/[^/]+/?$~'
Note that the /? before the $ anchor makes the trailing slash optional.
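As a minimal sketch of how that pattern could sit inside a route check (the function name is hypothetical, not something from the question):

```php
<?php
// Hypothetical helper: true only for /users/{name}, with an optional trailing slash.
function is_user_profile_uri(string $uri): bool
{
    // ^ and $ anchor the whole input; [^/]+ cannot cross into another path segment.
    return preg_match('~^/users/[^/]+/?$~', $uri) === 1;
}

var_dump(is_user_profile_uri('/users/akinuri/'));     // bool(true)
var_dump(is_user_profile_uri('/users/akinuri/asd/')); // bool(false)
```

Using ~ as the delimiter means the slashes in the path do not need escaping.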

Related

(PHP) How to find words beginning with a pattern and replace all of them?

I have a string. An example might be "Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2"
I want to replace any instance of "/r/x" and "/u/x" with "[/r/x](http://reddit.com/r/x)" and "[/u/x](http://reddit.com/u/x)" basically.
So I'm not sure how to 1) find "/r/" and then expand that to the rest of the word (until there's a space), then 2) take that full "/r/x" and replace with my pattern, and most importantly 3) do this for all "/r/" and "/u/" matches in a single go...
The only way I know to do this would be to write a function to walk the string, character by character, until I found "/", then look for "r" and "/" to follow; then keep going until I found a space. That would give me the beginning and ending characters, so I could do a string replacement; then calculate the new end point, and continue walking the string.
This feels... dumb. I have a feeling there's a relatively simple way to do this, and I just don't know how to google to get all the relevant parts.

A simple preg_replace will do what you want.
Try:
$string = preg_replace('#(/(?:u|r)/[a-zA-Z0-9_-]+)#', '[\1](http://reddit.com\1)', $string);
Here is an example: http://ideone.com/dvz2zB
You should see if you can discover what characters are valid in a Reddit name or in a Reddit username and modify the [a-zA-Z0-9_-] charset accordingly.

You are looking for a regular expression.
A basic pattern starts out as a fixed string: /u/ or /r/, which would match those exactly. This can be simplified to match one or the other with /(?:u|r)/, which matches the same as those two patterns. Next you would want to match everything from that point up to a space. You would use a negated character class, [^ ], which matches any character that is not a space, and apply a quantifier, *, to match as many characters as possible from that class: /(?:u|r)/[^ ]*
You can take that pattern further and add a lookbehind, (?<= ), to ensure your match is preceded by a space so you're not matching part of a longer word, which results in (?<= )/(?:u|r)/[^ ]*. Note that this lookbehind also rejects a match at the very start of the string, where nothing precedes it. You wrap all of that in parentheses to make a capturing group: ((?<= )/(?:u|r)/[^ ]*). This captures the matched text so that it can be used in a replacement pattern. You can express your chosen replacement using the \1 reference to the first captured group as [\1](http://reddit.com\1).
In php you would pass the matching pattern, replacement pattern, and subject string to the preg_replace function.
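Putting those pieces together, a rough sketch might look like this. The # delimiter and the variable names are my own choices, and the sample text is the one from the question:

```php
<?php
$text = 'Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2';

// (?<= ) requires a space before the match; [^ ]* runs up to the next space.
$pattern = '#((?<= )/(?:u|r)/[^ ]*)#';

// \1 refers to the captured /u/... or /r/... text.
$linked = preg_replace($pattern, '[\1](http://reddit.com\1)', $text);

echo $linked, PHP_EOL;
```

Because [^ ]* stops only at a space, trailing punctuation such as the full stop in "/r/subreddit." would end up inside the link. A narrower character class avoids that.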

In my opinion, regex would be overkill for such a simple operation. If you just want to replace the literal "/r/x" with "[/r/x](http://reddit.com/r/x)" and "/u/x" with "[/u/x](http://reddit.com/u/x)", you can use str_replace, although preg_replace will take less code.
str_replace("/r/x","[/r/x](http://reddit.com/r/x)","whatever_string");
Use regex for intricate search-and-replace operations. You can also use the regular expression generator at http://www.jslab.dk/tools.regex.php if you have something complex to capture in the string.

Adding any pattern to match in regex

I have a regex that works pretty well except in one particular situation;
$message = preg_replace("#(^(http(s)?://)(?!img.youtube.com/vi/)([-a-zA-Z?-??-?()0-9#:%_+.~\#?&;//=,]+(\.jpg|\.jpeg|\.gif|\.bmp|\.png)))#i",
"<p><a href='/viewpost.php?messageid=$message_id'><img src='$1' width=100%></a>", $message);
This pattern does several things: 1) it exactly matches http or https, 2) it ignores any string that includes img.youtube.com/vi/, and 3) it looks for popular image file types in the links. It works the way it should only if there are no characters before the string (a string like http://exampleaddress/exampleimage.jpeg). If the string is in the middle of a paragraph, it fails.
I need to keep the ^(http(s)?://) as an exact match (removing ^ fixes my problem but causes a conflict with a subsequent regex rule). So, it looks like the problem is that this exact match does not allow any carriage returns, spaces, or anything else to precede ^(http(s)?://). How can I make the regex work so that nothing before or after the string is relevant, but the rule is applied whenever exactly http or https is seen?

As you know, the ^ anchor requires the match to begin exactly at the beginning of the input string. You can achieve a similar restriction anywhere inside the input string with a \b word boundary. It matches a zero-length position between a word character and a non-word character, for example between a space and the h of http.
I'll note also that you do not need to surround the s in a () group, since the ? applies only to the single preceding character.
\bhttps?://...
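Here is a sketch of the \b idea. The pattern is deliberately simplified: the character class is my own assumption rather than the one from the question, and embed_images is a hypothetical name.

```php
<?php
// Hypothetical helper with a simplified pattern; the character class is an
// assumption, not the one from the question.
function embed_images(string $message): string
{
    // \b lets the URL start anywhere a word starts, not only at offset 0.
    $pattern = '#\bhttps?://(?!img\.youtube\.com/vi/)[^\s<>"\']+\.(?:jpe?g|gif|bmp|png)#i';

    // $0 is the whole matched URL.
    return preg_replace($pattern, '<img src="$0" width="100%">', $message);
}

echo embed_images('Look at http://exampleaddress/exampleimage.jpeg here'), PHP_EOL;
```

The negative lookahead still skips img.youtube.com/vi/ links, as in the original rule.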

Regex - Match Word As Long As Nothing Follows It

Having a little trouble with regex. I'm trying to test for a match but only if nothing follows it. So in the below example if I go to test/create/1/2 - it still matches. I only want to match if it's explicitly test/create/1 (but the one is dynamic).
if(preg_match('^test/create/(.*)^', 'test/create/1')):
// do something...
endif;
I've found some answers that suggest using $ before my delimiter but it doesn't appear to do anything. Or a combination of ^ and $ but I can't quite figure it out. Regex confuses the hell out of me!
EDIT:
I didn't really explain this well enough so just to clarify:
I need the if statement to return true if a URL is test/create/{id} - the {id} being dynamic (and of any length). If the {id} is followed by a forward slash the if statement should fail. So that if someone types in test/create/1/2 - it will fail because of the forward slash after the 1.
Solution
I went for thedarkwinter's answer in the end as it's what worked best for me, although other answers did work as well.
I also had to add a little extra to the regex to make sure that it would work with hyphens as well, so the final code looked like this:
if(preg_match('^test/create/[\w-]*$^', 'test/create/1')):
// do something...
endif;

\w matches word characters, and $ matches the end of the string
if(preg_match('^test/create/\w*$^', 'test/create/1'))
will match test/create/[word/num] and nothing following.
I think that's what you are after.
edit added * in \w*

Here you go:
"/^test\\/create\\/([^\\/]*)$/"
This says:
The string starts with "test", followed by a forward slash (remember that the first backslash escapes the second, so PHP puts a literal backslash in the pattern, which in turn escapes the / for the regex engine), followed by "create", followed by another forward slash. It then captures everything that isn't a slash, up to the end of the string.
Comment if you need more detail
I prefer my expressions to always use / as the delimiter because it has no meaning as a regex character. I've seen # used, and I believe some other answer uses ^. That means "start of string", so I wouldn't use it as my regex delimiter.
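To show what the capture group gives you, here is a small sketch. The function name is hypothetical, and the pattern is the one above written in a single-quoted string, so only one backslash is needed per slash:

```php
<?php
// Hypothetical helper: returns the {id} segment, or null if the URL does not match.
function match_create_route(string $url): ?string
{
    // ([^\/]*) captures everything up to the end, but refuses any further slash.
    if (preg_match('/^test\/create\/([^\/]*)$/', $url, $m) === 1) {
        return $m[1];
    }
    return null;
}

var_dump(match_create_route('test/create/1'));   // string(1) "1"
var_dump(match_create_route('test/create/1/2')); // NULL
```

Since * allows zero characters, test/create/ with an empty id also matches. Swap * for + if that should fail.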

Use following regular expression (use $ to denote end of the input):
'|test/create/[^/]+$|'
If you want to match only digits, use the following instead (\d matches a digit character):
'^test/create/\d+$^'

The ^ is an anchor for the beginning of the line, i.e. no characters occurring before the ^ . Use a $ to designate the end of the string, or end of the line.
EDIT: wanted to add a suggestion as well:
Your solution is fine and works, but in terms of style I'd advise against using the caret (^) as a delimiter, especially because it has special meaning as either negation or as a start-of-line anchor, so it's a bit confusing to read it that way. You can legally use most special characters as long as they don't occur (or are escaped) in the regex itself. Just talking about a matter of style/maintainability here.
Of course nearly every potential delimiter has some special meaning, but you also often tend to see the ^ at the beginning of a regex, so I might choose another alternative. For example, # is a good choice here:
if(preg_match('#test/create/[\w-]*$#', $mystring)) {
//etc
}

The regex abc$ will match abc only when it is at the end of the string.
abcd # no match
dabc # match
abc # match
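A quick way to check those three cases in PHP (just a sketch, and the / delimiter is my choice):

```php
<?php
// Run the three sample subjects against the end-anchored pattern.
foreach (['abcd', 'dabc', 'abc'] as $subject) {
    $result = preg_match('/abc$/', $subject) === 1 ? 'match' : 'no match';
    echo $subject, ': ', $result, PHP_EOL;
}
```

One caveat: by default $ also matches just before a final newline, so "abc\n" would pass. Use \z (or the D modifier) if the end has to be strict.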

How can I make this URL validation regular expression less greedy?

So I have the following regular expression:
https?://(www\.)?flickr\.com/photos/(.+)/?
To match against the following URL:
http://www.flickr.com/photos/username/
How can I stop the final forward slash (/) from being included in the username sub-pattern (.+)?
I have tried:
https?://(www\.)?flickr\.com/photos/(.+?)/?
But then it only matches the first letter of the username.

https?://(?:www\.)?flickr\.com/photos/([^/]+)/?
I added ?: to the first group so it's not capturing, then used [^/] instead of the dot in the last match. This assures you that everything between "photos/" and the very next "/" is captured.
If you need to capture the first www just use this:
https?://(www\.)?flickr\.com/photos/([^/]+)/?

You need to make sure it doesn't match the forward slash:
https?://(?:www\.)?flickr\.com/photos/([^/]+)/?
You could also make the regex lazy (which is what I guess you were doing with the (.+?) syntax), but the above will work just fine

Change (.+) to ([^/]+). This will match until it encounters a /, so you might want to throw some other stuff in the class too.

There are generally two ways to do this:
Append a question mark, to make the matching non-greedy. .* will match as much as possible, .*? will match as little as possible.
Exclude the character you want to match next. If you want to stop on /, use [^/]*.
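To see the difference between the variants, here is a small comparison sketch using the URL from the question. The ~ delimiter and the array keys are my own choices:

```php
<?php
$url = 'http://www.flickr.com/photos/username/';

$patterns = [
    'greedy'  => '~https?://(?:www\.)?flickr\.com/photos/(.+)/?~',
    'lazy'    => '~https?://(?:www\.)?flickr\.com/photos/(.+?)/?~',
    'negated' => '~https?://(?:www\.)?flickr\.com/photos/([^/]+)/?~',
];

$captured = [];
foreach ($patterns as $name => $pattern) {
    preg_match($pattern, $url, $m);
    $captured[$name] = $m[1]; // group 1 is the username sub-pattern
    echo $name, ': ', $m[1], PHP_EOL;
}
```

The greedy version swallows the trailing slash, and the lazy one stops after a single character because nothing after it forces it to continue. The negated class captures just the username.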

If you know there will be a trailing slash, take out the final ?.

Regex/PHP check if group of characters appears only once

I am trying to validate an input in PHP with REGEX. I want to check whether the input has the %s character group inside it and that it appears only once. Otherwise, the rule should fail.
Here's what I've tried:
preg_match('|^[0-9a-zA-Z_-\s:;,\.\?!\(\)\p{L}(%s){1}]*$|u', $value); (there are also some other rules besides this; I've tried the (%s){1} part and it doesn't work).
I believe there is a very easy solution to this, but I'm not really into REGEX's... Thank you for your help!

If I understand your question, you need a positive lookahead. The lookahead causes the expression to only match if it finds a single %s.
preg_match('|^(?=[^%s].*?[%s][^%s]*$)[0-9a-zA-Z_-\s:;,\.\?!\(\)\p{L}(%s){1}]*$|u', $value);
I'll explain how each part works
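^(?=[^%s].*?[%s][^%s]*$) is a zero-width assertion: (?=regex) is a positive lookahead, meaning it must match but does not "eat" any characters. It is intended to ensure that the whole line contains only one %s. Be aware, though, that [%s] and [^%s] are character classes over the individual characters % and s, so the assertion constrains those characters rather than the two-character sequence %s.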
[0-9a-zA-Z_-\s:;,\.\?!\(\)\p{L}(%s){1}]*$ The remaining part of the regex also looks at the entire string and ensures that the whole string is composed only of the characters in the character class (like your original regex).

I managed to do this with PHP's substr_count() function, following Johnsyweb's suggestion to use an alternate way to perform the validation, and because the suggested REGEX's seem pretty complicated.
Thank you again!

Alternatively, you can use preg_match_all and check the number of matches. If it's 1, then you're ok. Because your pattern is anchored with ^ and $, it can match at most once however many %s the input holds, so count the %s sequence itself, something like this:
$result = (preg_match_all('|%s|', $value) === 1);

Try this:
'|^(?=(?:(?!%s).)*%s(?:(?!%s).)*$)[0-9%_\s:;,.?!()\p{L}-]+$|u'
The (%s){1} sequence inside the square brackets probably doesn't do what you think it does, but never mind, the solution is more complex. In fact, {1} should never appear anywhere in a regex. It doesn't ensure that there's only one of something, as many people assume. As a matter of fact, it doesn't do anything; it's pure clutter.
EDIT (in answer to the comment): To ensure that only one of a particular sequence is present in a string, you have to actively examine every single character, classifying it as either part-of-%s or not part-of-%s. To that end, (?:(?!%s).)* consumes one character at a time, after the negative lookahead has confirmed that the character is not the start of %s.
When that part of the lookahead expression quits matching, the next thing in the string has to be %s. Then the second (?:(?!%s).)*$ kicks in to confirm that there are no more %s sequences until the end of the string.
And don't forget that the lookahead expression must be anchored at both ends. Because the lookahead is the first thing after the main regex's start anchor you don't need to add another ^. But the lookahead must end with its own $ anchor.
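A sketch of how this could be wrapped and tried out. The function name is hypothetical, and the allowed-character class is an assumption: it includes % so that the placeholder itself passes the character check.

```php
<?php
// Hypothetical helper; the allowed-character class is an assumption and
// includes % so that the %s placeholder itself is permitted.
function has_single_placeholder(string $value): bool
{
    // Lookahead: no %s, then one %s, then no %s until the end of the string.
    $pattern = '~^(?=(?:(?!%s).)*%s(?:(?!%s).)*$)[0-9%_\s:;,.?!()\p{L}-]+$~u';
    return preg_match($pattern, $value) === 1;
}

var_dump(has_single_placeholder('Hello %s!'));       // bool(true)
var_dump(has_single_placeholder('Hello %s and %s')); // bool(false)
var_dump(has_single_placeholder('Hello'));           // bool(false)
```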

If you're not "into" regular expressions, why not solve this with PHP?
One call to the builtin strpos() will tell you if the string has a match. A second call will tell you if it appears more than once.
This will be easier for you to read and for others to maintain.
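The two-call idea could be sketched roughly like this (the function name is hypothetical, and a non-empty needle is assumed):

```php
<?php
// Hypothetical helper: true if $needle occurs exactly once in $haystack.
// Assumes $needle is not an empty string.
function contains_exactly_once(string $haystack, string $needle): bool
{
    $first = strpos($haystack, $needle);
    if ($first === false) {
        return false; // not present at all
    }
    // Search again, starting just past the first occurrence.
    return strpos($haystack, $needle, $first + strlen($needle)) === false;
}

var_dump(contains_exactly_once('Hello %s', '%s'));  // bool(true)
var_dump(contains_exactly_once('%s and %s', '%s')); // bool(false)
```

For a plain count, substr_count($value, '%s') === 1 does the same job in one call.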
