Regex with double negative matches - php

Given a series of strings:
error.user
success
success.user
success.admin
I want to write a regex that will match anything not starting with error, and that also doesn't have .user in it. So for this list, success and success.admin
What I've got so far is: /^((?!error)\w*)((?!\.*user)\w*)/
The first part: ((?!error)\w*) is working fine, and narrowing down the matches to just strings that start with success. For some reason the second part: ((?!\.*user)\w*) is doing precisely nothing. I think the first part is matching too much.
I'm doing this in PHP/PCRE
Here's my regex101.com link: https://regex101.com/r/l2sZru/1

You need to fix your negative regex like this:
^(?!error|.*\.user)[\w.]+$
RegEx Demo
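Here (?!error|.*\.user) is a negative lookahead evaluated at the start of the string: the match fails if error is at the start OR if .user is found anywhere in the input.
(?!\.*user) in your regex only fails the match when zero or more dots followed by user appear right at the position where the lookahead is evaluated; it does not scan the rest of the input.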
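As a quick check, here is a minimal PHP sketch that runs the pattern from this answer against the four strings from the question. The helper name isAllowed is made up for illustration.

```php
<?php
// Hypothetical helper: true when the string neither starts with "error"
// nor contains ".user" anywhere.
function isAllowed(string $s): bool
{
    return preg_match('/^(?!error|.*\.user)[\w.]+$/', $s) === 1;
}

$inputs  = ['error.user', 'success', 'success.user', 'success.admin'];
$allowed = array_values(array_filter($inputs, 'isAllowed'));

print_r($allowed); // success, success.admin
```

Both conditions live in one lookahead at the start of the string, so neither depends on how much an earlier group has already consumed.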


How do I extract one group from a URL using regex for use in a redirect?

I've read the Best RegEx Trick Ever and tried to wrap my head around the other answers here on Stack Exchange and just can't seem to get it right. Take these three strings:
http://www.test.com/newyork/class-schedule
http://www.test.com/location/newyork/class-schedule
http://www.test.com/location/newyork/training
I need a regex that will extract the newyork from the first string and save it for a replace later, but will NOT match any part of the other strings. Also, for obscure reasons, I can not include http://www.test.com as a condition for matching (so I can't use anything before the slash that precedes newyork). Note that in this scenario, newyork could easily be chicago, atlanta, or any other city name with no spaces or punctuation.
The only thing I've been able to figure out that isolates only newyork in the first string is the following:
/.*\.com\/(.[^\/]*)\/class-schedule/g
However, this relies on using the URL first which I can't use.
Any ideas on how to achieve this WITHOUT using the URL?
[EDIT]
To clarify what I'm looking for, I'm trying to take the results from the first string and add "location" to it, still using regex. So:
http://www.test.com/newyork/class-schedule
would become
http://www.test.com/location/newyork/class-schedule
using something like
http://www.test.com/location/$1/class-schedule
Try this: ~/(\w+)/[-a-z]+?/?(?:\?.*?)*(:?\s|$)~gm
See it working here: https://regex101.com/r/4VMazZ/3.
So it anchors on the end of the URL instead of the beginning and captures only the word between the second and third slashes counted from the end. A query string may follow and it will still work.
[EDIT 1]
I had swapped two characters by mistake at the end ((:? instead of (?:), so it was capturing one extra group. Corrected: /(\w+)/[-a-z]+?/?(?:\?.*?)*(?:\s|$). See it here: https://regex101.com/r/4VMazZ/4
If you use preg_match($pattern, $string, $matches); the result you want (newyork) will be in $matches[1];, $matches[0] contains everything.
You can see the captures in 'MATCH INFORMATION' panel on regex101 in my example!
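A minimal sketch of that preg_match call, using ~ as the delimiter so the slashes need no escaping. The variable names are only illustrative.

```php
<?php
// Pattern from EDIT 1, wrapped in ~ delimiters.
$pattern = '~/(\w+)/[-a-z]+?/?(?:\?.*?)*(?:\s|$)~';
$url     = 'http://www.test.com/newyork/class-schedule';

if (preg_match($pattern, $url, $matches) === 1) {
    echo $matches[1], "\n"; // newyork
    echo $matches[0], "\n"; // /newyork/class-schedule
}
```

Note that the pattern only looks at the last two path segments, so it also captures newyork from the two /location/ URLs in the question; telling those apart would need an extra check.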
[EDIT 2] after your comment.
If you want to replace the whole url you have to match the whole URL, something like this: .*?/(\w+)/[-a-z]+?/?(?:\?.*?)*(?:\s|$) will do in this example. See it working here: https://regex101.com/r/4VMazZ/5
[EDIT 3] Add capturing of last part for replacement.
So as you want to reuse last part you need to add capturing parenthesis: .*?/(\w+)/([-a-z]+?)/?(?:\?.*?)*(?:\s|$).
See it working here: https://regex101.com/r/4VMazZ/6
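For the replacement itself, something along these lines should work with preg_replace. This is a sketch only: the replacement string hard-codes the host from the question, and the variable names are illustrative.

```php
<?php
// Pattern from EDIT 3: $1 is the city, $2 is the last path segment.
$pattern     = '~.*?/(\w+)/([-a-z]+?)/?(?:\?.*?)*(?:\s|$)~';
$replacement = 'http://www.test.com/location/$1/$2';

$url    = 'http://www.test.com/newyork/class-schedule';
$result = preg_replace($pattern, $replacement, $url);

echo $result, "\n"; // http://www.test.com/location/newyork/class-schedule
```

With the sample URLs that already contain /location/, the same call rewrites them to themselves, because the last two segments are captured and re-emitted unchanged.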
Could this work? See it here.
(?<=location\/|\.\w{3}\/|\.\w{2}\/)(?!location).*?(?=\/|$)
It matches everything following .xxx/ or .xx/ or location/. I don't know whether one-letter domains exist; if they do, you can add |\.\w\/ to the lookbehind at the start of the regex.
(?<=location\/|\.\w{3}\/|\.\w{2}\/) is a lookbehind, so it matches the following pattern only if it is preceded by location/ or .xxx/ or .xx/
.*? matches every character (lazy)
(?=\/|$) end match if next character is / or on line end
Note: If location is counted as part of the url, I don't think what you are asking is possible in regex, as the city name could be anywhere in string. If so, then you could have a list of cities and check what part of the url matches one of them.
EDIT: You need the multiline m flag so $ also matches end of line
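A small PHP sketch of this lookbehind approach, run against the three URLs from the question. It relies on PCRE accepting lookbehind alternatives of different fixed lengths; the variable names are illustrative.

```php
<?php
// Lookbehind pattern from this answer, with the m flag so $ also matches at line ends.
$pattern = '/(?<=location\/|\.\w{3}\/|\.\w{2}\/)(?!location).*?(?=\/|$)/m';

$urls = [
    'http://www.test.com/newyork/class-schedule',
    'http://www.test.com/location/newyork/class-schedule',
    'http://www.test.com/location/newyork/training',
];

$cities = [];
foreach ($urls as $url) {
    if (preg_match($pattern, $url, $m) === 1) {
        $cities[] = $m[0]; // the whole match is the city segment
    }
}

print_r($cities); // newyork for all three URLs
```

As the output shows, this extracts the city from every URL, including the ones that already contain location/.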

Correct regex for this pattern

I've got some issues understanding this regex.
I tried doing a pattern but does not work like intended.
What I want is [A-Za-z]{2,3}[0-9]{2,30}
That is 2-3 letters in the beginning and 2-30 numbers after that
FA1321321
BFA18098097
I want to use it to validate an input field but can't figure out what the regex should look like.
Can anyone help me out and also explain a bit about it?
Your regex is correct - just make sure to surround it with / in PHP, and perhaps ^, $ if you want it to strictly match the entire string (no extra characters before/after).
$pattern = '/^[A-Za-z]{2,3}[0-9]{2,30}$/';
$found = preg_match($pattern, $your_str);
From the PHP documentation:
preg_match() returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred.
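Because preg_match() can return 1, 0, or FALSE, a strict comparison against 1 is the safest way to turn it into a yes/no validation result. A minimal sketch, with a made-up function name:

```php
<?php
// Hypothetical validator: 2-3 letters followed by 2-30 digits, nothing else.
function isValidCode(string $input): bool
{
    // === 1 treats both "no match" (0) and "error" (false) as invalid.
    return preg_match('/^[A-Za-z]{2,3}[0-9]{2,30}$/', $input) === 1;
}

var_dump(isValidCode('FA1321321'));   // bool(true)
var_dump(isValidCode('BFA18098097')); // bool(true)
var_dump(isValidCode('F1'));          // bool(false): only one letter
var_dump(isValidCode('ABCD12'));      // bool(false): four letters
```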

Regex in php: Compulsory second occurrence of word

I need to match a few urls for an application I'm working on;
So, I've got this reference string:
content/course/32/lesson/61/content/348
and I need a pattern that matches either
content
OR
content/course/[number]/lesson/[number]/content/[number]
What I've done so far is come up with this pattern:
$my_regex = "/content(\/?|(\/course\/\d{1,4}\/lesson\/\d{1,4}\/content\/\d{1,4}))$/";
which, however, has the following problem: this string returns a match when it should not:
content/course/32/lesson/61/content
I'm thinking that it's got something to do with the word content repeating twice but I'm not entirely sure.
Any help is much appreciated.
The reason for the match is the alternation.
content\/?$
matches
content/course/32/lesson/61/content
To fix this, add a ^ (beginning of line) to the start of your regex to ensure the entire string is matched and not only the ending:
/^content(\/?|(\/course\/\d{1,4}\/lesson\/\d{1,4}\/content\/\d{1,4}))$/
See it in action
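A short PHP sketch comparing the anchored pattern against the strings from the question (the test strings and variable names are only for illustration):

```php
<?php
// Anchored pattern from this answer.
$pattern = '/^content(\/?|(\/course\/\d{1,4}\/lesson\/\d{1,4}\/content\/\d{1,4}))$/';

$cases = [
    'content',
    'content/',
    'content/course/32/lesson/61/content/348',
    'content/course/32/lesson/61/content',
];

$results = [];
foreach ($cases as $case) {
    $results[$case] = preg_match($pattern, $case);
    echo $case, ' => ', $results[$case], "\n";
}
// The first three print 1; the last prints 0 because ^ forces the match
// to begin at the first "content", not at the trailing one.
```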
this works:
/(^content\/?|content\/course\/\d{1,4}\/lesson\/\d{1,4}\/content\/\d{1,4})$/

Positive look ahead regex confusing

I'm building this regex with a positive look ahead in it. Basically it must select all text in the line up to the last period that precedes a ":" and add a "|" to the end to delimit it. Some sample text is below. I am testing this in gskinner and EditPad Pro, which apparently has full grep regex support, so if I could get the answers in that form I'd appreciate it.
The regex below works to a degree but I am unsure if it is correct. Also it falls down if the text contains brackets.
Finally I would like to add another ignore rule like the one that ignores but includes "Co." in the selection. This second ignore rule would ignore but include periods that have a single Capital letter before them. Sample text below too. Thanks for all the help.
^(?:[^|]+\|){3}(.*?)[^(?:Co)]\.(?=[^:]*?\:)
121| Ryan, T.N. |2001. |I like regex. But does it like me (2) 2: 615-631.
122| O' Toole, H.Y. |2004. |(Note on the regex). Pages 90-91 In: Ryan, A. & Toole, B.L. (Editors) Guide to the regex functionality in php. Timmy, Tommy& Stewie, Quohog. * Produced for Family Guy in Quohog.
I don't think I understand what you want to do. But this part [^(?:Co)] is definitely not correct.
With the square brackets you are creating a character class, because of the ^ it is a negated class. That means at this place you don't want to match one of those characters (?:Co), in other words it will match any other character than "?)(:Co".
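To see the difference in practice, here is a small PHP sketch. The test words are made up; the point is that the class excludes single characters, not the string "Co".

```php
<?php
// [^(?:Co)] is one character that is none of: ( ? : C o )
$pattern = '/[^(?:Co)]\./';

$results = [
    'Inc.' => preg_match($pattern, 'Inc.'), // 1: "c." matches
    'Co.'  => preg_match($pattern, 'Co.'),  // 0: "o" is excluded
    'Zoo.' => preg_match($pattern, 'Zoo.'), // 0: also rejected, although it is not "Co"
];

print_r($results);
```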
Update:
I don't think it's possible. How should I distinguish between L. Co. or something similar and the end of the sentence?
But I found another error in your regex. The last part, (?=[^:]*?\:), should be (?=[^.]*?\:) if you want to match the last dot before the :. With your expression it will match on the first dot.
See it here on Regexr
This seems to do what you want.
(.*\.)(?=[^:]*?:)
It quite simply matches all text up to the last full stop that occurs before the colon.
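A minimal PHP sketch of this pattern inserting the "|" delimiter, using just the title part of the first sample line as input (the variable names are illustrative):

```php
<?php
$line = 'I like regex. But does it like me (2) 2: 615-631.';

// Greedy .* backtracks to the last full stop that still has a colon after it;
// '$1|' re-emits the captured text followed by the delimiter.
$result = preg_replace('/(.*\.)(?=[^:]*?:)/', '$1|', $line, 1);

echo $result, "\n"; // I like regex.| But does it like me (2) 2: 615-631.
```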

Trying to find Twitter RT's with Regular Expressions and PHP

I'm trying to find the correct Regular Expression to match all RT scenarios on Twitter (can't wait for Twitter's new retweet API).
The way I see it, RT's can be at the beginning, middle, or end of the string returned from Twitter.
So, I need something at the beginning and end of this Regular Expression:
([Rr])([Tt])
No matter what I try, I cannot match all scenarios in one Regular Expression. I tried
[^|\s+]
to match the scenario where the RT will appear either at the beginning of the string or after one or more whitespace characters, but it didn't work the same for the end of the string or RT. I tried
[\s+|$]
to match a case when the RT appear either in the end of the string or there's one or more whitespace characters following it, same as with the 'pre' -- it didn't work.
Can someone please explain what am I doing wrong here? Any help or suggestions will be highly appreciated (as always :) )
You'll probably be happiest with something like:
/\brt\b/i
Which will find isolated instances of RT (that is, surrounded by word-boundaries), and use the /i modifier at the end of the regex to make it case-insensitive.
You want the word boundaries so that you don't end up thinking random tweets containing words like "Art" and "Quartz" are actually retweets. Even then, it's going to have false positives.
By default, a regular expression can (and will) match anywhere inside a string, so you don't need to account for what may precede or follow your match if indeed you don't care what it is or if it is present.
// Capture the user name that follows "RT @"
if (preg_match('/\brt\s*@(\w+)/i', $tweet, $match)) {
    echo 'Somebody retweeted ' . $match[1] . "\n";
}
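To illustrate the word-boundary behaviour described above, here is a small sketch with made-up sample tweets and a hypothetical helper name:

```php
<?php
// Hypothetical helper: true when "rt" appears as a standalone word, in any case.
function looksLikeRetweet(string $tweet): bool
{
    return preg_match('/\brt\b/i', $tweet) === 1;
}

var_dump(looksLikeRetweet('RT @alice: hello'));      // bool(true): start of string
var_dump(looksLikeRetweet('so true (rt @bob) yes')); // bool(true): middle
var_dump(looksLikeRetweet('hello world rt'));        // bool(true): end of string
var_dump(looksLikeRetweet('Art and Quartz'));        // bool(false): no word boundary around "rt"
```

No explicit handling of the start or end of the string is needed, since \b already matches at both.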
