Issue with regular expression for string validation - php

I am trying to validate the following type of string using regular expressions in PHP (version 5.5.9).
The string is in the following format:
/[sometext]/course/[sometext1]/[sometext2]
What I need is a regex that accepts a string only if it is in exactly that format and nothing else. That means these would be invalid:
/aaa/course/bbb/ccc/
/aaa/course/bbb/ccc/ddd
What I have so far is this:
/\/(?P<domain>.+?)\/course\/(?P<courseid>.+?)\/(?P<reportname>.+?)/
Any ideas?
Update:
With the help from all posters and especially Wiktor Stribiżew I got this one that works:
$regex = '#^/(?P<domain>[^/]+)/course/(?P<courseid>[^/]+)/(?P<reportname>[^/]+)$#';

You can use the following regular expression:
^\/(?P<domain>[^\/]+)\/course\/(?P<courseid>[^\/]+)\/(?P<reportname>[^\/]+)$
PHP:
$re = '~^/(?P<domain>[^/]+)/course/(?P<courseid>[^/]+)/(?P<reportname>[^/]+)$~';
The [^\/] is a negated character class that matches any character but /.
The ^ and $ anchors are usually enough to make sure the whole input matches the pattern. Note that $ also matches just before a trailing newline; to require the very end of the string, replace them with \A and \z respectively, or keep ^ and $ and add the D modifier.
Even if you use lazy .+? dot matching, the . can still run across several / delimiters whenever that is needed to return a valid match, which is why the negated character class is used instead.
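The difference between the three approaches can be checked with a short script. This is only a sketch: the variable names and the sample inputs are chosen here for illustration, and the patterns are the ones quoted in the question and in the answer above.

```php
<?php
// Unanchored attempt from the question (lazy dots, / as delimiter).
$loose        = '/\/(?P<domain>.+?)\/course\/(?P<courseid>.+?)\/(?P<reportname>.+?)/';
// Anchors added, but still lazy dots: the dot may cross a slash.
$anchoredLazy = '~^/(?P<domain>.+?)/course/(?P<courseid>.+?)/(?P<reportname>.+?)$~';
// Anchors plus negated character classes, as in the answer.
$strict       = '~^/(?P<domain>[^/]+)/course/(?P<courseid>[^/]+)/(?P<reportname>[^/]+)$~';

// Illustrative inputs: the first is valid, the other two are the
// invalid strings listed in the question.
$inputs = ['/aaa/course/bbb/ccc', '/aaa/course/bbb/ccc/', '/aaa/course/bbb/ccc/ddd'];

foreach ($inputs as $input) {
    printf(
        "%-24s loose=%d anchoredLazy=%d strict=%d\n",
        $input,
        preg_match($loose, $input),
        preg_match($anchoredLazy, $input),
        preg_match($strict, $input)
    );
}
```

Only the strict pattern should reject the two invalid inputs; the other two accept all three strings because the lazy dot is still allowed to consume a slash.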

First, use something other than '/' as your delimiter (the characters at the beginning and end of the regex). That makes the regex easier to write, because you don't have to escape the slashes inside it:
$regex = '#^/[a-z]+/[^/]+/[a-z]+/[a-z]+$#';

Related

Regex - Match Word As Long As Nothing Follows It

Having a little trouble with regex. I'm trying to test for a match but only if nothing follows it. So in the below example if I go to test/create/1/2 - it still matches. I only want to match if it's explicitly test/create/1 (but the one is dynamic).
if(preg_match('^test/create/(.*)^', 'test/create/1')):
// do something...
endif;
I've found some answers that suggest using $ before my delimiter but it doesn't appear to do anything. Or a combination of ^ and $ but I can't quite figure it out. Regex confuses the hell out of me!
EDIT:
I didn't really explain this well enough so just to clarify:
I need the if statement to return true if a URL is test/create/{id} - the {id} being dynamic (and of any length). If the {id} is followed by a forward slash the if statement should fail. So that if someone types in test/create/1/2 - it will fail because of the forward slash after the 1.
Solution
I went for thedarkwinter's answer in the end as it's what worked best for me, although other answers did work as well.
I also had to add a little extra to the regex to make sure that it would work with hyphens as well, so the final code looked like this:
if(preg_match('^test/create/[\w-]*$^', 'test/create/1')):
// do something...
endif;
\w matches word characters, and $ matches the end of the string:
if(preg_match('^test/create/\w*$^', 'test/create/1'))
will match test/create/[word/num] and nothing following.
I think that's what you are after.
Edit: added * in \w*
Here you go:
"/^test\\/create\\/([^\\/]*)$/"
This says:
The string starts with "test", followed by a forward slash (remember that the first backslash escapes the second, so PHP passes a literal backslash to the regex engine, which in turn escapes the /), followed by "create", followed by another forward slash; then capture everything that isn't a slash, up to the end of the string.
Comment if you need more detail
I prefer / as my regex delimiter because it has no special meaning as a regex character. I've seen # used, and I believe another answer uses ^; since ^ means "start of string", I wouldn't use it as a regex delimiter.
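To see the double escaping in action, here is a small sketch. The helper name extractId and the sample URLs are assumptions made for this example; the pattern is the one from the answer above, written once in double quotes and once in single quotes.

```php
<?php
// Double-quoted form from the answer, and a single-quoted equivalent.
$doubleQuoted = "/^test\\/create\\/([^\\/]*)$/";
$singleQuoted = '/^test\/create\/([^\/]*)$/';

// Both literals hand the same pattern string to the regex engine.
var_dump($doubleQuoted === $singleQuoted);

// Hypothetical helper: returns the captured id, or null when nothing matches.
function extractId($pattern, $url)
{
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

var_dump(extractId($doubleQuoted, 'test/create/1'));    // captured id
var_dump(extractId($doubleQuoted, 'test/create/1/2'));  // extra segment, no match
```

In single quotes only one backslash is needed per slash, which tends to be easier to read.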
Use the following regular expression ($ denotes the end of the input):
'|test/create/[^/]+$|'
If you only want to match digits, use the following instead (\d matches a digit character):
'^test/create/\d+$^'
The ^ is an anchor for the beginning of the line, i.e. no characters occurring before the ^ . Use a $ to designate the end of the string, or end of the line.
EDIT: wanted to add a suggestion as well:
Your solution is fine and works, but in terms of style I'd advise against using the caret (^) as a delimiter -- especially because it has special meaning as either negation or as a start of line anchor so it's a bit confusing to read it that way. You can legally use most special characters as long as they don't occur (or are escaped) in the regex itself. Just talking about a matter of style/maintainability here.
Of course nearly every potential delimiter has some special meaning, but you also often tend to see the ^ at the beginning of a regex so I might choose another alternative. For example # is a good choice here:
if(preg_match('#test/create/[\w-]*$#', $mystring)) {
//etc
}
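The practical effect of the delimiter choice can be sketched as follows. The variable names and sample URLs are made up for this example, and the second pattern adds a leading ^ anchor, which is an assumption on top of the snippet above.

```php
<?php
// With ^ as the delimiter, the engine only sees: test/create/[\w-]*$
$caretDelimited = '^test/create/[\w-]*$^';
// With # as the delimiter, ^ and $ are both real anchors.
$hashDelimited  = '#^test/create/[\w-]*$#';

// Illustrative sample URLs.
$urls = ['test/create/1', 'test/create/my-id', 'test/create/1/2', 'foo/test/create/1'];

foreach ($urls as $url) {
    printf(
        "%-20s caret=%d hash=%d\n",
        $url,
        preg_match($caretDelimited, $url),
        preg_match($hashDelimited, $url)
    );
}
```

The caret-delimited pattern has no start anchor at all, so it also accepts foo/test/create/1; the hash-delimited one does not.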
The regex abc$ will match abc only when it's at the end of the string.
abcd # no match
dabc # match
abc # match

Regex to match a specific expression format

I'm trying to find a regex that will match a specific expression in the following format:
name = value
However, I need it to not match:
name.extra = value
I have the following regex:
([\w\#\-]+) *(\=|\>|\>\=|\<|\<\=) *([^\s\']+)
which matches the first expression, but also matches the second expression (extra = value).
I need a regex that will match only the first expression and not the second (i.e. with a dot).
Just add a ^ at the beginning and a $ at the end of your expression:
^([\w\#\-]+) *(\=|\>|\>\=|\<|\<\=) *([^\s\']+)$
Negative lookbehind assertion (?<!) might be what you are looking for.
For a simple assignment: (?<!\.)\b(\w+)\s*=\s*(\w+)
summary:
(?<!\.) = prevent the character . at that location
\b = beginning of a word
The captured words are:
\1 = destination name
\2 = source name
and using the regex you specified, this should give something near this:
(?<!\.)\b([\w\#\-]+) *(\=|\>|\>\=|\<|\<\=) *([^\s\']+)
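Assuming PHP, the original, anchored, and lookbehind variants can be compared side by side. The variable names and the ~ delimiter are choices made for this sketch; the patterns themselves are the ones quoted above.

```php
<?php
// Regex from the question, unchanged.
$original   = '~([\w\#\-]+) *(\=|\>|\>\=|\<|\<\=) *([^\s\']+)~';
// Anchored variant: the whole input must be a single "name op value".
$anchored   = '~^([\w\#\-]+) *(\=|\>|\>\=|\<|\<\=) *([^\s\']+)$~';
// Lookbehind variant: a name may not start right after a dot.
$lookbehind = '~(?<!\.)\b([\w\#\-]+) *(\=|\>|\>\=|\<|\<\=) *([^\s\']+)~';

foreach (['name = value', 'name.extra = value'] as $input) {
    printf(
        "%-20s original=%d anchored=%d lookbehind=%d\n",
        $input,
        preg_match($original, $input),
        preg_match($anchored, $input),
        preg_match($lookbehind, $input)
    );
}
```

The anchored form only works when each input string holds exactly one expression; the lookbehind form can also be used to search inside longer text.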
You don't say what language you're using, but it sounds like you don't need to use regexes at all.
If you're using PHP, then use the explode function to break apart on the =. Then check to see if the argument name has a period in it.
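A minimal sketch of that approach might look like this. The function name parseAssignment is hypothetical, and the sketch only handles the = operator, not the other comparison operators from the question's regex.

```php
<?php
// Hypothetical helper: returns [name, value] for "name = value", or null
// when the line has no "=", or the name is empty or contains a dot.
function parseAssignment($line)
{
    $parts = explode('=', $line, 2);   // split on the first "=" only
    if (count($parts) !== 2) {
        return null;
    }
    $name  = trim($parts[0]);
    $value = trim($parts[1]);
    if ($name === '' || strpos($name, '.') !== false) {
        return null;
    }
    return [$name, $value];
}

var_dump(parseAssignment('name = value'));
var_dump(parseAssignment('name.extra = value'));
```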

What is wrong in this regular expression and how can I improve it?

I'm using the following regex code:
^[a-z0-9_-]{3,15}$^
I'm using this for username validation and I want it to match alphanumeric characters, - , _ and periods.
The following weird thing happens:
It doesn't match this:
bla.b
But it matches this one:
bla.blabla
How can I change this, so that it matches both? I still would like to be able to change the min and max characters freely. (btw. there maybe more wrong things about this regex. This one I discovered accidentally)
UPDATE: I should mention that I'm using this in CakePHP validation and this gives me an error:
^[a-z0-9_.-]{3,15}$
this is the error:
Warning (2): preg_match() [function.preg-match]: No ending delimiter '^' found
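Because ^ is the first character of your pattern string, PHP treats it as the delimiter rather than as the start anchor, so your pattern is never anchored at the beginning. Choosing a different delimiter makes that more visible: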
^[a-z0-9_-]{3,15}$^     // your non-working version
^                 ^
/^[a-z0-9_-]{3,15}$/    // using / as delimiters instead, setting the beginning
 ^
Remember:
^ - marks the beginning of the subject
$ - marks the end of the subject
Both are part of the pattern. The delimiters are used to separate the pattern from the modifiers (you don't use any modifiers here).
Alternatively you can denote the beginning and end as well with \A and \Z if it helps.
To now also match the dot, add it to your character class:
/^[a-z0-9_.-]{3,15}$/
          ^
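The "weird" behaviour from the question can be reproduced with a few lines. This is a sketch with made-up variable names; the two patterns are the question's version and the / delimited version with the dot added.

```php
<?php
// With ^ as the delimiter, the engine sees only: [a-z0-9_-]{3,15}$
$caretDelimited = '^[a-z0-9_-]{3,15}$^';
// With / as the delimiter, ^ and $ anchor both ends and the dot is allowed.
$slashDelimited = '/^[a-z0-9_.-]{3,15}$/';

foreach (['bla.b', 'bla.blabla', 'ab'] as $name) {
    printf(
        "%-12s caret=%d slash=%d\n",
        $name,
        preg_match($caretDelimited, $name),
        preg_match($slashDelimited, $name)
    );
}
```

With the caret delimiters, bla.blabla matches only because its last six characters satisfy the unanchored pattern, while bla.b has just one allowed character after the dot.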
^[a-z0-9_-]{3,15}$^
should be:
^[a-z0-9_-]{3,15}$
^ denotes the start of the string, and $ denotes the end of string.
This should do it:
/^[a-z0-9_\.\-]{3,15}$/
If you want to match a username then you probably do not want it to start or end with a dot. In that case you can use this:
/^(?!\.)[a-z0-9_.-]{3,15}(?<!\.)$/
This is how that regex breaks down:
^ means the "beginning of the string"
(?!\.) is a negative lookahead that makes sure the username cannot start with a dot
[a-z0-9_.-]{3,15} means 3 to 15 lowercase letters, digits, underscores, dots and hyphens
(?<!\.) is a negative lookbehind that makes sure the username cannot end with a dot
$ means the "end of the string"
If you allow uppercase characters then you can shorten the regex slightly:
/^(?!\.)[\w.\-]{3,15}(?<!\.)$/
The \w is short for [a-zA-Z0-9_], also called word characters.
Another way of making sure that a username does not start or end with a dot is to use three consecutive character classes, like so:
/^[\w\-][\w\.\-]{1,13}[\w\-]$/
It can be useful if you need to match something in a regex flavor without full lookaround support, for example older JavaScript engines, which lack lookbehind (it was added in ES2018; lookahead has always been supported).
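As a rough check, the three-class pattern can be compared with a lookaround form that places a negative lookahead right after ^ and a negative lookbehind right before $. The variable names and sample usernames are assumptions made for this sketch.

```php
<?php
// Three character classes: first and last characters exclude the dot.
$threeClasses = '/^[\w\-][\w\.\-]{1,13}[\w\-]$/';
// Lookaround sketch: the lookahead after ^ checks the first character,
// the lookbehind before $ checks the last one.
$lookaround   = '/^(?!\.)[\w.\-]{3,15}(?<!\.)$/';

// Illustrative usernames.
$names = ['bla.b', '.bla', 'bla.', 'ab', 'a.b'];

foreach ($names as $name) {
    printf(
        "%-8s threeClasses=%d lookaround=%d\n",
        $name,
        preg_match($threeClasses, $name),
        preg_match($lookaround, $name)
    );
}
```

Both patterns should agree on these inputs: names that start or end with a dot, or that are shorter than three characters, are rejected.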

meaning of `$/i` in regular expressions

What does the $/i mean in the following php code?
preg_match ('/^[A-Z \'.-]{2,20}$/i')
/ denotes the end of the pattern. The i is a modifier that makes the pattern case-insensitive, and the $ anchor matches the end of the string.
The $ is an anchor: it means the end of the string should be there. The / is the end delimiter for the regular expression. The i means that the regular expression should be case-insensitive (notice that [A-Z \'.-] only lists A-Z; with the i modifier the class matches a-z as well, so it doesn't have to be listed).
Dollar sign is a common regex symbol meaning "end of line".
The slash at the end is the end of the expression itself.
Any letters after that slash are options you can turn on or off, called modifiers. In the case of i it means case-insensitive.
$ Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character
/ is the ending delimiter of the regex pattern in PHP
i represents case insensitive regular expression search
you can also use this to understand things better, and can be used for testing/practice too.
http://gskinner.com/RegExr/
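The effect of the i modifier is easy to see by running the same pattern with and without it. The sample inputs and variable names below are invented for this sketch.

```php
<?php
// The pattern from the question, with and without the i modifier.
$caseInsensitive = '/^[A-Z \'.-]{2,20}$/i';
$caseSensitive   = '/^[A-Z \'.-]{2,20}$/';

// Illustrative inputs.
$inputs = ["O'Brien", 'MARY-ANN', 'x', 'Bob99'];

foreach ($inputs as $input) {
    printf(
        "%-10s with_i=%d without_i=%d\n",
        $input,
        preg_match($caseInsensitive, $input),
        preg_match($caseSensitive, $input)
    );
}
```

Without the modifier, only all-uppercase input such as MARY-ANN passes; with it, mixed-case names like O'Brien pass too.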

php regular expression help finding multiple filenames only not full URL

I am trying to fix a regular expression I have been using in PHP. It finds all filenames within a sentence or paragraph. The file names always look like this: /this-a-valid-page.php
With help I received on Stack Overflow, my old pattern was modified to the one below, which avoids full URLs (the issue I was having). However, this pattern only finds one occurrence at the beginning of a string, and nothing inside the string.
/^\/(.*?).php/
I have a live example here: http://vzio.com/upload/reg_pattern.php
Remove the ^ - the caret signifies the beginning of a string/line, which is why it's not matching elsewhere.
If you need to avoid full URLs, you might want to change the ^ to something like (?:^|\s) which will match either the beginning of the string or a whitespace character - just remember to strip whitespace from the beginning of your match later on.
The last dot in your expression could still cause problems, since it'll match "one anything". You could match, for example, /somefilename#php with that pattern. Backslash it to make it a literal period:
/\/(.*?)\.php/
Also note that the ? making .* non-greedy is necessary, and Arda Xi's pattern won't work. A greedy .* would race to the end of the string and then back up one character at a time until it can match the .php, which certainly isn't what you'd want.
To find all the occurrences, you'll have to remove the start anchor and use the preg_match_all function instead of preg_match :
if(preg_match_all('/\/(.*?)\.php/',$input,$matches)) {
var_dump($matches[1]); // will print all filenames (after / and before .php)
}
Also . is a meta char. You'll have to escape it as \. to match a literal period.
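Putting the suggestions from the answers together, a sketch could look like the following. The helper name findPhpFiles, the sample text, and the example.com URL are assumptions for illustration; the two patterns are the unanchored one and the (?:^|\s) variant described above.

```php
<?php
// Hypothetical helper wrapping preg_match_all; returns the captured names.
function findPhpFiles($pattern, $text)
{
    preg_match_all($pattern, $text, $matches);
    return $matches[1];
}

$anywhere   = '/\/(.*?)\.php/';           // start anchor removed, dot escaped
$afterSpace = '/(?:^|\s)\/(.*?)\.php/';   // slash must follow start of string or whitespace

// Illustrative text with two local files and one full URL.
$text = 'See /this-a-valid-page.php and /other-page.php or http://example.com/remote.php';

var_dump(findPhpFiles($anywhere, $text));    // also picks up part of the full URL
var_dump(findPhpFiles($afterSpace, $text));  // local filenames only
```

Because the capture group starts after the slash, the whitespace matched by (?:^|\s) never ends up in the captured name, so no trimming is needed when reading group 1.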
