Regex for netbios names - php

I'm having trouble building a regex to verify a NetBIOS name. According to the MS standard, these characters are illegal:
\/:*?"<>|
So that's what I'm trying to detect. My regex looks like this:
^[\\\/:\*\?"\<\>\|]$
But that won't work.
Can anyone point me in the right direction? (not regexlib.com please...)
And if it matters, I'm using PHP with preg_match.
Thanks

Your regular expression has two problems:
You insist that the match should span the entire string. As Andrzej says, you are only matching strings of length 1.
You are quoting too many characters. In a character class (i.e. []), you only need to quote characters that are special within character classes, i.e. hyphen, caret, square bracket, and backslash (plus, in PHP, the pattern delimiter).
The following call works for me:
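/* In a single-quoted PHP string, \\\\ yields the regex \\ (a literal backslash); \/ escapes the delimiter. */
preg_match('/[\\\\\/:*?"<>|]/', "foo"); /* gives 0: does not include invalid characters */
preg_match('/[\\\\\/:*?"<>|]/', "f<oo"); /* gives 1: does include invalid characters */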
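As a minimal sketch of the same check wrapped in a function (the name hasIllegalNetbiosChars is hypothetical, not from the question), the pattern below uses # as the delimiter so that the forward slash needs no escaping. It only looks for the nine characters listed in the question and assumes nothing else about NetBIOS naming rules.

```php
<?php
// Hypothetical helper: true when $name contains any of \ / : * ? " < > |
function hasIllegalNetbiosChars(string $name): bool
{
    // '#' is the delimiter, so '/' is an ordinary character here.
    // In a single-quoted PHP string, \\\\ becomes the regex \\ (one literal backslash).
    return preg_match('#[\\\\/:*?"<>|]#', $name) === 1;
}

var_dump(hasIllegalNetbiosChars('foo'));      // bool(false)
var_dump(hasIllegalNetbiosChars('f<oo'));     // bool(true)
var_dump(hasIllegalNetbiosChars('foo\\bar')); // bool(true), the name contains a backslash
```

No anchors are used, so a single illegal character anywhere in the name is enough to trigger a match.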

As it stands at the moment, your regex will match the start of the string (^), then exactly one of the characters in the square brackets (i.e. the illegal characters), then the end of the string ($).
So this likely isn't working because a string of length > 1 will trivially fail to match the regex, and thus be considered OK.
You likely don't need the start and end anchors (the ^ and $). If you remove these, then the regex should match one of the bracketed characters occurring anywhere in the input text, which is what you want.
(Depending on the exact regex dialect, you may canonically need fewer backslashes within the square brackets, but they are unlikely to do any harm in any case).

Related

How to check if string contains specific special characters or starting with a space? [duplicate]

I have the following requirements for validating an input field:
It should contain only letters, with spaces allowed between the letters.
It cannot contain spaces at the beginning or end of the string.
It cannot contain any other special character.
I am using following regex for this:
^(?!\s*$)[-a-zA-Z ]*$
But this is allowing spaces at the beginning. Any help is appreciated.
For me the only logical way to do this is:
^\p{L}+(?: \p{L}+)*$
At the start of the string there must be at least one letter. (I replaced your [a-zA-Z] with the Unicode property for letters, \p{L}.) Then there can be a space followed by at least one letter; this part can be repeated.
\p{L}: any kind of letter from any language. See regular-expressions.info
The problem with ^(?!\s*$) in your expression is that the lookahead fails only if there is nothing but whitespace up to the end of the string. If you want to disallow leading whitespace, just remove the end-of-string anchor inside the lookahead: ^(?!\s)[-a-zA-Z ]*$. But this still allows the string to end with whitespace. To avoid this, add a lookbehind at the end of the string: ^(?!\s)[-a-zA-Z ]*(?<!\s)$. But I think lookaround is not needed for this task.
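The \p{L} pattern above can be tried in PHP roughly as follows. This is only a sketch: the function name is made up for illustration, the subject is assumed to be valid UTF-8, and the u modifier is added so that preg_match treats the pattern and subject as Unicode.

```php
<?php
// Hypothetical helper: letters only, words separated by exactly one space.
// Assumes $input is valid UTF-8; the /u modifier makes \p{L} Unicode-aware.
function isLettersAndSingleSpaces(string $input): bool
{
    return preg_match('/^\p{L}+(?: \p{L}+)*$/u', $input) === 1;
}

var_dump(isLettersAndSingleSpaces('hello world')); // bool(true)
var_dump(isLettersAndSingleSpaces(' leading'));    // bool(false)
var_dump(isLettersAndSingleSpaces('trailing '));   // bool(false)
var_dump(isLettersAndSingleSpaces('two  spaces')); // bool(false)
```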
This should work if you use it with the String.matches method. I assume you want the English alphabet.
"[a-zA-Z]+(\\s+[a-zA-Z]+)*"
Note that \s will allow all kinds of whitespace characters. In Java, it is equivalent to
[ \t\n\x0B\f\r]
which includes space (32), horizontal tab (09), line feed (10), vertical tab (11), form feed (12), and carriage return (13).
If you want to specifically allow only space (32):
"[a-zA-Z]+( +[a-zA-Z]+)*"
You can further optimize the regex above by making the capturing group ( +[a-zA-Z]+) non-capturing (with String.matches you are not going to be able to get the words individually anyway). It is also possible to change the quantifiers to make them possessive, since there is no point in backtracking here.
"[a-zA-Z]++(?: ++[a-zA-Z]++)*+"
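The answer above is written for Java's String.matches, which matches the whole string implicitly. As a hedged sketch, the same possessive pattern can be tried with PHP's preg_match (PCRE also supports possessive quantifiers), provided the ^ and $ anchors are written out explicitly:

```php
<?php
// Possessive quantifiers (++, *+) never give back characters once matched.
// The anchors are explicit because preg_match searches for a match anywhere.
$pattern = '/^[a-zA-Z]++(?: ++[a-zA-Z]++)*+$/';

var_dump(preg_match($pattern, 'hello world'));  // int(1)
var_dump(preg_match($pattern, 'hello  world')); // int(1), several spaces between words are allowed
var_dump(preg_match($pattern, "hello\tworld")); // int(0), a tab is not a plain space
var_dump(preg_match($pattern, 'hello '));       // int(0), trailing space
```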
Try this:
^(((?<!^)\s(?!$)|[-a-zA-Z])*)$
This expression uses a negative lookahead and a negative lookbehind to disallow spaces at the beginning or at the end of the string, and requires the match to span the entire string.
I think the problem is that the lookahead (?!\s*$) only rejects strings that consist entirely of whitespace; it does not forbid a space at the start of the string.
This should work:
[a-zA-Z]{1}([a-zA-Z\s]*[a-zA-Z]{1})?
That is: one letter, then optionally a run of letters and whitespace that always ends with a letter.
I don't know if the words in your accepted string can be separated by more than one space. If they can:
^[a-zA-Z]+(( )+[a-zA-Z]+)*$
If they can't:
^[a-zA-Z]+( [a-zA-Z]+)*$
The string must start with a letter (or several letters), not a space.
The string can contain several words, but every word besides the first must have a space before it.
Hope I helped.

Differences between two regular expressions

Does anybody know why this regex:
/^(([a-zA-Z0-9\(\)áéíóúÁÉÍÓÚñÑ,\.°-]+ *)+)$/
works but this one doesn't:
/^(([a-zA-Z0-9áéíóúÁÉÍÓÚñÑ,\.°-\(\)]+ *)+)$/
The only difference is where the parentheses are placed. I tried some online PHP regex testers and got the same result. The second one simply doesn't work.
PHP returns:
preg_match(): Compilation failed: range out of order in character class at offset 44 in...
This is not a critical question, because I've managed to make it work, but I'm curious!
Maybe the unicode characters are changing something?
When the - character is used inside of brackets (indicating a character set) it indicates a range unless it is the last character in the set, first character in the set, or directly after the opening negating character. Then it means a literal dash. By moving it from the end to the middle you changed its meaning. If you want to keep it in the middle you will need to escape it: \-.
If the hyphen is placed as the first or last character in the character class, it is treated as a literal - (as opposed to a range), and as a result does not require escaping.
These are the positions where the hyphen does not need to be escaped:
right after the opening bracket ([), or
right before the closing bracket (]), or
right after the negating caret (^)
In the second regular expression, you're placing the hyphen in the middle, and the regular expression engine tries to create a range with the character before the hyphen, the character after the hyphen, and all characters that lie between them in numerical order. As such a range isn't possible, an error message is triggered. See asciitable.com for the character table.
Putting the hyphen last in the expression actually causes it to not require escaping, as it then can't be part of a range, however you might still want to get into the habit of always escaping it.
In your first regex you handled everything correctly, including the hyphen at the end of the class, which is exactly where it belongs. If you don't want to escape it, there are two places it can go: at the end of the character class or at the beginning.
You guessed well! Anywhere else, you should escape it.
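The hyphen rules described in these answers can be checked with a few calls. This is a sketch using a shortened, ASCII-only character class rather than the one from the question; the @ operator is used only to silence the compilation warning so that the false return value is visible.

```php
<?php
// Hyphen last in the class: treated as a literal character.
var_dump(preg_match('/^[a-z.()-]+$/', 'a-b(c)'));    // int(1)

// Hyphen escaped in the middle of the class: also a literal.
var_dump(preg_match('/^[a-z.\-()]+$/', 'a-b(c)'));   // int(1)

// Unescaped hyphen between '.' (0x2E) and '(' (0x28): an out-of-order range.
// The pattern fails to compile, so preg_match returns false.
var_dump(@preg_match('/^[a-z.-\(\)]+$/', 'a-b(c)')); // bool(false)
```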

JavaScript regex not working for PHP

I have a javascript regex
Value.match(/[A-Za-z0-9\-\,\.\(\)/]/)
This gives me 1 if a string contains letters, numbers, a hyphen, comma, dot or parentheses; if any other character is found it gives 0.
When I apply same regex in PHP it is not working. Why?
You don't need to escape most characters inside [], so you can try /[A-Za-z0-9,.()-]/ (with the hyphen placed last so that it stays literal) or even /[\w,.()-]/ (note that \w also matches the underscore). But if you want to check that the string contains only those characters, that regex won't do; try:
/^[\w,.()-]+$/
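I noticed that you also have / in the character class. Is that intentional or a mistake? You don't mention it in the question. In PHP it matters: with / as the delimiter, an unescaped / ends the pattern even inside [], so it has to be written as \/ or the delimiter has to be changed.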
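A possible PHP version of the check is sketched below. The function names are hypothetical, ~ is chosen as the delimiter so that the / needs no escaping, and the / is kept in the class only because it appears in the original JavaScript pattern.

```php
<?php
// Hypothetical helper: true when $value contains at least one allowed character.
function containsAllowedChar(string $value): bool
{
    // Hyphen is last in the class, so it is a literal.
    return preg_match('~[A-Za-z0-9,.()/-]~', $value) === 1;
}

// Hypothetical helper: true when $value consists only of allowed characters.
function hasOnlyAllowedChars(string $value): bool
{
    return preg_match('~^[A-Za-z0-9,.()/-]+$~', $value) === 1;
}

var_dump(containsAllowedChar('abc$'));     // bool(true), it contains at least one allowed character
var_dump(hasOnlyAllowedChars('abc$'));     // bool(false), "$" is not in the class
var_dump(hasOnlyAllowedChars('a-b,(c).')); // bool(true)
```

The unanchored version answers "does any allowed character occur?", which is rarely what validation needs; the anchored version rejects the string as soon as one disallowed character is present.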

PHP Regex for checking space or certain characters after string

I need a regex which can basically check for space, line break etc after string.
So conditions are,
Allow the special characters ., _, -, + inside the string, e.g. #hello.world, #hello_world, #helloworld, etc.
Discard anything, including special characters, that has no alphanumeric string after it, e.g. #helloworld.<space>, #helloworld-<space>, #helloworld.?, etc. must be parsed as #helloworld
My existing regex is /#([A-Za-z0-9+_.-]+)/, which works perfectly for condition #1, but there still seems to be a problem with condition #2.
I am using above RegEx in preg_replace()
Solution:
$str = preg_replace('/#[\w+.\-]+\b/', '[[$0]]', $str);
This works perfectly.
Tested with
http://gskinner.com/RegExr/
You can use a word boundary (\b) to easily find the position between a word character and a non-word character:
$str = preg_replace('/#[\w+.\-]+\b/', '[[$0]]', $str);
Working example: http://ideone.com/0ShCm
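To see what the word boundary does, here is a small sketch with a made-up input string. It uses / as the pattern delimiter (an assumption on my part) so that the leading # is part of the pattern rather than a delimiter.

```php
<?php
// Sample input invented for illustration.
$str = 'Say #hello.world. and #helloworld-? done';

// The class is greedy, then \b forces the match to back off
// until it ends right after a word character.
$out = preg_replace('/#[\w+.\-]+\b/', '[[$0]]', $str);

echo $out, PHP_EOL; // Say [[#hello.world]]. and [[#helloworld]]-? done
```

The trailing "." and "-" stay outside the brackets because there is no word boundary between two non-word characters.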
Here's an idea:
Use strrev to reverse the string
Use strcspn to find the longest prefix of the reversed string that does not contain any alphanumeric characters
Cut the prefix off with substr
Reverse the string again; this is your final result
See it in action.
I'm not taking into account any requirement that restricts the legal characters in the string to some subset, but you can use your regular expression for that (or even strspn, which might be faster).
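The four steps above can be sketched as follows. The function name is hypothetical, "alphanumeric" is taken to mean ASCII letters and digits, and the code assumes single-byte text, since strrev works on bytes; an input with no alphanumeric characters at all relies on PHP 8's substr returning an empty string.

```php
<?php
// Hypothetical helper following the steps: reverse, measure, cut, reverse back.
function stripTrailingNonAlnum(string $s): string
{
    $alnum = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    $reversed = strrev($s);
    // Length of the leading run of the reversed string that has no alphanumeric characters.
    $skip = strcspn($reversed, $alnum);
    return strrev(substr($reversed, $skip));
}

var_dump(stripTrailingNonAlnum('#helloworld.?')); // string(11) "#helloworld"
var_dump(stripTrailingNonAlnum('#hello.world'));  // string(12) "#hello.world"
```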
The reason is that the regex reads the string as a whole. If you want to strip everything after the alphanumeric section, you might have to do something like end(explode()) and check whether the last piece is valid, removing it if it isn't. But then you'd have to check the end for every possible explode point, i.e. ., -, ~, etc.
Another trap you might run into is that, for an item or anything else with an alphanumeric value, it might just take everything from after the last alphanumeric character onward.
Sorry that this isn't much help, but I figured thinking aloud does help.

Why does this regex not validate in the same way in PHP?

When I try preg_match with the following expression: /.{0,5}/, it still matches strings longer than 5 characters.
It does, however, work properly when I try it in an online regexp matcher.
The site you reference, myregexp.com, is focussed on Java.
Java has a specific function for matching an exact pattern, without needing to use anchor characters. This is the function which myregexp.com uses.
In most other languages, in order to match an exact pattern, you would need to add the anchoring characters ^ and $ at the start and end of the pattern respectively, otherwise the regex assumes it only needs to find the matched pattern somewhere within the string, rather than the whole string being the match.
This means that without the anchors, your pattern will match any string, of any length, because whatever the string, it will contain within it somewhere a match for "zero to five of any character".
So in PHP, and Perl, and virtually any other language, you need your pattern to look like this:
/^.{0,5}$/
Having explained all that, I would make one final observation: this specific pattern really doesn't need to be a regular expression; you could achieve the same thing with strlen(). In addition, the dot character in regex may not work exactly as you expect: it typically matches almost any character, but some characters, including newline characters, are excluded by default, so if your string contains five characters and one of them is a newline, it will fail your regex when you might have expected it to pass. With this in mind, strlen() would be a safer option (or mb_strlen() if you expect to have multibyte Unicode characters).
If you need to match any character in regex, and the default behaviour of the dot isn't good enough, there are two options: One is to add the s modifier at the end of the expression (ie it becomes /^.{0,5}$/s). The s modifier tells regex to include new line characters in the dot "any character" match.
The other option (which is useful for languages that don't support the s modifier) is to use an expression and its negative together in a character class - eg [\s\S] - instead of the dot. \s matches any white space character, and \S is a negative of \s, so any character not matched by \s. So together in a character class they match any character. It's more long winded and less readable than a dot, but in some languages it's the only way to be sure.
You can find out more about this here: http://www.regular-expressions.info/dot.html
Hope that helps.
You need to anchor it with ^$. These symbols match the beginning and end of the string respectively, so it must be 0-5 characters between the beginning and end. Leaving out the anchors will match anywhere in the string so it could be longer.
/^.{0,5}$/
For better readability, I would probably also enclose the . in (), but that's kind of subjective (and note that it also creates a capturing group).
/^(.){0,5}$/
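The effect of the anchors and of the s modifier discussed in these answers can be seen in a short sketch; the sample strings are made up for illustration.

```php
<?php
$long = 'abcdefgh';
var_dump(preg_match('/.{0,5}/', $long));   // int(1), a match is found inside the string
var_dump(preg_match('/^.{0,5}$/', $long)); // int(0), the whole string must fit

$withNewline = "ab\ncd"; // five characters, one of them a newline
var_dump(preg_match('/^.{0,5}$/', $withNewline));  // int(0), . does not match \n by default
var_dump(preg_match('/^.{0,5}$/s', $withNewline)); // int(1), the s modifier lets . match \n
var_dump(strlen($withNewline) <= 5);               // bool(true)
```

One further detail worth knowing: in PCRE, $ also matches just before a final newline, so a six-character string ending in "\n" passes /^.{0,5}$/; adding the D modifier makes $ match only at the very end.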
