Regular Expressions: How to Express \w Without Underscore - php

Is there a concise way to express:
\w but without _
That is, "all characters included in \w, except _"
I'm asking this because I'm looking for the most concise way to express domain name validation. A domain name may include lowercase and uppercase letters, numbers, period signs and dashes, but no underscores. \w includes all of the above, plus an underscore. So, is there any way to "remove" an underscore from \w via regex syntax?
Edited: I'm asking about regex as used in PHP.
Thanks in advance!

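Use the following character class (Perl syntax, which PCRE and therefore PHP's preg functions also accept):
[^\W_]
\W is the same as [^\w], so [^\W_] reads as "not a non-word character and not an underscore", i.e. any \w character except _.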
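A minimal sketch of that class in PHP, assuming you only want to test whether a string consists of \w characters other than the underscore. The function name isWordWithoutUnderscore is made up for illustration.

```php
<?php
// Hypothetical helper: true if $s is one or more \w characters, none of them "_".
function isWordWithoutUnderscore(string $s): bool
{
    // [^\W_] = not a non-word character and not an underscore.
    // D makes $ match only at the very end of the string.
    return preg_match('/^[^\W_]+$/D', $s) === 1;
}

var_dump(isWordWithoutUnderscore('abc123'));   // bool(true)
var_dump(isWordWithoutUnderscore('abc_123'));  // bool(false)
```

Note that this still rejects . and -, so for domain names you would have to add them separately.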

You could use a negative lookahead: (?!_)\w
However, I think writing [a-zA-Z0-9.-] is more readable.

To be on the safe side, usually, we will use character class:
[a-zA-Z0-9.-]
The regex "fragment" above matches the letters of the English alphabet and digits, plus the period . and the dash -. It should work even with the most basic regex support.
Shorter may be better, but only if you know exactly what it represents.
I don't know what language you are using. In a lot of engines, \w is equivalent to [a-zA-Z0-9_] (some require an "ASCII mode" for this). However, some engines have Unicode support for regex and may extend \w to match Unicode letters and digits as well.
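For PHP, the explicit class could be used roughly like this. It is only a sketch: it checks the character set and nothing else (no label length, no leading or trailing dash rules), and the function name hasOnlyDomainChars is my own.

```php
<?php
// Hypothetical helper: checks the allowed character set only,
// not the full syntax rules for domain names.
function hasOnlyDomainChars(string $name): bool
{
    // Letters, digits, period and dash; "-" is last in the class so it is literal.
    // D makes $ match only at the very end, not before a trailing newline.
    return preg_match('/^[a-zA-Z0-9.-]+$/D', $name) === 1;
}

var_dump(hasOnlyDomainChars('example-site.com')); // bool(true)
var_dump(hasOnlyDomainChars('bad_name.com'));     // bool(false)
```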

If my understanding is right, \w means [A-Za-z0-9_]; periods and dashes are not included.
More info:
http://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes
so I guess what you want is [a-zA-Z0-9.-]

Some regex flavours have a negative lookbehind syntax you might use:
\w(?<!_)
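If you want to convince yourself that the negated class, the lookahead, the lookbehind and the plain [a-zA-Z0-9] class agree, a quick check over the ASCII range could look like this. It is a sketch that assumes the default "C" locale and no u modifier, and it says nothing about non-ASCII input.

```php
<?php
// The four spellings discussed in the answers, each anchored to a single character.
$patterns = [
    'negated class' => '/^[^\W_]$/D',
    'lookahead'     => '/^(?!_)\w$/D',
    'lookbehind'    => '/^\w(?<!_)$/D',
    'explicit'      => '/^[a-zA-Z0-9]$/D',
];

$disagreements = 0;
for ($code = 0; $code < 128; $code++) {
    $char = chr($code);
    $results = [];
    foreach ($patterns as $pattern) {
        $results[] = preg_match($pattern, $char);
    }
    // All four patterns should give the same answer for this character.
    if (count(array_unique($results)) !== 1) {
        $disagreements++;
    }
}

echo "ASCII characters where the patterns disagree: $disagreements\n";
```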

I would start with [^_], and then think about which other characters I need to reject. If you need to filter keyboard input, it's quite simple to enumerate all the unwanted characters.

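You can write something like this:
/([^\w]|_)/u
If you use preg_replace (or preg_filter) with this pattern and an empty replacement, every character that is not in \w, plus the underscore itself, is removed, so only \w characters other than _ remain.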
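A small sketch of that filtering idea, using the equivalent and slightly shorter class [\W_]. The function name stripNonAlnum is only an illustration.

```php
<?php
// Hypothetical helper: removes everything that is not a word character,
// and removes the underscore as well.
function stripNonAlnum(string $s): string
{
    // [\W_] matches the same characters as ([^\w]|_).
    return preg_replace('/[\W_]+/', '', $s);
}

echo stripNonAlnum('foo_bar-baz.qux 42'), "\n"; // foobarbazqux42
```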

Related

Match whole words in utf

I want to replace all occurrences of a with 5. Here is the code that works well:
$content=preg_replace("/\ba\b/","5", $content);
unless I have words like zapłać where a is between non standard characters, or zmarła where there is a Unicode (or non-ASCII) letter followed by a at the end of word. Is there any easy way to fix it?
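The problem is that, without the u modifier, PCRE works on bytes and the predefined class \w (and therefore \b) is ASCII-based, so \b sees a word boundary between ł and a. In older PHP/PCRE builds this did not change even when the u modifier was used. (See regular-expressions.info; preg is PCRE in the columns.)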
You can use lookbehind and lookahead to do it:
$content=preg_replace("/(?<!\p{L})a(?!\p{L})/u","5",$content);
This will replace "a" if there is not a letter before and not a letter ahead.
\p{L}: any kind of letter from any language.
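Alternatively, on PHP builds where the u modifier also enables Unicode character properties, adding u makes \b Unicode-aware:
$content=preg_replace("/\ba\b/u","5",$content);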
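Here is the lookaround variant as a self-contained sketch, assuming the input is valid UTF-8 (the u modifier is what lets \p{L} see ł and ć as single letters). The function name replaceStandaloneA is made up.

```php
<?php
// Hypothetical helper: replaces "a" only where it is not touching any letter.
function replaceStandaloneA(string $content): string
{
    // (?<!\p{L}) = no letter immediately before, (?!\p{L}) = no letter immediately after.
    // u = treat pattern and subject as UTF-8.
    return preg_replace('/(?<!\p{L})a(?!\p{L})/u', '5', $content);
}

echo replaceStandaloneA('a zapłać zmarła a'), "\n"; // 5 zapłać zmarła 5
```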

preg_replace in PHP - regular expression for NOT condition

I am trying to write a function in PHP using preg_replace where it will replace all those characters which are NOT found in list. Normally we replace where they are found but this one is different.
For example if I have the string:
$mystring = "ab2c4d";
I can write the following function which will replace all numbers with *:
preg_replace("/(\d+)/","*",$mystring);
But I want to replace those characters which are neither digits nor letters from a to z. They could be anything like #$*();~!{}[]|\/.,<>?' etc.
So anything other than numbers and alphabets should be replaced by something else. How do I do that?
Thanks
You can use a negated character class (using ^ at the beginning of the class):
/[^\da-z]+/i
Update: I mean, you have to use a negated character class and you can use the one I provided but there are others as well ;)
Try
preg_replace("/([^a-zA-Z0-9]+)/","*",$mystring);
You want to use a negated "character class". The syntax for them is [^...]. In your case just [^\w] I think.
\W matches a non-alpha, non-digit character. The underscore _ is included in the list of alphanumerics, so it also won't match here.
preg_replace("/\W/", "something else", $mystring);
should do if you can live with the underscore not being replaced. If you can't, use
preg_replace("/[\W_]/", "something else", $mystring);
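To see the difference between these variants side by side, here is a short sketch on a made-up sample string (the string itself is my assumption, not from the question).

```php
<?php
$mystring = 'ab2c4d#$_x';

// \W leaves the underscore alone, because "_" counts as a word character.
$keepUnderscore = preg_replace('/\W/', '*', $mystring);

// [\W_] replaces the underscore too.
$noUnderscore = preg_replace('/[\W_]/', '*', $mystring);

// Negated class with +: a whole run of unwanted characters becomes one "*".
$collapsed = preg_replace('/[^a-z0-9]+/i', '*', $mystring);

echo $keepUnderscore, "\n"; // ab2c4d**_x
echo $noUnderscore, "\n";   // ab2c4d***x
echo $collapsed, "\n";      // ab2c4d*x
```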
The \d, \w and similar in regex all have negative versions, which are simply the upper-case version of the same letter.
So \w matches any word character (ie basically alpha-numerics), and therefore \W matches anything except a word character, so anything other than an alpha-numeric.
This sounds like what you're after.
For more info, I recommend regular-expressions.info.
Since PHP 5.1.0 you can use \p{L} (Unicode letters) and \p{N} (Unicode digits), which are the Unicode equivalents of the Latin-only letter and digit ranges:
preg_replace("/[^\p{L}\p{N}]/iu", $replacement_string, $original_string);
/iu modifiers at the end of pattern:
i (PCRE_CASELESS)
u (PCRE_UTF8)
see more at: https://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
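A quick sketch of the Unicode version on a sample string with Polish letters, assuming UTF-8 input. The function name maskNonAlnumUnicode is hypothetical, and I added + so that a run of unwanted characters is replaced once.

```php
<?php
// Hypothetical helper: replaces every run of characters that are neither
// Unicode letters nor Unicode digits with "*".
function maskNonAlnumUnicode(string $s): string
{
    // u = UTF-8 mode, needed so multi-byte letters are treated as single characters.
    return preg_replace('/[^\p{L}\p{N}]+/u', '*', $s);
}

echo maskNonAlnumUnicode('zażółć #1!'), "\n"; // zażółć*1*
```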

Check a variable using regex

I'm about to create a registration form for my website. I need to check a variable and accept it only if it contains letters, numbers, _ or -.
How can I do it with regex? I used to work with them through preg_replace(), but I don't think that applies here. Also, I know that the "ereg" function is dead. Any solutions?
This regex is pretty common these days.
if(preg_match('/^[a-z0-9\-\_]+$/i',$username))
{
// Ok
}
Use preg_match:
preg_match('/^[\w-]+$/D', $str)
Here \w describes letters, digits and the _, so [\w-]+ matches one or more letters, digits, _, and -. ^ and $ are so-called anchors that denote the beginning and end of the string respectively. The D modifier ensures that $ matches only at the very end of the string, and not just before a trailing line break.
Note that the letters and digits matched by \w depend on the current locale and might include letters or digits other than [a-zA-Z0-9]. So if you want just these, use them explicitly. And if you want to allow more than these, you could also try character classes that are described by Unicode character properties, like \p{L} for all Unicode letters.
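The effect of the D modifier is easy to miss, so here is a small sketch with a username that ends in a newline (the sample value is my own).

```php
<?php
$username = "user_name\n"; // note the trailing line break

// Without D, $ also matches just before a final newline, so this passes.
$withoutD = preg_match('/^[\w-]+$/', $username);

// With D, $ matches only at the very end of the string, so this fails.
$withD = preg_match('/^[\w-]+$/D', $username);

var_dump($withoutD); // int(1)
var_dump($withD);    // int(0)
```

If you prefer not to rely on a modifier, \z in place of $ has the same "very end only" meaning in PCRE.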
Try preg_match(). http://php.net/manual/en/function.preg-match.php

Allow only letters; no punctuation no numbers

Hey guys can you help me with this. I've got this '/[^A-Za-z]/' but cannot figure out the punctuations part.
Gracious!
The regular expression you are using matches any character that is not a letter; it's the opposite of what you stated in the title.
/^[a-z]+$/i is enough if you want to accept only letters (without the ^ and $ anchors, the pattern would accept any string that merely contains a letter). If you want to allow letters like à, è, or ç, then you should expand the regular expression; /^\p{L}+$/u should work with all the Unicode letters.
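#^[a-z]+$#i
Your character class was almost right; you just need the ^ and $ anchors, and the negation inside the class has to go. Anchored this way, every character from the beginning to the end of the string must be a letter. (With the negation kept, #^[^a-z]+$#i would accept only strings that contain no letters at all.)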
/[^A-Za-z]/ will match any character except a letter. You shouldn't need to specify numbers or punctuation separately.
Inside of a character class, the ^ means not.
So you're looking for not a letter.
You want something like
[A-Za-z]+
You can also use the shorthand \w for a "word character" (alphanumeric plus _), but note that it also accepts digits and the underscore, so it is broader than letters only. Of course some regex engines may differ on support for this, but if it's PCRE it should work. See here (under heading "escape sequences").
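Putting the letters-only check together, a sketch assuming UTF-8 input and that accented letters should be accepted. The function name isLettersOnly is hypothetical.

```php
<?php
// Hypothetical helper: true only if the whole string is made of letters.
function isLettersOnly(string $s): bool
{
    // \p{L} = any Unicode letter; u = UTF-8 mode; D = $ only at the very end.
    return preg_match('/^\p{L}+$/uD', $s) === 1;
}

var_dump(isLettersOnly('façade')); // bool(true)
var_dump(isLettersOnly('abc1'));   // bool(false)
var_dump(isLettersOnly('a.b'));    // bool(false)
```

For ASCII letters only, /^[A-Za-z]+$/D would do the same job without the u modifier.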

PHP: question about preg_replace()

i have found this:
$text = preg_replace('/\W+/', '-', $text);
Can anyone tell me what exactly that does? I can't find any information about what '/\W+/' means.
Regards
Javi
\W means a non-word character, so anything other than a-z, A-Z, 0-9, or underscore.
This is standard for regular expressions, nothing specific to PHP.
Here's a great tool for testing regular expressions:
http://www.gskinner.com/RegExr/
If you put \W+ in the box at the top you'll see what kinds of things it matches.
PS: Here's another tool that's simpler and cleaner, though perhaps not as feature rich:
http://rubular.com/
It includes a handy quick-reference for regular expressions at the bottom.
Looks like it replaces anything that isn't a 'word character' (letter, digit, underscore) and makes them hyphens.
The preg family of functions uses Perl Compatible Regular Expressions, or PCRE. There's a nice cheat sheet for them here (PDF).
The \W means "any non-word character", and the + makes it match a run of one or more such characters in a row. "Word characters" are defined to be letters, digits and underscores, so \W matches characters that aren't one of those.
Your line of code would replace any occurrence of a set of characters that aren't word characters with a hyphen.
It's documented at http://es2.php.net/manual/en/regexp.reference.backslash.php (linked from the PCRE section of the PHP manual where preg_replace is explained).
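To make that concrete, here is the line from the question applied to a sample string of my own choosing.

```php
<?php
$text = 'Hello, World! 2024';

// Each run of non-word characters (", " and "! ") becomes a single hyphen.
$slug = preg_replace('/\W+/', '-', $text);

echo $slug, "\n"; // Hello-World-2024
```

Underscores survive, because _ counts as a word character; a leading or trailing run of punctuation would also turn into a hyphen.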
