I am trying to write a function in PHP using preg_replace that replaces every character NOT found in a given list. Normally we replace the characters that are found, but this case is the opposite.
For example if I have the string:
$mystring = "ab2c4d";
I can write the following function which will replace all numbers with *:
preg_replace("/(\d+)/","*",$mystring);
But I want to replace those characters which are neither digits nor letters from a to z. They could be anything like #$*();~!{}[]|\/.,<>?' etc.
So anything other than digits and letters should be replaced by something else. How do I do that?
Thanks
You can use a negated character class (using ^ at the beginning of the class):
/[^\da-z]+/i
Update: I mean, you have to use a negated character class and you can use the one I provided but there are others as well ;)
Try
preg_replace("/([^a-zA-Z0-9]+)/","*",$mystring);
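As a rough illustration of the negated class in the answers above, here is a minimal sketch. The sample string is made up for the demonstration, and the two variable names are only illustrative.

```php
<?php
// Hypothetical sample input with special characters mixed in.
$mystring = 'ab#$2c~!4d';

// [^a-zA-Z0-9] matches any single character that is NOT an ASCII letter or digit.
// With +, a whole run of such characters is replaced by one asterisk.
$perRun = preg_replace('/[^a-zA-Z0-9]+/', '*', $mystring);

// Without +, every offending character gets its own asterisk.
$perChar = preg_replace('/[^a-zA-Z0-9]/', '*', $mystring);

echo $perRun, "\n";  // ab*2c*4d
echo $perChar, "\n"; // ab**2c**4d
```

Whether to keep the + depends on whether you want one replacement per character or one per run.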
You want to use a negated "character class". The syntax for them is [^...]. In your case just [^\w] I think.
\W matches a non-alpha, non-digit character. The underscore _ is included in the list of alphanumerics, so it also won't match here.
preg_replace("/\W/", "something else", $mystring);
should do if you can live with the underscore not being replaced. If you can't, use
preg_replace("/[\W_]/", "something else", $mystring);
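To make the underscore difference concrete, here is a small sketch comparing the two patterns. The input string and the replacement character * are assumptions chosen for the example.

```php
<?php
// Hypothetical input containing an underscore and a few other symbols.
$mystring = 'user_name: ab2c!';

// \W leaves the underscore alone because _ counts as a word character.
$keepUnderscore = preg_replace('/\W/', '*', $mystring);

// [\W_] additionally treats the underscore as something to replace.
$dropUnderscore = preg_replace('/[\W_]/', '*', $mystring);

echo $keepUnderscore, "\n"; // user_name**ab2c*
echo $dropUnderscore, "\n"; // user*name**ab2c*
```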
The \d, \w and similar escapes in regex all have negated versions, which are simply the upper-case version of the same letter.
So \w matches any word character (i.e. letters, digits and the underscore), and therefore \W matches anything except a word character, so anything other than an alpha-numeric or underscore.
This sounds like what you're after.
For more info, I recommend regular-expressions.info.
Since PHP 5.1.0 you can use \p{L} (Unicode letters) and \p{N} (Unicode numbers), which are the Unicode counterparts of Latin-only classes such as \d and \w:
preg_replace("/[^\p{L}\p{N}]/iu", $replacement_string, $original_string);
/iu modifiers at the end of pattern:
i (PCRE_CASELESS)
u (PCRE_UTF8)
see more at: https://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
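A minimal sketch of the Unicode property classes in action, assuming the script file and the input are UTF-8 encoded. The sample text is invented, and the ASCII-only pattern is included purely for contrast.

```php
<?php
// Hypothetical UTF-8 input with non-ASCII letters.
$original = 'zapłać 100 zł!';

// With the u modifier the subject is read as UTF-8, so \p{L} and \p{N}
// see whole characters such as ł and ć and leave them untouched.
$replaced = preg_replace('/[^\p{L}\p{N}]/u', '*', $original);

// For contrast: an ASCII-only class works byte by byte and also
// replaces the individual bytes of ł and ć.
$asciiOnly = preg_replace('/[^a-zA-Z0-9]/', '*', $original);

echo $replaced, "\n"; // zapłać*100*zł*
echo $asciiOnly, "\n";
```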
Related
I know this should remove any characters from string and keep only numbers and ENGLISH letters.
$txtafter = preg_replace("/[^a-zA-Z 0-9]+/","",$txtbefore);
but I wish to remove any special characters and keep any letter of any language like Arabic or Japanese.
Probably this will work for you:
$repl = preg_replace('/[^\w\s]+/u','' ,$txtbefore);
This will remove all non-word and non-space characters from your text. /u flag is there for unicode support.
You can use the \p{L} pattern to match any letter and \p{N} to match any numeric character. You should also use the u modifier, like this: /\p{L}+/u
Your final regex may look like: /[^\p{L}\p{N}]/u
Also be sure to check this question:
Regular expression \p{L} and \p{N}
Is there a concise way to express:
\w but without _
That is, "all characters included in \w, except _"
I'm asking this because I'm looking for the most concise way to express domain name validation. A domain name may include lowercase and uppercase letters, numbers, period signs and dashes, but no underscores. \w includes all of the above, plus an underscore. So, is there any way to "remove" an underscore from \w via regex syntax?
Edited: I'm asking about regex as used in PHP.
Thanks in advance!
Use the following character class (Perl syntax):
[^\W_]
\W is the same as [^\w], so [^\W_] matches every \w character except the underscore.
You could use a negative lookahead: (?!_)\w
However, I think writing [a-zA-Z0-9.-] is more readable.
To be on the safe side, usually, we will use character class:
[a-zA-Z0-9.-]
The regex "fragment" above matches the letters of the English alphabet and digits, plus the period . and the dash -. It should work even with the most basic regex support.
Shorter may be better, but only if you know exactly what it represents.
I don't know what language you are using. In a lot of engines, \w is equivalent to [a-zA-Z0-9_] (some require "ASCII mode" for this). However, some engines have Unicode support for regex, and may extend \w to match Unicode characters.
If my understanding is right, \w means [A-Za-z0-9_]; period signs and dashes are not included.
info:
http://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes
so I guess what you want is [a-zA-Z0-9.-]
Some regex flavours have a negative lookbehind syntax you might use:
\w(?<!_)
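The three spellings of "\w without _" suggested so far can be compared side by side. This is only a sketch: the input string and the array keys are made up for the example, and it assumes plain ASCII input.

```php
<?php
// Hypothetical input mixing letters, digits, underscore, dash and period.
$input = 'foo_bar-9.Z';

// Three ways to say "a word character that is not an underscore".
$patterns = [
    'negated class' => '/[^\W_]/',   // not (non-word or underscore)
    'lookahead'     => '/(?!_)\w/',  // next char is not _, then a word char
    'lookbehind'    => '/\w(?<!_)/', // a word char that was not _
];

$results = [];
foreach ($patterns as $name => $pattern) {
    // Collect every single-character match and join them back together.
    preg_match_all($pattern, $input, $matches);
    $results[$name] = implode('', $matches[0]);
}

print_r($results); // each entry is "foobar9Z"
```

All three agree on this input; the negated class is the shortest and does not rely on lookaround support.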
I would start with [^_], and then think of which other characters I need to deny. If you need to filter keyboard input, it's quite simple to enumerate all the unwanted characters.
You can write something like this:
/([^\w]|_)/u
If you use preg_filter (or preg_replace) with this pattern and an empty replacement, every character outside \w, plus the underscore itself, is removed, so only the \w characters other than _ remain.
I want to replace all occurrences of a with 5. Here is the code that works well:
$content=preg_replace("/\ba\b/","5", $content);
unless I have words like zapłać where a is between non standard characters, or zmarła where there is a Unicode (or non-ASCII) letter followed by a at the end of word. Is there any easy way to fix it?
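The problem is that the predefined character class \w (and with it the word boundary \b) is ASCII-based by default. Whether the u modifier changes that depends on the PHP/PCRE build: older builds keep \w ASCII-only even with u, while builds that turn on Unicode character properties together with u treat non-ASCII letters as word characters. (See regular-expressions.info; preg is PCRE in the columns.)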
You can use lookbehind and lookahead to do it:
$content=preg_replace("/(?<!\p{L})a(?!\p{L})/u","5",$content);
This will replace "a" if there is not a letter before and not a letter ahead.
\p{L}: any kind of letter from any language.
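Here is a small sketch of the lookaround approach on the words from the question. It assumes UTF-8 input and uses the u modifier so that \p{L} is tested against whole characters rather than single bytes. The sample sentence is invented.

```php
<?php
// Hypothetical UTF-8 input: a standalone "a" at both ends, plus the
// two problem words from the question.
$content = 'a zapłać zmarła a';

// Replace "a" only when it has no letter directly before or after it.
// The u modifier makes PCRE read the subject as UTF-8.
$result = preg_replace('/(?<!\p{L})a(?!\p{L})/u', '5', $content);

echo $result, "\n"; // 5 zapłać zmarła 5
```

Only the standalone occurrences change; the "a" characters inside zapłać and zmarła are left alone because ł counts as a letter.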
$content=preg_replace("/\ba\b/u","5",$content);
I'm about to create a registration form for my website. I need to check the variable and accept it only if it contains letters, numbers, _ or -.
How can I do it with regex? I used to work with them with preg_replace(), but I think this is not the case. Also, I know that the "ereg" function is dead. Any solutions?
this regex is pretty common these days.
if(preg_match('/^[a-z0-9\-\_]+$/i',$username))
{
// Ok
}
Use preg_match:
preg_match('/^[\w-]+$/D', $str)
Here \w describes letters, digits and the _, so [\w-]+ matches one or more letters, digits, _, and -. ^ and $ are so-called anchors that denote the beginning and end of the string respectively. The D modifier makes $ match only at the very end of the string; without it, $ would also match just before a trailing line break.
Note that the letters and digits matched by \w depend on the current locale and might include letters or digits other than just [a-zA-Z0-9]. So if you want only these, list them explicitly. And if you want to allow more than these, you could also try character classes that are described by Unicode character properties, like \p{L} for all Unicode letters.
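The effect of the D modifier is easiest to see with a trailing newline. The following is a sketch only: isValidUsername is a hypothetical helper name, and the test strings are made up.

```php
<?php
// Hypothetical helper: true only if $str consists entirely of
// word characters and dashes, with nothing after them.
function isValidUsername(string $str): bool
{
    // D (PCRE_DOLLAR_ENDONLY): $ matches only at the very end of the string.
    return preg_match('/^[\w-]+$/D', $str) === 1;
}

// Without D, $ also matches just before a final newline,
// so "john\n" slips through.
$withoutD = preg_match('/^[\w-]+$/', "john\n");

var_dump(isValidUsername('john_doe-42')); // bool(true)
var_dump(isValidUsername("john\n"));      // bool(false)
var_dump($withoutD);                      // int(1)
```

For form input this matters because a value ending in a line break would otherwise pass validation.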
Try preg_match(). http://php.net/manual/en/function.preg-match.php
i have found this:
$text = preg_replace('/\W+/', '-', $text);
Can anyone tell me what exactly that does? There is no information about what '/\W+/' means.
Regards
Javi
\W means a non-alphanumeric character, so anything other than a-z, A-Z, 0-9, or underscore.
This is standard for regular expressions, nothing specific to PHP.
Here's a great tool for testing regular expressions:
http://www.gskinner.com/RegExr/
If you put \W+ in the box at the top you'll see what kinds of things it matches.
PS: Here's another tool that's simpler and cleaner, though perhaps not as feature rich:
http://rubular.com/
It includes a handy quick-reference for regular expressions at the bottom.
Looks like it replaces anything that isn't a 'word character' (letter, digit, underscore) and makes them hyphens.
The preg family of functions uses Perl Compatible Regular Expressions, or PCRE. There's a nice cheat sheet for them here (PDF).
The \W means "any non-word character", and the + makes it match one or more such characters in a row. "Word characters" are defined to be letters, digits and underscores, so \W matches characters that aren't one of those.
Your line of code replaces each run of consecutive non-word characters with a single hyphen.
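As a quick sketch of what that line does in practice (the sample text is invented for the illustration):

```php
<?php
// Hypothetical input with punctuation, repeated spaces and an underscore.
$text = 'Hello, World!  PHP & regex_101';

// Each run of non-word characters collapses into one hyphen;
// the underscore survives because it is a word character.
$slug = preg_replace('/\W+/', '-', $text);

echo $slug, "\n"; // Hello-World-PHP-regex_101
```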
It's documented at http://es2.php.net/manual/en/regexp.reference.backslash.php (linked from the PCRE section of the PHP manual where preg_replace is explained).