I have been using the following regex to replace all punctuation in a string:
preg_replace('/[^\w\s]/', '', $tweet);
Here \w is shorthand for [a-zA-Z0-9_], and including \s in the negated class keeps whitespace from being stripped. I learned this here: Strip punctuation in an address field in PHP. But now I need the regex to strip all characters except
a-z and A-Z
{ and }
So it should strip out all dots, commas, numbers etc. What is the correct regex for this?
preg_replace('/[^a-zA-Z{} ]/', '', $tweet);
Possibly faster variant as proposed by FakeRainBrigand in a comment, thanks:
preg_replace('/[^a-zA-Z{} ]+/', '', $tweet);
preg_replace('/[^a-z{}]/i', '', $tweet);
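To see how the variants above differ, here is a minimal sketch run on a made-up sample string (the input is an assumption, not from the question). Note that the last variant has no space in the character class, so it strips spaces as well.

```php
<?php
// Hypothetical sample input; any string works.
$tweet = 'Hello, {world}! It is 2024... #regex';

// Keep only ASCII letters, curly braces and spaces.
// The + removes each run of unwanted characters in one match.
$kept = preg_replace('/[^a-zA-Z{} ]+/', '', $tweet);

// Case-insensitive variant without a space in the class:
// spaces are removed too.
$noSpaces = preg_replace('/[^a-z{}]/i', '', $tweet);

echo $kept, "\n";
echo $noSpaces, "\n";
```

Spaces that surrounded a removed token stay in place in the first result, so "is 2024... #regex" leaves two consecutive spaces behind.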
I need to create a regular expression that matches word, whitespace, word. The string can't start with whitespace or have more than one whitespace character between the two words, and each word must allow letters, including accented ones. I'm using this pattern:
/^([^\+\*\.\|\(\)\[\]\{\}\?\/\^\s\d\t\n\r<>ºª!#"·#~½%¬&=\'¿¡~´,;:_®¥§¹×£µ€¶«²¢³\$\-\\]+\s{0,1}?)*$/
Examples:
-Graça+whitespace+anotherWord -> match
-whitespace+Graça+whitespace+anotherWord -> don't match
-Graça+whitespace+whitespace+anotherword -> don't match
In general, this is a validation that allows firstname+whitespace+lastname with accented characters and a-z characters,
and I have to exclude all special characters like /*-+)(!/($=
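You can try this pattern: ^[\x{0041}-\x{02B3}]+\s[\x{0041}-\x{02B3}]+$ (in PHP, \x{...} values above FF require the u modifier).
Explanation: since you are using characters not matched by \w, you have to define your own range of word characters. \x{0041} is the character with Unicode code point U+0041, i.e. A. Be aware that the range U+0041 to U+02B3 also contains a few non-letters, such as [ \ ] ^ _ ` and ×.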
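As an alternative to a hand-picked code point range, the validation can be sketched with the Unicode letter property \p{L}. This is a swapped-in technique, not the pattern from the answer above; the function name isTwoWordName is hypothetical, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: true only for exactly two letter-only words
// separated by a single space.
function isTwoWordName(string $name): bool
{
    // \p{L} matches any Unicode letter, accented ones included.
    // The u modifier treats pattern and subject as UTF-8.
    // \z anchors at the very end of the string; unlike $, it does
    // not tolerate a trailing newline.
    return preg_match('/^\p{L}+ \p{L}+\z/u', $name) === 1;
}

// "\u{00E7}" is the letter c with cedilla, so the first word is "Graça".
var_dump(isTwoWordName("Gra\u{00E7}a anotherWord"));   // one space between words
var_dump(isTwoWordName(" Gra\u{00E7}a anotherWord"));  // leading space
var_dump(isTwoWordName("Gra\u{00E7}a  anotherWord"));  // two spaces
```

Because only \p{L} is allowed inside each word, digits and characters such as + or / are rejected without having to list them.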
For just spaces, use str_replace:
$string = str_replace(' ', '', $string);
For all whitespace, use preg_replace:
$string = preg_replace('/\s+/', '', $string);
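The difference between the two calls shows up as soon as the input contains tabs or newlines. A small sketch with a made-up input string:

```php
<?php
// Hypothetical input containing a space, a tab and a newline.
$string = "a b\tc\n d";

// str_replace removes only the literal space character.
$noSpaces = str_replace(' ', '', $string);

// \s matches spaces, tabs, newlines and other whitespace.
$noWhitespace = preg_replace('/\s+/', '', $string);

var_dump($noSpaces);      // tab and newline are still present
var_dump($noWhitespace);  // all whitespace is gone
```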
I'd like a regexp or other string which can replace everything except alphanumeric chars (a-z and 0-9) from a string. All things such as ,##$(#*810 should be stripped. Any ideas?
Edit: I now need this to strip everything but allow dots, so everything except a-z, 0-9 and the dot (.). Ideas?
$string = preg_replace("/[^a-z0-9.]+/i", "", $string);
This matches one or more characters that are not a-z, 0-9 (case-insensitive) or ".", and replaces them with "".
I like using [^[:alnum:]] for this, less room for error.
preg_replace('/[^[:alnum:]]/', '', "(ABC)-[123]"); // returns 'ABC123'
Try:
$string = preg_replace ('/[^a-z0-9]/i', '', $string);
/i stands for case insensitivity (if you need it, of course).
/[^a-z0-9.]/
should do the trick
This also works: it replaces any run of characters that are not digits, word characters, or periods with an underscore. Useful for filenames. (\w already covers digits, so the \d is redundant but harmless.)
$clean = preg_replace('/[^\d\w.]+/', '_', $string);
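The answers in this thread can be compared side by side. The following is a rough sketch; the sample strings are made up for illustration, apart from "(ABC)-[123]", which comes from the answer above.

```php
<?php
// Hypothetical sample input.
$string = 'file-name_v1.2 (final).txt';

// Keep ASCII letters, digits and dots; the i modifier makes
// a-z match A-Z as well.
$alnumDot = preg_replace('/[^a-z0-9.]+/i', '', $string);

// POSIX class variant: keeps letters and digits only.
$alnum = preg_replace('/[^[:alnum:]]/', '', '(ABC)-[123]');

// Replace each run of disallowed characters with one underscore.
// \w keeps letters, digits and the underscore itself.
$clean = preg_replace('/[^\d\w.]+/', '_', 'my report (final).pdf');

echo $alnumDot, "\n";
echo $alnum, "\n";
echo $clean, "\n";
```

In the last call the + matters: without it, " (" would become two underscores instead of one.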
The most common regex suggested for removing special characters seems to be this -
preg_replace( '/[^a-zA-Z0-9]/', '', $string );
The problem is that it also removes non-English characters.
Is there a regex that removes special characters in all languages? Or is the only solution to explicitly match each special character and remove it?
You can use instead:
preg_replace('/\P{Xan}+/u', '', $string );
\p{Xan} matches any character that is a letter or a number in any script covered by Unicode.
\P{Xan} matches any character that is not a letter or a number. It is a shortcut for [^\p{Xan}].
You can use:
$string = preg_replace( '/[^\p{L}\p{N}]+/u', '', $string );
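A short sketch of how the ASCII-only pattern and the two Unicode-aware patterns behave on mixed-script text. The sample string is an assumption, and the input is assumed to be valid UTF-8 (with the u modifier, preg_replace returns null on invalid UTF-8).

```php
<?php
// Hypothetical input: "Café", the Cyrillic word "мир", digits, punctuation.
$string = "Caf\u{00E9} \u{043C}\u{0438}\u{0440} 42!?";

// ASCII-only class: also strips every non-English letter.
$asciiOnly = preg_replace('/[^a-zA-Z0-9]/', '', $string);

// Unicode-aware: keep any letter (\p{L}) or number (\p{N}).
$lettersNumbers = preg_replace('/[^\p{L}\p{N}]+/u', '', $string);

// Same idea using the PCRE alphanumeric property.
$xan = preg_replace('/\P{Xan}+/u', '', $string);

var_dump($asciiOnly);
var_dump($lettersNumbers);
var_dump($xan);
```

Both Unicode-aware patterns also strip combining marks (\p{M}), so text that stores accents as separate combining characters would lose them; adding \p{M} to the class is one way to keep them.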
What is a regular expression to filter any special characters?
I want to remove any characters except 0-9 a-z A-Z and standard universal alphabet (arabic).
For example remove these characters: `~!##$%^&*()_+=-\][{}|';lL:"/.,<>? and any others.
$result = preg_replace('~[^A-Za-z0-9]~', '', $text);
how about:
preg_replace('/[^\p{Alphabetic}\p{Arabic}\pN]*/u', '', $str);
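For the Arabic case specifically, here is a minimal sketch assuming UTF-8 input. It combines explicit ASCII ranges with the \p{Arabic} script property instead of \p{Alphabetic}, since support for that property name may depend on the PCRE version in use; the sample text is hypothetical.

```php
<?php
// Hypothetical input: an Arabic word followed by ASCII text and symbols.
$text = "\u{0645}\u{0631}\u{062D}\u{0628}\u{0627}, world! 42 <>";

// Keep ASCII letters, ASCII digits and characters of the Arabic script;
// everything else, spaces included, is removed.
$result = preg_replace('/[^A-Za-z0-9\p{Arabic}]+/u', '', $text);

var_dump($result);
```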
How can I keep the spaces between letters (a-z, case-insensitive) and remove the spaces between non-letters?
This removes whitespace that sits between two alphanumeric characters:
$trimmed = preg_replace('~([a-z0-9])\s+([a-z0-9])~i', '\1\2', $your_text);
This will strip any whitespace that is between two non-alpha characters:
preg_replace('/(?<![a-z])\s+(?![a-z])/i', '', $text);
This will strip any whitespace that has a non-alpha character on either side (big difference):
preg_replace('/(?<![a-z])\s+|\s+(?![a-z])/i', '', $text);
By using negative look-ahead and negative look-behind assertions, the beginning and end of the string are treated as non-alpha as well.
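The difference between the two lookaround patterns is easiest to see on a concrete string. A small sketch with made-up input:

```php
<?php
// Hypothetical input mixing letters, digits and a symbol.
$text = 'a b 1 2 - c';

// Strip whitespace only when BOTH neighbours are non-letters.
$both = preg_replace('/(?<![a-z])\s+(?![a-z])/i', '', $text);

// Strip whitespace when AT LEAST ONE neighbour is a non-letter.
$either = preg_replace('/(?<![a-z])\s+|\s+(?![a-z])/i', '', $text);

// Start and end of the string count as non-letters, so the
// spaces around a lone digit are removed as well.
$edges = preg_replace('/(?<![a-z])\s+(?![a-z])/i', '', ' 1 ');

var_dump($both);
var_dump($either);
var_dump($edges);
```

In both results the space between "a" and "b" survives, because it has a letter on each side.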