php regex remove all non-alphanumeric and space characters from a string


I need a regex to remove all non-alphanumeric and space characters. I have this:
$page_title = preg_replace("/[^A-Za-z0-9 ]/", "", $page_title);
but it doesn't remove space characters, and it replaces some non-alphanumeric characters with numbers.
I need special characters such as punctuation and spaces removed.

If you want to keep only the word characters (letters, digits, and the underscore), use this:
(\W)+
Here is some test code:
$original = "Match spaces and {!}#";
echo $original ."<br>";
$altered = preg_replace("/(\W)+/", "", $original);
echo $altered;
Here is the output:
Match spaces and {!}#
Matchspacesand
Here is the explanation:
1st Capturing group: (\W) matches any non-word character [^a-zA-Z0-9_]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
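One detail worth checking: because \W is the complement of [a-zA-Z0-9_], the underscore counts as a word character and is not removed. A minimal sketch comparing \W+ with an explicit negated class (the sample string and variable names are made up for illustration):

```php
<?php
// Hypothetical sample input containing an underscore.
$original = "page_title: Hello, World 2!";

// \W is the complement of [A-Za-z0-9_], so the underscore is kept.
$withW = preg_replace('/\W+/', '', $original);

// An explicit negated class removes the underscore as well.
$withClass = preg_replace('/[^A-Za-z0-9]+/', '', $original);

echo $withW, "\n";     // page_titleHelloWorld2
echo $withClass, "\n"; // pagetitleHelloWorld2
```

If underscores should go too, the explicit class is the safer choice.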

I need the special characters like punctuation and spaces removed.
Then use:
$page_title = preg_replace('/[\p{P}\p{Zs}]+/u', "", $page_title);
\p{P} matches any punctuation character
\p{Zs} matches any space separator character (tabs and newlines are not included)
/u - To support unicode
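A small sketch of what this pattern does and does not remove, assuming the source file is saved as UTF-8 (the sample string is invented). Math symbols such as + and = belong to the Unicode symbol categories, not to \p{P}, so they survive:

```php
<?php
// Hypothetical sample input; the file is assumed to be UTF-8 encoded.
$page_title = "Héllo, wörld! 1+1=2";

// Remove punctuation (\p{P}) and space separators (\p{Zs}) only.
$clean = preg_replace('/[\p{P}\p{Zs}]+/u', '', $page_title);

echo $clean, "\n"; // Héllowörld1+1=2
```

To strip symbols as well, a negated class such as [^\p{L}\p{N}] may be a better fit.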

Try this
preg_replace('/[^[:alnum:]]/', '', $page_title);
[:alnum:] matches alphanumeric characters

This works for me in Sublime Text and in PHP Regex Tester:
$page_title = preg_replace("/[^A-Za-z0-9]/", "", $page_title);

Related

PHP - Remove all punctuation from the start and end of the string

I would like to trim all punctuation at the beginning and end of the string, leaving only letters or numbers there. Any punctuation between letters and numbers should be retained.
This is what I tried, based on "PHP preg_replace: remove punctuation from beginning and end of string":
$str = '£££2343423 34234238& ';
$new = preg_replace('/^\PL+|\PL\z/', '', $str);
echo $new;
Any recommendations, please?

You can use
$new = preg_replace('/^[^\p{L}0-9]+|[^\p{L}0-9]+\z/u', '', $str);
The regex matches
^[^\p{L}0-9]+ - any one or more chars other than Unicode letters and ASCII digits at the start of string
| - or
[^\p{L}0-9]+\z - any one or more chars other than Unicode letters and ASCII digits at the end of string.
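A quick sketch applying this pattern to the string from the question and to a second, invented string, to show that inner punctuation is kept:

```php
<?php
// Strip anything that is not a Unicode letter or ASCII digit
// from the start and from the end of the string.
$pattern = '/^[^\p{L}0-9]+|[^\p{L}0-9]+\z/u';

// String from the question.
$fromQuestion = preg_replace($pattern, '', '£££2343423 34234238& ');

// Hypothetical second sample: the comma and inner space are retained.
$inner = preg_replace($pattern, '', '...Hello, world!!!');

echo $fromQuestion, "\n"; // 2343423 34234238
echo $inner, "\n";        // Hello, world
```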

Preg replace utf8 charset issue with à

I'm trying to add a special string '|||' after newlines, blank spaces and other characters. I'm doing this because I want to split my text into an array. So I was thinking of doing it like this:
$result = preg_replace("/<br>/", "<br>|||", preg_replace("/\s/", " |||", preg_replace("/\r/", "\r|||", preg_replace("/\n/", "\n|||", preg_replace("/’/", "’|||", preg_replace("/'/", "'|||", $text))))));
$result = preg_split("/[|||]+/", $result);
It works for every word except those that contain the character à, which is replaced by �.
I'm sure the problem is here, because my string $text shows the character à correctly.

Since your pattern deals with a Unicode string, pass the /u modifier.
Also, you do not need so many chained regex replacements: group the patterns and use a backreference in the replacement.
Use
preg_replace("/(<br>|[\s’'])/u", "$1|||", $text)
Note that \s matches spaces, carriage returns and newlines.
Details:
(<br>|[\s’']) - Group 1 capturing either a
<br> - character sequence
| - or
[\s’'] - a whitespace, ’ or '.
See the PHP demo:
$text = "Voilà. C'est vrai.";
echo preg_replace("/(<br>|[\s’'])/u", "$1|||", $text);
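Since the stated goal is an array, the marked string can then be split. A sketch, assuming a plain explode() on the literal marker is acceptable (the question's "/[|||]+/" is a character class that matches single | characters, so a literal split is simpler):

```php
<?php
$text = "Voilà. C'est vrai.";

// Append the marker after <br>, whitespace, and both apostrophe forms.
$marked = preg_replace('/(<br>|[\s’\'])/u', '$1|||', $text);

// Split on the literal marker; no regex is needed for this step.
$parts = explode('|||', $marked);

print_r($parts);
// $parts is ["Voilà. ", "C'", "est ", "vrai."]
```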

How to replace non-ASCII characters in a string in PHP?

I need to replace characters in a string which are not represented with a single byte.
My string is like this
$inputText="centralkøkkenet kliniske diætister";
In that string there are characters like ø and æ. These characters should be replaced. How do I mention these in a regular expression that I can use for replacement?

If you want to remove everything other than alphanumeric and space characters, try this:
[^a-zA-Z0-9 ]
Sample code:
$re = "/[^a-zA-Z0-9 ]/";
$str = "centralkøkkenet kliniske diætister";
$subst = '';
$result = preg_replace($re, $subst, $str);
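Better to use [^\w\s] to make it short and simple, as suggested by @hjpotter92 in the comments. Do not use [\W\S] for this: it is a union rather than a negation, so it matches every character.
Pattern explanation:
[^\w\s] any character except:
word characters (a-z, A-Z, 0-9, _),
whitespace (\n, \r, \t, \f, and " ")
[\W\S] any character of:
non-word characters (all but a-z, A-Z, 0-9, _),
non-whitespace (all but \n, \r, \t, \f, and " ")
Every character is either a non-word character or a non-whitespace character, so [\W\S] excludes nothing.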
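The two classes are not interchangeable, which a short sketch makes visible (the sample string is invented): [^\w\s] removes only characters that are neither word characters nor whitespace, while [\W\S] matches every character.

```php
<?php
// Hypothetical ASCII sample with an underscore, a dash, a space and a bang.
$sample = "a_b-c d!";

// Negated class: removes only what is neither a word character nor whitespace.
$negated = preg_replace('/[^\w\s]/', '', $sample);

// Union of \W and \S: every character falls into at least one of them.
$union = preg_replace('/[\W\S]/', '', $sample);

echo $negated, "\n";       // a_bc d
var_dump($union === '');   // bool(true)
```

Note that [^\w\s] keeps underscores, tabs and newlines, unlike [^a-zA-Z0-9 ].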

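If you also want to keep punctuation (e.g. -'"!...), use this one:
$text = 'central-køkkenet "kliniske" diætister!';
$new = preg_replace('/[\x7F-\xFF]/u', '', $text);
echo $new,"\n";
Output:
central-kkkenet "kliniske" ditister!
Note that with the /u modifier, \x7F-\xFF covers only the code points U+007F to U+00FF, so characters above U+00FF are left in place. The class [^\x00-\x7F] matches every non-ASCII character instead.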
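A sketch of how the range behaves on characters outside Latin-1, assuming UTF-8 input (the Polish sample string is invented). With /u the escapes refer to code points, so [\x7F-\xFF] misses Ł (U+0141) and ź (U+017A), while the negated ASCII class catches them:

```php
<?php
// Hypothetical sample; assumed to be UTF-8 encoded.
$text = "Łódź café";

// Code points U+007F..U+00FF only: ó and é are removed, Ł and ź are not.
$latin1Only = preg_replace('/[\x7F-\xFF]/u', '', $text);

// Everything outside ASCII: all four accented letters are removed.
$allNonAscii = preg_replace('/[^\x00-\x7F]/u', '', $text);

echo $latin1Only, "\n";  // Łdź caf
echo $allNonAscii, "\n"; // d caf
```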

PHP preg_replace special characters

I want to replace all non-letter, non-number characters (e.g. /&%#$) with an underscore (_), and replace every ' (single quote) with an empty string (so no underscore).
So "There wouldn't be any" (ignore the double quotes) would become "There_wouldnt_be_any".
I am useless at regular expressions, hence the post.
Cheers

If by "non-letters and numbers" you mean excluding more than [A-Za-z0-9] (i.e. you consider letters like åäö to be letters too) and you want to handle UTF-8 strings accurately, \p{L} and \p{N} will help.
\p{N} will match any "Number"
\p{L} will match any "Letter Character", which includes
Lower case letter
Modifier letter
Other letter
Title case letter
Upper case letter
Documentation: PHP: Unicode Character Properties
$data = "Thäre!wouldn't%bé#äny";
$new_data = str_replace ("'", "", $data);
$new_data = preg_replace ('/[^\p{L}\p{N}]/u', '_', $new_data);
var_dump (
$new_data
);
output
string(23) "Thäre_wouldnt_bé_äny"
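The two steps can be wrapped in a small helper. This is only a sketch: the function name underscore_title is hypothetical, and unlike the code above it also drops typographic apostrophes, collapses runs of separators into one underscore, and trims underscores at the ends.

```php
<?php
// Hypothetical helper, not part of the original answer.
function underscore_title(string $text): string
{
    // Drop straight and typographic apostrophes entirely.
    $text = str_replace(["'", "’"], '', $text);

    // Collapse each run of non-letter, non-number characters into one underscore.
    // preg_replace() returns null on invalid UTF-8, hence the fallback.
    $text = preg_replace('/[^\p{L}\p{N}]+/u', '_', $text) ?? '';

    // Trim underscores left over at either end.
    return trim($text, '_');
}

echo underscore_title("There wouldn't be any"), "\n"; // There_wouldnt_be_any
echo underscore_title("Thäre!wouldn't%bé#äny"), "\n"; // Thäre_wouldnt_bé_äny
```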

$newstr = preg_replace('/[^a-zA-Z0-9\']/', '_', "There wouldn't be any");
$newstr = str_replace("'", '', $newstr);
I put them on two separate lines to make the code a little clearer.
Note: If you're looking for Unicode support, see Filip's answer (the \p{L} and \p{N} approach). It matches all characters that register as letters, not just A-Z and a-z.

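Do this in two steps:
Replace the listed special characters with this regex:
[\/\&%#\$]
Match quotes with this regex:
[\"\']
Then use preg_replace, substituting an underscore for the special characters and an empty string for the quotes:
$stringWithoutSpecialCharacters = preg_replace("/[\/\&%#\$]/", "_", $yourString);
$stringWithQuotesRemoved = preg_replace("/[\"\']/", "", $stringWithoutSpecialCharacters);
Note that this handles only the characters listed in the first class; spaces and any other symbols are left unchanged.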

Replace symbol if it is preceded and followed by a word character

I want to change a specific character only if both the preceding and the following characters are English letters. In other words, the target character is inside a word, not at its start or end.
For example:
$string = "I am learn*ing *PHP today*";
I want this string to be converted as following.
$newString = "I am learn'ing *PHP today*";

$string = "I am learn*ing *PHP today*";
$newString = preg_replace('/(\w)\*(\w)/', '$1\'$2', $string);
// $newString = "I am learn'ing *PHP today*"
This will match an asterisk surrounded by word characters (letters, digits, underscores). If you only want to do alphabet characters you can do:
preg_replace('/([a-zA-Z])\*([a-zA-Z])/', '$1\'$2', 'I am learn*ing *PHP today*');

The most concise way is to use word boundaries (\b) in your pattern. A word boundary is a zero-width position between a "word" character and a "non-word" character. Since * is a non-word character, the word boundaries require both neighboring characters to be word characters.
No capture groups, no references.
Code:
$string = "I am learn*ing *PHP today*";
echo preg_replace('~\b\*\b~', "'", $string);
Output:
I am learn'ing *PHP today*
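The zero-width approach also behaves differently from the capture-group version when asterisks sit close together. A sketch with an invented input: the capture groups consume the neighboring characters, so the b in "a*b*c" cannot serve as the left neighbor of the second asterisk, whereas word boundaries and lookarounds consume nothing.

```php
<?php
// Hypothetical input with two asterisks sharing the middle character.
$string = "a*b*c";

// Capture groups consume "a" and "b", so the second asterisk is missed.
$captured = preg_replace('/(\w)\*(\w)/', '$1\'$2', $string);

// Word boundaries are zero-width, so both asterisks are replaced.
$bounded = preg_replace('~\b\*\b~', "'", $string);

// Lookarounds are also zero-width and allow a letters-only condition.
$looked = preg_replace('/(?<=[a-zA-Z])\*(?=[a-zA-Z])/', "'", $string);

echo $captured, "\n"; // a'b*c
echo $bounded, "\n";  // a'b'c
echo $looked, "\n";   // a'b'c
```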

To match only when the asterisk is surrounded by letters, use [a-z] as a character range together with the i flag, which makes the regex case-insensitive. Since the character you want to replace is an asterisk, you also need to escape it with a backslash, because an unescaped asterisk means "match zero or more times" in a regular expression.
$newstring = preg_replace('/([a-z])\*([a-z])/i', "$1'$2", $string);

To replace all occurrences of an asterisk surrounded by word characters:
$string = preg_replace('/(\w)\*(\w)/', '$1\'$2', $string);
And to replace the asterisks where they are the first and last characters of a word:
$string = preg_replace('/\*(\w+)\*/','\'$1\'', $string);
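In both patterns the asterisk has to be escaped as \* to be matched literally. A sketch running the two replacements in sequence on an invented string:

```php
<?php
// Hypothetical input with a wrapped word and an inner asterisk.
$string = "an *important* note about learn*ing";

// First, turn *word* into 'word'.
$wrapped = preg_replace('/\*(\w+)\*/', '\'$1\'', $string);

// Then replace an asterisk that sits between two word characters.
$result = preg_replace('/(\w)\*(\w)/', '$1\'$2', $wrapped);

echo $wrapped, "\n"; // an 'important' note about learn*ing
echo $result, "\n";  // an 'important' note about learn'ing
```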
