Remove All these characters [duplicate] - php

How can I use PHP to strip out all characters that are NOT letters, numbers, spaces, or punctuation marks?
I've tried the following, but it strips punctuation.
preg_replace("/[^a-zA-Z0-9\s]/", "", $str);

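preg_replace("/[^a-zA-Z0-9\s\p{P}]/u", "", $str);
Example:
php > echo preg_replace("/[^a-zA-Z0-9\s\p{P}]/u", "", "⟺f✆oo☃. ba⟗r!");
foo. bar!
\p{P} matches all Unicode punctuation characters (see Unicode character properties). The u modifier makes PCRE treat the pattern and subject as UTF-8, so multibyte characters are matched as whole code points rather than byte by byte. If you only want to allow specific punctuation, add those characters to the negated character class. For example: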
preg_replace("/[^a-zA-Z0-9\s.?!]/", "", $str);

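If you want exact control over which punctuation is kept, list it explicitly. There is no backslash shorthand for ASCII punctuation the way \s is shorthand for whitespace characters; the closest built-ins are the POSIX class [[:punct:]] and the Unicode property \p{P}.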
preg_replace('/[^a-zA-Z0-9\s\-=+\|!@#$%^&*()`~\[\]{};:\'",<.>\/?]/', '', $str);

$str = trim($str);
$str = trim($str, "\x00..\x1F");
$str = str_replace(array('"', "'", '&', '<', '>'), ' ', $str);
$str = preg_replace('/[^0-9a-zA-Z-]/', ' ', $str);
$str = preg_replace('/\s\s+/', ' ', $str);
$str = trim($str);
$str = preg_replace('/[ ]/', '-', $str);
Hope this helps.
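As a rough sketch, the steps above can be wrapped in a single function that produces a hyphen-separated slug. The function name make_slug and the sample input are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: wraps the cleanup steps above in one function.
function make_slug(string $str): string
{
    $str = trim($str);                                  // strip surrounding whitespace
    $str = trim($str, "\x00..\x1F");                    // strip surrounding control characters
    $str = str_replace(array('"', "'", '&', '<', '>'), ' ', $str);
    $str = preg_replace('/[^0-9a-zA-Z-]/', ' ', $str);  // anything else becomes a space
    $str = preg_replace('/\s\s+/', ' ', $str);          // collapse runs of whitespace
    $str = trim($str);
    return str_replace(' ', '-', $str);                 // spaces become hyphens
}

echo make_slug('  Hello, World & <PHP>!  '), "\n"; // Hello-World-PHP
```

Hyphens already present in the input are kept, and consecutive hyphens are not collapsed, so "a - b" becomes "a---b".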

Let's build a multibyte-safe/unicode-safe pattern for this task.
From https://www.regular-expressions.info/unicode.html:
\p{L} or \p{Letter}: any kind of letter from any language.
\p{Z} or \p{Separator}: any kind of whitespace or invisible separator.
\p{N} or \p{Number}: any kind of numeric character in any script.
\p{P} or \p{Punctuation}: any kind of punctuation character.
[^ ... ] is a negated character class that matches any character not in the list.
+ is a "one or more" quantifier.
u This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. An invalid subject will cause the preg_* function to match nothing; an invalid pattern will trigger an error of level E_WARNING. Five and six octet UTF-8 sequences are regarded as invalid.
Code: (Demo)
echo preg_replace('/[^\p{L}\p{Z}\p{N}\p{P}]+/u', '', $string);
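A minimal sketch of how this pattern behaves on a few inputs. The helper name keep_text and the sample strings are assumptions for illustration. One caveat worth seeing in practice: \p{Z} covers Unicode separators such as the space, but tab and newline are control characters, so this pattern removes them.

```php
<?php
// Hypothetical helper: keep letters, separators, numbers, and punctuation
// from any script; drop everything else (symbols, emoji, control characters).
// preg_replace() returns null if the subject is not valid UTF-8.
function keep_text(string $string): ?string
{
    return preg_replace('/[^\p{L}\p{Z}\p{N}\p{P}]+/u', '', $string);
}

echo keep_text("⟺f✆oo☃. ba⟗r!"), "\n";  // foo. bar!
echo keep_text("Ünïcödé 123 ☃!"), "\n";  // Ünïcödé 123 !

// Tab is a control character (Cc), not a separator (Z), so it is removed.
var_dump(keep_text("a\tb") === "ab");    // bool(true)
```

If tabs and line breaks should survive, adding \s to the character class is one option.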

Related

PHP rtrim all trailing special characters

I'm making a function that detects and removes all trailing special characters from a string. It should convert strings like:
"hello-world"
"hello-world/"
"hello-world--"
"hello-world/%--+..."
into "hello-world".
Does anyone know a trick for this that doesn't require a lot of code?
Just for fun
[^a-z\s]+
Regex demo
Explanation:
[^x]: One character that is not x
\s: "whitespace character": space, tab, newline, carriage return, vertical tab
+: One or more
PHP:
$re = "/[^a-z\\s]+/i";
$str = "Hello world\nhello world/\nhello world--\nhellow world/%--+...";
$subst = "";
$result = preg_replace($re, $subst, $str);
Try this:
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
Or, to keep apostrophes as well:
preg_replace('/[^A-Za-z0-9\-\']/', '', $string); // Also keeps apostrophes.
You could use a regex like this, depending on your definition of "special characters":
function clean_string($input) {
return preg_replace('/\W+$/', '', $input);
}
It replaces any run of non-word characters (\W) at the end of the string ($) with nothing. \W is equivalent to [^a-zA-Z0-9_], so any trailing character that is not a letter, digit, or underscore is removed. To specify exactly which characters count as special, list them inside the [] brackets instead:
function clean_string($input) {
return preg_replace('/[\/%.+-]+$/', '', $input);
}
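A short sketch that runs the \W+$ approach against the four strings from the question. The function name strip_trailing_specials is an assumption for illustration.

```php
<?php
// Hypothetical helper name; mirrors the \W+$ approach described above.
function strip_trailing_specials(string $input): string
{
    // \W+ matches a run of non-word characters; $ anchors it to the end.
    return preg_replace('/\W+$/', '', $input);
}

$samples = ['hello-world', 'hello-world/', 'hello-world--', 'hello-world/%--+...'];
foreach ($samples as $sample) {
    echo strip_trailing_specials($sample), "\n"; // hello-world (each time)
}
```

The hyphen inside "hello-world" is untouched because the pattern only matches at the end of the string.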
This one is what you are looking for:
([^\n\w\d \"]*)$
It removes any trailing characters that are not letters, digits, underscores, spaces, or newlines.
Just call it like this:
preg_replace('/([^\n\w\s]*)$/', '', $string);

php regex remove all non-alphanumeric and space characters from a string

I need a regex to remove all non-alphanumeric and space characters, I have this
$page_title = preg_replace("/[^A-Za-z0-9 ]/", "", $page_title);
but it doesn't remove space characters and replaces some non-alphanumeric characters with numbers.
I need the special characters like punctuation and spaces removed.
If all you want is to keep the alphanumeric characters, you could use this:
(\W)+
Here is some test code:
$original = "Match spaces and {!}#";
echo $original ."<br>";
$altered = preg_replace("/(\W)+/", "", $original);
echo $altered;
Here is the output:
Match spaces and {!}#
Matchspacesand
Here is the explanation:
1st Capturing group: (\W) matches any non-word character [^a-zA-Z0-9_]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
I need the special characters like punctuation and spaces removed.
Then use:
$page_title = preg_replace('/[\p{P}\p{Zs}]+/u', "", $page_title);
\p{P} matches any punctuation character
\p{Zs} matches any space character
/u - To support unicode
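One thing to watch for, sketched below with an assumed sample string: \p{P} covers punctuation such as commas and exclamation marks, but characters like +, =, and $ belong to the Unicode symbol categories (\p{S}) and are left alone unless you add that property too.

```php
<?php
// \p{P} covers punctuation such as , and !, but not symbols such as + = $,
// which belong to the Unicode "Symbol" categories (\p{S}).
$input = 'a+b=c, $5!';

echo preg_replace('/[\p{P}\p{Zs}]+/u', '', $input), "\n";       // a+b=c$5
echo preg_replace('/[\p{P}\p{S}\p{Zs}]+/u', '', $input), "\n";  // abc5
```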
Try this
preg_replace('/[^[:alnum:]]/', '', $page_title);
[:alnum:] matches alphanumeric characters
This works well for me in Sublime and PHP Regex Tester:
$page_title = preg_replace("/[^A-Za-z0-9]/", "", $page_title);

PHP regex remove non alphanumeric except period

I'm having trouble finding a solution to this. How can I avoid losing the period in this regex?
$text = preg_replace('~[^\\pL\d]+~u', '-', $text);
$text = preg_replace('#[^0-9a-z\.]+#i', '-', $text);
This replaces anything that isn't 0-9, a-z, or a period, in a case-insensitive manner.
Just add the dot to your character class:
$text = preg_replace('~[^\\pL\d.]+~u', '-', $text);
You are using a negated character class (the [^ part) so anything that does not match any of the characters in that character class, gets replaced.
By the way, your question title does not match your regex.
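In a single-quoted PHP string, "\\pL" reaches the regex engine as \pL, the Unicode property for any letter; it does not match a backslash followed by the letters p and L.
If you only need ASCII letters, digits, and periods, is this what you mean?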
<?php
echo preg_replace('/[^a-z0-9.]+/ui', '-', 'abc093.-23.-2ªıØẞÆ.23.OAIFJ→øæł¶iwoeweo');
?>
Result:
abc093.-23.-2-.23.OAIFJ-iwoeweo
There is no need for the double escape, and to be fully Unicode-compatible, match numerics with \pN:
$text = preg_replace('~[^\pL\pN]+~u', '-', $text);
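A small sketch tying these answers together, under the assumption that the goal is to keep Unicode letters, Unicode numbers, and periods. The function name hyphenate and the sample file name are hypothetical. It also shows that the single and double backslash spellings are the same string in single-quoted PHP.

```php
<?php
// In a single-quoted PHP string, \\ is an escaped backslash, so both
// literals below produce exactly the same pattern text.
var_dump('~[^\\pL\d.]+~u' === '~[^\pL\d.]+~u'); // bool(true)

// Hypothetical helper: Unicode letters, Unicode numbers, and periods are kept;
// every other run of characters collapses to a single hyphen.
// preg_replace() returns null if the subject is not valid UTF-8.
function hyphenate(string $text): ?string
{
    return preg_replace('~[^\pL\pN.]+~u', '-', $text);
}

echo hyphenate('file name v1.2 (final).txt'), "\n"; // file-name-v1.2-final-.txt
```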

Replace symbol if it is preceded and followed by a word character

I want to change a specific character only if the characters immediately before and after it are English letters. In other words, the target character is inside a word, not at its start or end.
For example:
$string = "I am learn*ing *PHP today*";
I want this string to be converted as follows:
$newString = "I am learn'ing *PHP today*";
$string = "I am learn*ing *PHP today*";
$newString = preg_replace('/(\w)\*(\w)/', '$1\'$2', $string);
// $newString = "I am learn'ing *PHP today*"
This will match an asterisk surrounded by word characters (letters, digits, underscores). If you only want to do alphabet characters you can do:
preg_replace('/([a-zA-Z])\*([a-zA-Z])/', '$1\'$2', 'I am learn*ing *PHP today*');
The most concise way is to use word boundaries (\b) in your pattern. A word boundary is a zero-width position between a "word" character and a "non-word" character. Since * is a non-word character, a word boundary on each side requires both neighboring characters to be word characters.
No capture groups, no references.
Code: (Demo)
$string = "I am learn*ing *PHP today*";
echo preg_replace('~\b\*\b~', "'", $string);
Output:
I am learn'ing *PHP today*
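The approaches differ when asterisks sit close together or next to digits. The sketch below uses an assumed test string to show this. The third pattern uses lookarounds, which are not part of the answers here, as one way to restrict the neighbours to ASCII letters without consuming them.

```php
<?php
$string = 'a*b*c 2*3 *PHP today*';

// Word boundaries are zero-width, so adjacent matches can share a letter.
echo preg_replace('~\b\*\b~', "'", $string), "\n";
// a'b'c 2'3 *PHP today*

// Capture groups consume the neighbouring characters, so the second
// asterisk in "a*b*c" is skipped: "b" was already used by the first match.
echo preg_replace('/(\w)\*(\w)/', '$1\'$2', $string), "\n";
// a'b*c 2'3 *PHP today*

// Lookarounds restrict the neighbours to ASCII letters without consuming them.
echo preg_replace('/(?<=[a-z])\*(?=[a-z])/i', "'", $string), "\n";
// a'b'c 2*3 *PHP today*
```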
To replace only alphabetical characters, you need to use a [a-z] as a character range, and use the i flag to make the regex case-insensitive. Since the character you want to replace is an asterisk, you also need to escape it with a backslash, because an asterisk means "match zero or more times" in a regular expression.
$newstring = preg_replace('/([a-z])\*([a-z])/i', "$1'$2", $string);
To replace all occurrences of an asterisk surrounded by word characters (the asterisk must be escaped, otherwise it is read as a quantifier):
$string = preg_replace('/(\w)\*(\w)/', '$1\'$2', $string);
And to replace the asterisks where they are the start and end characters of a word:
$string = preg_replace('/\*(\w+)\*/', '\'$1\'', $string);

How to remove all non-alphanumeric and non-space characters from a string in PHP?

I want to remove all characters that are neither alphanumeric nor spaces from a string, so spaces should remain. What do I put for a space within the [ ] brackets in the function below?
ereg_replace("[^A-Za-z0-9]", "", $title);
In other words, what symbol represents a space? I know \n represents a new line; is there a similar symbol for a single space?
Just put a plain space into your character class:
[^A-Za-z0-9 ]
For other whitespace characters (tabulator, line breaks, etc.) use \s instead.
You should also be aware that PHP's POSIX ERE regular expression functions (ereg_replace and friends) were deprecated in PHP 5.3 and removed in PHP 7.0 in favor of the PCRE functions. So I recommend using preg_replace instead:
preg_replace("/[^A-Za-z0-9 ]/", "", $title)
If you want only a literal space, put one in. The shorthand for whitespace characters in general, such as tabs and newlines, is \s.
The accepted answer does not remove spaces.
Consider the following
$string = 'tD 13827$2099';
$string = preg_replace("/[^A-Za-z0-9 ]/", "", $string);
echo $string;
> tD 138272099
Now if we str_replace spaces, we get the desired output
$string = 'tD 13827$2099';
$string = preg_replace("/[^A-Za-z0-9 ]/", "", $string);
// remove the spaces
$string = str_replace(" ", "", $string);
echo $string;
> tD138272099
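For comparison, a brief sketch using the same sample string: whether spaces survive depends only on whether a space is listed in the negated class, so both outcomes are reachable with a single preg_replace call.

```php
<?php
$string = 'tD 13827$2099';

// Space listed in the class: spaces survive.
echo preg_replace('/[^A-Za-z0-9 ]/', '', $string), "\n"; // tD 138272099

// Space left out of the class: spaces are removed in the same pass,
// so no separate str_replace() call is needed.
echo preg_replace('/[^A-Za-z0-9]/', '', $string), "\n";  // tD138272099
```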
