I am looking to replace every non-alphanumeric character in a string with a space (using PHP). The input comes from a textarea that has data pasted into it from various places such as Word, Excel, websites, emails, etc.
I was using this regex
/[^a-zA-Z0-9\s]/
But I found that the result still contains vertical tabs (ASCII 13). I want my end result to include only letters and numbers: no newlines, tabs, vertical tabs, etc.
Many thanks!
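That character is whitespace, and whitespace is matched by \s. (Strictly, ASCII 13 is the carriage return; the vertical tab is ASCII 11.) Because \s sits inside your negated class, whitespace is excluded from the replacement.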
If you want to replace every non-alpha-numeric character with a space, use
preg_replace('/[^a-zA-Z0-9]/', ' ', $string)
If you want to replace every group (consecutive characters) of non-alnums with a single space, use
preg_replace('/[^a-zA-Z0-9]+/', ' ', $string)
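A minimal sketch of the difference between the two patterns above. The sample string is made up for illustration; it contains a comma, a tab, and a CRLF line break.

```php
<?php
// Hypothetical sample input: punctuation, a tab, and a CRLF line break.
$string = "Hello,\tWorld!\r\nPHP 8";

// Every single non-alphanumeric character becomes its own space.
$perChar = preg_replace('/[^a-zA-Z0-9]/', ' ', $string);

// Every run of non-alphanumeric characters becomes one space.
$perRun = preg_replace('/[^a-zA-Z0-9]+/', ' ', $string);

var_dump($perChar); // two spaces after "Hello", three after "World"
var_dump($perRun);  // "Hello World PHP 8"
```

The second form is usually what you want when the goal is a clean, single-spaced list of words.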
Try removing the \s:
/[^a-zA-Z0-9]/
\s matches whitespace, and that includes vertical whitespace such as line breaks.
So just remove that:
/[^a-zA-Z0-9]/
Related
What is the difference between space and whitespace in PHP?
I have seen in several places that you should use str_replace() to strip out spaces and preg_replace() to strip out whitespace.
Here is a reference: https://stackoverflow.com/a/2109339/4003463
In the context you added to your question, a space is ASCII 32 (that is, what you get when you press the spacebar).
A whitespace is any character that renders as a space, that is:
A space character
A tab character
A carriage return character
A new line character
A vertical tab character
A form feed character
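As a rough illustration of the distinction in the list above, the sketch below strips plain spaces with str_replace() and all whitespace with preg_replace(). The sample string is hypothetical.

```php
<?php
// Hypothetical input: one plain space, one tab, one newline.
$text = "a b\tc\nd";

// Removes only the ASCII 32 space; the tab and newline survive.
$noSpaces = str_replace(' ', '', $text);

// Removes every whitespace character that \s matches.
$noWhitespace = preg_replace('/\s+/', '', $text);

var_dump($noSpaces);     // "ab\tc\nd"
var_dump($noWhitespace); // "abcd"
```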
I managed to remove the spaces, but I can't understand why it removes my returns as well. I have a textarea in my form and I want to allow a maximum of two consecutive returns. Here is what I have been using so far.
$string = preg_replace('/\s\s+/', ' ', $string); // supposed to remove more than one consecutive space - but also deletes my returns ...
$string = preg_replace('/\n\n\n+/', '\n\n', $string); // using this one by itself does not do as expected and removes all returns ...
It seems the first line already gets rid of repeated spaces AND all returns, which is strange. I am not sure that I am doing it right.
Because \s also matches newline characters. I suggest using \h instead, which matches any kind of horizontal whitespace:
$string = preg_replace('/\h\h+/', ' ', $string);
\s matches any whitespace character [\r\n\t\f ]
See the definition of \s: it includes \n. Use \h instead:
\h matches any horizontal whitespace character (equal to [[:blank:]])
Use \h for horizontal whitespace.
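To see the effect side by side, here is a small sketch with a made-up string that mixes spaces, a tab, and two newlines.

```php
<?php
// Hypothetical input: spaces and a tab between words, then a blank line.
$string = "one  \t two\n\nthree";

// \s also matches "\n", so the blank line is swallowed.
$withS = preg_replace('/\s\s+/', ' ', $string);

// \h matches only horizontal whitespace, so the newlines are kept.
$withH = preg_replace('/\h\h+/', ' ', $string);

var_dump($withS); // "one two three"
var_dump($withH); // "one two\n\nthree"
```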
For those of you who will need it, that's how you remove two carriage returns from a textarea.
preg_replace('/\n\r(\n\r)+/', "\n\r", $str);
For the space issue, as posted above, replace \s with \h.
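One possible way to combine both fixes is sketched below. It is not the code from the answers above: the function name limitBlankLines is hypothetical, and the sketch assumes it is acceptable to normalize every line ending to "\n" first, since browsers typically submit textarea content with "\r\n".

```php
<?php
// Hypothetical helper, not part of the original answers.
function limitBlankLines(string $text): string
{
    // Normalize CRLF and lone CR to LF so only one line-break form remains.
    $text = preg_replace('/\r\n?/', "\n", $text);

    // Collapse each run of horizontal whitespace into one space.
    $text = preg_replace('/\h+/', ' ', $text);

    // Allow at most two consecutive line breaks.
    return preg_replace('/\n{3,}/', "\n\n", $text);
}

var_dump(limitBlankLines("a  b\r\n\r\n\r\n\r\nc")); // "a b\n\nc"
```

Note that the replacement strings are double-quoted so that PHP turns "\n" into a real newline; a single-quoted '\n\n' would insert a literal backslash and the letter n.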
Was just wondering if there was a clean way to replace all space character variants (half space, full-width space, Chinese space, etc.) with just a standard space?
Bonus points for replacing multiple spaces in a row (like 3 half-width or zero-width spaces or some of each) with just a single normal space.
I'll go with the obvious regular expression.
preg_replace('~\s+~u', ' ', 'your input here');
See http://php.net/manual/en/function.preg-replace.php
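A hedged sketch of a slightly broader pattern. Whether \s alone covers non-ASCII spaces under the u modifier depends on the PHP and PCRE version, so this version also lists \p{Z} (Unicode separators) explicitly. The zero-width space U+200B is not in the separator category, so it is added by hand; treating it as a space is an assumption about what the question wants. The function name normalizeSpaces is hypothetical.

```php
<?php
// Hypothetical helper: collapse runs of "space-like" characters to one space.
function normalizeSpaces(string $text): string
{
    // \s         ASCII whitespace (and more on some builds)
    // \p{Z}      Unicode separators: ideographic space, no-break space, ...
    // \x{200B}   zero-width space, added explicitly (assumption)
    return preg_replace('/[\s\p{Z}\x{200B}]+/u', ' ', $text);
}

// Ideographic space + no-break space, then zero-width space + thin space.
$input = "a\u{3000}\u{00A0}b\u{200B}\u{2009}c";
var_dump(normalizeSpaces($input)); // "a b c"
```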
Hello, I currently have a problem with my preg_replace:
preg_replace('#[^a-zA-Z\s]#', '', $string)
It keeps all alphabetic letters and whitespace, but I also want any run of more than one whitespace character reduced to a single one. Any idea how this can be done?
$output = preg_replace('!\s+!', ' ', $input);
From Regular Expression Basic Syntax Reference
\d, \w and \s
Shorthand character classes matching digits, word characters (letters, digits, and underscores), and whitespace (spaces, tabs, and line breaks). Can be used inside and outside character classes.
The character type \s stands for five different characters: horizontal tab (9), line feed (10), form feed (12), carriage return (13) and ordinary space (32). The following code will find every substring of $string which is composed entirely of \s. Only the first \s in the substring will be preserved. For example, if line feed, horizontal tab and ordinary space occur immediately after one another in a substring, line feed alone will remain after the replacement is done.
$string = preg_replace('#(\s)\s+#', '\1', $string);
preg_replace(array('#\s+#', '#[^a-zA-Z\s]#'), array(' ', ''), $string);
Note that this replaces every run of whitespace with a space. If you want to collapse consecutive whitespace while keeping its type (for example, two newlines into a single newline), you will have to work out the logic for that, because \s+ will match "\n \n \n" (5 whitespace characters in a row) as one run.
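When patterns are passed as an array, preg_replace() applies them one after another, so the order matters. The sketch below uses a made-up string to compare collapsing first with stripping first; the variable names are illustrative only.

```php
<?php
// Hypothetical input with punctuation, digits, and a blank line.
$string = "Hello,   World 123!\n\nBye";

// Collapse whitespace first, then strip non-letters: removing "123!"
// afterwards leaves two adjacent spaces behind.
$collapseFirst = preg_replace(
    array('#\s+#', '#[^a-zA-Z\s]#'),
    array(' ', ''),
    $string
);

// Strip non-letters first, then collapse: no double spaces remain.
$stripFirst = preg_replace(
    array('#[^a-zA-Z\s]#', '#\s+#'),
    array('', ' '),
    $string
);

var_dump($collapseFirst); // "Hello World" + two spaces + "Bye"
var_dump($stripFirst);    // "Hello World Bye"
```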
Try using trim() instead:
<?php
$something = " Error";
echo $something."\n";
echo "------"."\n";
echo trim($something);
?>
output
Error
------
Error
The question is old and is missing some details. Let's assume the OP wanted to replace every run of consecutive horizontal whitespace with a single space.
Example:
"\t\t \t \t" => " "
"\t\t \t\t" => "\t \t"
One possible solution is simply to use the generic character type \h, which stands for horizontal whitespace:
preg_replace('/\h+/', ' ', $text)
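A short sketch of that call in use. The first sample is taken from the example above; the second is a hypothetical addition that contains a line break, to show that \h+ leaves it alone.

```php
<?php
// First sample comes from the answer; the second one is made up.
$samples = array("\t\t \t \t", "a\t\t\nb  c");

$results = array();
foreach ($samples as $text) {
    // Each run of tabs and spaces becomes one space; "\n" is untouched.
    $results[] = preg_replace('/\h+/', ' ', $text);
}

var_dump($results); // [" ", "a \nb c"]
```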
I'm writing a WordPress plugin, and one of the features is removing duplicate whitespace.
My code looks like this:
return preg_replace('/\s\s+/u', ' ', $text, -1, $count);
I don't understand why I need the u modifier. I've seen other plugins that use preg_replace and don't need to modify it for Unicode. I believe I have a default installation of WordPress.
Without the modifier, the code replaces all the spaces with Unicode replacement glyphs instead of spaces. With the u modifier, I don't get the glyphs, and it doesn't replace all the whitespace.
Each gap below contains 1-10 spaces. The regex only removes one space from each group.
Before:
This sentence has extra space. This doesn’t. Extra space, Lots of extra space.
After:
This sentence has extra space. This doesn’t. Extra space, Lots of extra space.
$count = 9
How can I make the regex replace the whole match with the one space?
Update: If I try this in plain PHP, it works fine:
$new_text = preg_replace('/\s\s+/', ' ', $text, -1, $count);
It only breaks when I use it within the WordPress plugin.
I'm using this function in a filter:
function jje_test( $text ) {
    $new_text = preg_replace('/\s\s+/', ' ', $text, -1, $count);
    echo "Count: $count";
    return $new_text;
}
add_filter('the_content', 'jje_test');
I have tried:
Removing all other filters on the_content
remove_all_filters('the_content');
Changing the priority of the filter added to the_content, earlier or later
All kinds of permutations of \s+, \s\s+, [ ]+ etc.
Even replacing all single spaces with an empty string, will not replace the spaces
This will replace all sequences of two or more spaces, tabs, and/or line breaks with a single space:
return preg_replace('/[\p{Z}\s]{2,}/u', ' ', $text);
You need the /u flag if $text holds text encoded as UTF-8. Even if there are no Unicode characters in your regex, PCRE has to interpret $text correctly.
I added \p{Z} to the character class because, depending on the PCRE build, shorthands such as \s may match only ASCII characters, even when using /u. Adding \p{Z} makes sure all Unicode space separators are matched. There might be other spaces, such as non-breaking spaces, in your string.
I'm not sure if using echo in a WordPress filter is a good idea.
The u modifier simply puts it into UTF-8 mode, which is useful if you need to do anything specific with characters that have a code point above 0x7f. You can still work on UTF-8 encoded strings without using that modifier, you just won't be able to specifically match or transform such characters easily.
There are some whitespace characters in Unicode that are above 0x7f. It's pretty rare to encounter them in most data, but you may see, for example, a non-breaking space character (U+00A0) or some rarer characters.
I don't know why using it would cause Unicode "replacement" glyphs to be output. I'd say it would be a problem elsewhere... what character encoding are you outputting your script as?
To answer jjeaton's follow-up question in the comments to my first reply, the following replaces each sequence of spaces, tabs, and/or line breaks with the first character in that sequence. Effectively, this deletes the second and following whitespace characters in each sequence of two or more whitespace characters. A run of spaces is replaced with a single space, a run of tabs is replaced with a single tab, etc. A run of a space and a tab (in that order) is replaced with a space, and a run of a tab and a space is replaced with a tab, etc.
return preg_replace('/([\p{Z}\s])[\p{Z}\s]+/u', '$1', $text);
This regex works by first matching one whitespace character and capturing it with a capturing group, followed by one or more further whitespace characters. The replacement text simply reinserts the text matched by the first (and only) capturing group.
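The behaviour described above can be checked with a small sketch. The input string is hypothetical and mixes runs of spaces and tabs in both orders.

```php
<?php
// Hypothetical input: two spaces, three tabs, space+tab, tab+space.
$text = "a  b\t\t\tc \td\t e";

// Keep only the first whitespace character of each run of two or more.
$result = preg_replace('/([\p{Z}\s])[\p{Z}\s]+/u', '$1', $text);

var_dump($result); // "a b\tc d\te"
```

A single whitespace character on its own does not match, because the pattern requires at least two in a row.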
Don't know about any modifiers, but this did the trick:
<?php
$text = ' Hi, my name is Andrés. ';
echo preg_replace(array('/^\s+/', '/\s+$/', '/\s{2,}/'), ' ', $text);
/*
Hi, my name is Andrés.
*/
?>
preg_replace('!\s+!', ' ', 'This sentence has extra space. This doesn’t. Extra space, Lots of extra space.');