PHP - Remove excess Whitespace but not new lines


I was looking for a way to remove excess whitespace from within a string (that is, if two or more spaces are next to each other, keep only one and remove the others). I found "Remove excess whitespace from within a string" and wanted to use this solution:
$foo = preg_replace( '/\s+/', ' ', $foo );
but this removes newlines as well, and I want to keep them.
Is there any way to keep newlines while removing excess whitespace?

The PHP manual (http://www.php.net/manual/en/regexp.reference.escape.php) defines \h as
any horizontal whitespace character (since PHP 5.2.4)
so you are probably looking for
$foo = preg_replace( '/\h+/', ' ', $foo );
Example: http://ideone.com/NcOiKW
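To see the effect on multi-line input, here is a minimal sketch. The function name collapse_horizontal is made up for illustration, and the u modifier assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: collapse runs of horizontal whitespace (spaces, tabs)
// into a single space while leaving line breaks untouched.
function collapse_horizontal(string $text): string
{
    // \h never matches \n or \r, so the line structure is preserved.
    // preg_replace() returns null on invalid UTF-8; fall back to the input then.
    return preg_replace('/\h+/u', ' ', $text) ?? $text;
}

$foo = "first    line\t\twith   gaps\nsecond     line";
echo collapse_horizontal($foo), "\n";
// first line with gaps
// second line
```

Note that a space directly after a newline is kept as one space; combine this with trim() per line if leading spaces should go as well.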

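If some of your characters were converted to � after preg_replace (for example, the Cyrillic capital letter Er, Р), the pattern ran in byte mode, where \h can match the byte 0xA0, which is also the second byte of Р in UTF-8. Add the u modifier so that pattern and subject are treated as UTF-8:
$value = preg_replace('/\h+/u', ' ', $value);
mb_ereg_replace is not a drop-in substitute here: it takes the pattern without delimiters, so '/\h+/' would not match what you expect.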
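The corruption can be reproduced in a few lines, together with one likely fix: the u modifier of preg_replace. This is a sketch under the assumption that the input is valid UTF-8; the variable names are illustrative.

```php
<?php
// "Р" (U+0420) is encoded in UTF-8 as the two bytes D0 A0.
$value = "\u{0420}   \u{0420}";

// Byte mode: \h may match the lone byte A0 and split the character.
$broken = preg_replace('/\h+/', ' ', $value);

// UTF-8 mode: \h only matches whole whitespace characters.
$fixed = preg_replace('/\h+/u', ' ', $value);

// An empty pattern with the u modifier fails on invalid UTF-8,
// which makes it a cheap validity check.
var_dump(preg_match('//u', $broken) === 1); // is $broken still valid UTF-8?
var_dump($fixed === "\u{0420} \u{0420}");
```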

If you want to remove excess spaces only (not tabs, newlines, etc.), you could use the hex code to be more specific:
$text = preg_replace('/\x20+/', ' ', $text);

Related

Regex not working with white spaces [duplicate]

I know this comment on PHP.net.
I would like a tool similar to tr for PHP, so that I can simply run
tr -d " " ""
I tried the function php_strip_whitespace without success:
$tags_trimmed = php_strip_whitespace($tags);
I also tried the regex function without success:
$tags_trimmed = preg_replace(" ", "", $tags);

To strip any whitespace, you can use a regular expression
$str=preg_replace('/\s+/', '', $str);
See also this answer for something which can handle whitespace in UTF-8 strings.

A regular expression does not treat the subject as UTF-8 by default. Without the u modifier, the \s meta-character only covers the basic ASCII whitespace characters. Therefore, the following command only removes tabs, spaces, carriage returns and newlines:
// http://stackoverflow.com/a/1279798/54964
$str=preg_replace('/\s+/', '', $str);
With UTF-8 now mainstream, this expression will more and more often leave whitespace behind, because the text contains Unicode whitespace characters that \s does not match.
To deal with the whitespace characters introduced in Unicode, a more extensive pattern is required.
Because regular expressions do not recognize multi-byte characters by default, each character has to be written out as its complete byte sequence. Matching single bytes would damage other UTF-8 characters (for example, replacing every \x80 byte from the quad set would also corrupt the \x80 bytes inside smart quotes).
$cleanedstr = preg_replace(
"/(\t|\n|\v|\f|\r| |\xC2\x85|\xc2\xa0|\xe1\xa0\x8e|\xe2\x80[\x80-\x8D]|\xe2\x80\xa8|\xe2\x80\xa9|\xe2\x80\xaF|\xe2\x81\x9f|\xe2\x81\xa0|\xe3\x80\x80|\xef\xbb\xbf)+/",
"_",
$str
);
This matches tabs, newlines, vertical tabs, form feeds, carriage returns and spaces, and additionally:
next line, non-breaking space, Mongolian vowel separator, [en quad, em quad, en space, em space, three-per-em space, four-per-em space, six-per-em space, figure space, punctuation space, thin space, hair space, zero width space, zero width non-joiner, zero width joiner], line separator, paragraph separator, narrow no-break space, medium mathematical space, word joiner, ideographic space, and the zero width no-break space.
In the example above each run of these characters is replaced with "_"; pass an empty string as the replacement to remove them instead.
Many of these wreak havoc in XML files exported from automated tools or sites: they foul up text searches and recognition, and they can be pasted invisibly into PHP source code. There, paragraph and line separators can cause the parser to jump to the next command, so lines of code are skipped, resulting in intermittent, unexplained errors that we have begun referring to as "textually transmitted diseases".
[It's not safe to copy and paste from the web anymore. Use a character scanner to protect your code. lol]
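Where PCRE is built with Unicode support, much of the byte list above can probably be expressed more readably with the u modifier and Unicode properties. The following is a sketch, not a drop-in replacement for the pattern above: the function name strip_unicode_whitespace is hypothetical, and which zero-width characters to include is a judgment call.

```php
<?php
// Hypothetical helper: replace runs of ASCII and Unicode whitespace
// (plus a few zero-width characters) with $replacement.
function strip_unicode_whitespace(string $str, string $replacement = ''): string
{
    // \s                 ASCII whitespace
    // \p{Z}              Unicode separators (spaces, line/paragraph separator)
    // \x{85}             next line
    // \x{180E}           Mongolian vowel separator
    // \x{200B}-\x{200D}  zero width space, non-joiner, joiner
    // \x{2060}, \x{FEFF} word joiner, zero width no-break space
    $pattern = '/[\s\p{Z}\x{85}\x{180E}\x{200B}-\x{200D}\x{2060}\x{FEFF}]+/u';

    // preg_replace() returns null when $str is not valid UTF-8.
    return preg_replace($pattern, $replacement, $str) ?? $str;
}

var_dump(strip_unicode_whitespace("a\u{00A0}b\u{3000}c \t\n d\u{200B}e"));
```

Removing the zero width joiner and non-joiner can change how text renders in some scripts and emoji sequences, so drop that range from the pattern if the text must stay displayable as written.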

Sometimes you need to collapse consecutive whitespace. You can do it like this:
$str = "My    name    is";
$str = preg_replace('/\s\s+/', ' ', $str);
Output:
My name is

$string = str_replace(" ", "", $string);
I believe preg_replace would be looking for something like [:space:]

You can use PHP's trim function to trim both sides (left and right):
trim($yourinputdata," ");
Or
trim($yourinputdata);
You can also use
ltrim() - Removes whitespace or other predefined characters from the left side of a string
rtrim() - Removes whitespace or other predefined characters from the right side of a string
System: PHP 4,5,7
Docs: http://php.net/manual/en/function.trim.php
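A short comparison of the three functions, as a sketch with a made-up input value. Note that passing " " as the second argument limits trimming to plain spaces, so tabs and newlines at the edges stay.

```php
<?php
$input = "  \tArizona \n";

$both  = trim($input);       // default character list: whitespace on both sides
$left  = ltrim($input);      // left side only
$right = rtrim($input);      // right side only
$space = trim($input, " ");  // only spaces are stripped; tab and newline stop it

var_dump($both);   // "Arizona"
var_dump($left);   // "Arizona \n"
var_dump($right);  // "  \tArizona"
var_dump($space);  // "\tArizona \n"
```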

If you want to remove all whitespaces everywhere from $tags why not just:
str_replace(' ', '', $tags);
If you want to remove new lines and such that would require a bit more...

Another possible option is to use a custom stream wrapper that exposes variables as files. You can achieve it like this:
1) First of all, register your wrapper (once per script, much like session_start()). The class name must be passed as a string:
stream_wrapper_register('var', 'VarWrapper');
2) Then define your wrapper class (written quickly and not complete, but it works for simple reads):
class VarWrapper {
    public $context;              // set by PHP when a stream context is passed
    protected $pos = 0;
    protected $content;
    public function stream_open($path, $mode, $options, &$opened_path) {
        $varname = substr($path, 6);   // strip the leading "var://"
        global $$varname;
        $this->content = $$varname;
        return true;
    }
    public function stream_read($count) {
        $s = substr($this->content, $this->pos, $count);
        $this->pos += strlen($s);      // advance by what was actually read
        return $s;
    }
    public function stream_eof() {
        return $this->pos >= strlen($this->content);
    }
    public function stream_stat() {
        // Borrow a stat array from this file, then patch in the variable's length
        $f = fopen(__FILE__, 'rb');
        $a = fstat($f);
        fclose($f);
        if (isset($a[7])) $a[7] = $a['size'] = strlen($this->content);
        return $a;
    }
}
3) Then use any file function with your wrapper via the var:// protocol (you can use it for include, require etc. too):
global $__myVar;
$__myVar = 'Enter tags here';
$data = php_strip_whitespace('var://__myVar');
Note: Don't forget that your variable must be in the global scope (declare it with global $__myVar). Also keep in mind that php_strip_whitespace strips comments and whitespace from PHP source code, so it is not a general-purpose string cleaner.

This is an old post but the shortest answer is not listed here so I am adding it now
strtr($str,[' '=>'']);
Another common way to "skin this cat" would be to use explode and implode like this
implode('',explode(' ', $str));

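Older code did this with ereg_replace, but the ereg functions were deprecated in PHP 5.3 and removed in PHP 7. The same POSIX character class works with preg_replace:
$str = 'This Is New Method Ever';
$newstr = preg_replace('/[[:space:]]+/', '', trim($str));
echo $newstr;
// Result - ThisIsNewMethodEver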

You can also use the preg_replace_callback function. It is identical to its sibling preg_replace, except that it takes a callback function, which gives you more control over how the replacement is built.
$str = "this is a string";
echo preg_replace_callback(
'/\s+/',
function ($matches) {
return "";
},
$str
);
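The callback becomes useful once the replacement depends on what was matched. As a sketch (the function name collapse_keep_newlines is made up for illustration), each whitespace run can be collapsed to a single newline if it contained a line break, and to a single space otherwise:

```php
<?php
// Hypothetical helper: collapse every whitespace run, but keep one line
// break when the run contained at least one.
function collapse_keep_newlines(string $text): string
{
    return preg_replace_callback(
        '/\s+/',
        function (array $matches): string {
            // $matches[0] holds the whole run of whitespace that was matched.
            return strpbrk($matches[0], "\r\n") !== false ? "\n" : ' ';
        },
        $text
    ) ?? $text;
}

echo collapse_keep_newlines("this   is \n\n  a    string"), "\n";
// this is
// a string
```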

$string = trim(preg_replace('/\s+/','',$string));

This is an old post, but it can be done like this:
if(!function_exists('strim')) :
function strim($str,$charlist=" ",$option=0){
$return='';
if(is_string($str))
{
// Translate HTML entities
$return = str_replace("&nbsp;"," ",$str);
$return = strtr($return, array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES)));
// Choose trim option
switch($option)
{
// Strip whitespace (and other characters) from the begin and end of string
default:
case 0:
$return = trim($return,$charlist);
break;
// Strip whitespace (and other characters) from the begin of string
case 1:
$return = ltrim($return,$charlist);
break;
// Strip whitespace (and other characters) from the end of string
case 2:
$return = rtrim($return,$charlist);
break;
}
}
return $return;
}
endif;
The standard trim() functions can be problematic when the string contains HTML entities. That's why I wrote this "Super Trim" function, which handles that problem and also lets you choose whether to trim from the beginning, the end, or both sides of the string.

A simple way to remove spaces from the whole string is to use the explode function and print the whole string using a for loop.
$text = $_POST['string'];
$a=explode(" ", $text);
$count=count($a);
for($i=0;$i<$count; $i++){
echo $a[$i];
}

Without the u modifier, the \s regex escape is not reliable on UTF-8 multibyte strings.
This is a PHP regex I wrote to solve this, using PCRE (Perl Compatible Regular Expressions) escapes together with the u modifier for UTF-8 strings:
function remove_utf8_whitespace($string) {
return preg_replace('/\h+/u','',preg_replace('/\R+/u','',$string));
}
- Example Usage -
Before:
$string = " this is a test \n and another test\n\r\t ok! \n";
echo $string;
this is a test
and another test
ok!
echo strlen($string); // result: 43
After:
$string = remove_utf8_whitespace($string);
echo $string;
thisisatestandanothertestok!
echo strlen($string); // result: 28
PCRE Argument Listing
Source: https://www.rexegg.com/regex-quickstart.html
Character | Legend | Example | Sample Match
\t | Tab | T\t\w{2} | T    ab
\r | Carriage return character | see below |
\n | Line feed character | see below |
\r\n | Line separator on Windows | AB\r\nCD | AB and CD on separate lines
\N | Perl, PCRE (C, PHP, R…): one character that is not a line break | \N+ | ABC
\h | Perl, PCRE (C, PHP, R…), Java: one horizontal whitespace character: tab or Unicode space separator | |
\H | One character that is not a horizontal whitespace | |
\v | .NET, JavaScript, Python, Ruby: vertical tab | |
\v | Perl, PCRE (C, PHP, R…), Java: one vertical whitespace character: line feed, carriage return, vertical tab, form feed, paragraph or line separator | |
\V | Perl, PCRE (C, PHP, R…), Java: any character that is not a vertical whitespace | |
\R | Perl, PCRE (C, PHP, R…), Java: one line break (carriage return + line feed pair, and all the characters matched by \v) | |
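A small sketch of how \R and \h behave in PHP, using a made-up input string with mixed line endings:

```php
<?php
$text = "one\r\ntwo\rthree\n\tfour";

// \R treats "\r\n" as one line break, so every break becomes a single "\n".
$normalized = preg_replace('/\R/', "\n", $text);

// \h matches the tab but none of the line breaks.
$no_horizontal = preg_replace('/\h+/', '', $normalized);

var_dump($normalized === "one\ntwo\nthree\n\tfour");   // bool(true)
var_dump($no_horizontal === "one\ntwo\nthree\nfour");  // bool(true)
```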

Some whitespace problems come from markup tags left in the string.
First use
$str=strip_tags($str);
to remove redundant or broken tags and get a plain string.
Then use
$str=preg_replace('/\s+/', '', $str);
It works for me.

preg_replace vs trim PHP

I am working with a slug function and I don't fully understand parts of it, so I am looking for some help explaining it.
My first question is about this line in my slug function: $string = preg_replace('# +#', '-', $string); I understand that this replaces all spaces with a '-'. What I don't understand is what the + sign is for, which comes after the space between the # signs.
This leads to my next problem. I want a trim function that gets rid of spaces, but only the spaces after the value the user entered. For example, someone accidentally entered "Arizona  " with two spaces after the a, and it broke the pages linked to Arizona.
So, after all my rambling, I basically want to figure out how to use a trim to get rid of accidental spaces but still have preg_replace insert '-' between words.
Example: "Sun City West  " = "sun-city-west"
This is my full slug function:
function getSlug($string){
if(isset($string) && $string <> ""){
$string = strtolower($string);
//var_dump($string); echo "<br>";
$string = preg_replace('#[^\w ]+#', '', $string);
//var_dump($string); echo "<br>";
$string = preg_replace('# +#', '-', $string);
}
return $string;
}

You can try this:
function getSlug($string) {
return preg_replace('#\s+#', '-', trim($string));
}
It first trims whitespace at the beginning and end of the string, and then replaces every remaining run of whitespace with the - character.
Here your regex is:
#\s+#
which is:
# = regex delimiter
\s = any whitespace character
+ = match the previous character or group one or more times
# = regex delimiter again
so the regex here means: "match any sequence of one or more whitespace characters"

The + means at least one of the preceding character, so it matches one or more spaces. The # signs are one of the ways of marking the start and end of a regular expression's pattern block.
For a trim function, PHP handily provides trim() which removes all leading and trailing whitespace.
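Putting the pieces together, a slug function might look like the sketch below. The name get_slug and the order of the steps are assumptions for illustration. Trimming happens after the character filter, so that spaces exposed by removed punctuation at the edges do not turn into dashes.

```php
<?php
// Hypothetical rewrite of the asker's getSlug(), keeping its original steps.
function get_slug(string $string): string
{
    $string = strtolower($string);
    $string = preg_replace('#[^\w ]+#', '', $string); // keep word characters and spaces
    $string = trim($string);                           // drop accidental edge spaces
    return preg_replace('# +#', '-', $string);         // each run of spaces -> one dash
}

echo get_slug("Sun City  West  "), "\n"; // sun-city-west
echo get_slug("Arizona  "), "\n";        // arizona
```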

php removing excess whitespace

I'm trying to remove excess whitespace from a string like this:
hello        world
to
hello world
Does anyone have any idea how to do that in PHP?

With a regexp:
preg_replace('/( )+/', ' ', $string);
If you also want to collapse runs of other whitespace characters, you can use \s (\s matches any whitespace character):
preg_replace('/(\s)+/', ' ', $string);

$str = 'Why   do I
          have so much   white space?';
$str = preg_replace('/\s{2,}/', ' ', $str);
var_dump($str); // string(34) "Why do I have so much white space?"
You could also use the + quantifier, because a single space is replaced with a single space anyway. However, I find that {2,} shows your intent more clearly.
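The two quantifiers agree as long as every isolated whitespace character is a plain space. They differ when a single tab or newline stands alone, as this sketch with made-up inputs shows:

```php
<?php
$str = "Why   do I\n    have so  much white space?";

$plus = preg_replace('/\s+/', ' ', $str);
$two  = preg_replace('/\s{2,}/', ' ', $str);
var_dump($plus === $two); // bool(true): every lone whitespace here is a space

// A lone newline is converted by \s+ but left alone by \s{2,}.
$lone_plus = preg_replace('/\s+/', ' ', "a\nb");    // "a b"
$lone_two  = preg_replace('/\s{2,}/', ' ', "a\nb"); // "a\nb"
var_dump($lone_plus, $lone_two);
```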

There is an example on how to strip excess whitespace in the preg_replace documentation

I'm not a PHP expert, but this sounds like a job for regex:
<?php
$string = 'Hello     World   and  Everybody!';
$pattern = '/\s+/';
$replacement = ' ';
echo preg_replace($pattern, $replacement, $string);
?>
Again, PHP is not my language, but the idea is to replace multiple whitespace characters with a single space. The \s stands for whitespace, and the + means one or more. No g (global) flag is needed: PHP's preg_replace replaces every match by default, and a trailing g is rejected as an unknown modifier.

How do I replace tabs with spaces within variables in PHP?

$data contains tabs, leading spaces and multiple spaces. I wish to replace all tabs with a space, replace multiple spaces with a single space, and remove leading spaces.
In fact, something that would turn this input data:
[    asdf asdf     asdf           asdf   ]
into this output data:
[asdf asdf asdf asdf]
How do I do this?

Trim, replace tabs and extra spaces with single spaces:
$data = preg_replace('/[ ]{2,}|[\t]/', ' ', trim($data));

$data = trim(preg_replace('/\s+/', ' ', $data)); // no g modifier in PHP; a space as replacement keeps the words apart

Assuming the square brackets aren't part of the string and you're just using them for illustrative purposes, then:
$new_string = trim(preg_replace('!\s+!', ' ', $old_string));
You might be able to do that with a single regex but it'll be a fairly complicated regex. The above is much more straightforward.
Note: I'm also assuming you don't want to replace "AB\t\tCD" (\t is a tab) with "AB  CD" (two spaces).

$data = trim($data);
That gets rid of your leading (and trailing) spaces.
$pattern = '/\s+/';
$data = preg_replace($pattern, ' ', $data);
That turns any run of one or more whitespace characters (tabs and newlines included) into just one space.
$data = str_replace("\t", " ", $data);
That gets rid of your tabs.

$new_data = preg_replace("/[\t\s]+/", " ", trim($data));

This answer takes the question completely literally: it is only concerned with spaces and tabs. Granted, the OP probably also wants to include other kinds of whitespace in what gets trimmed/compressed, but let's pretend he wants to preserve embedded CR and/or LF.
First, let's set up some constants. This will allow for both ease of understanding and maintainability, should modifications become necessary. I put in some extra spaces so that you can compare the similarities and differences more easily.
define( 'S', '[ \t]+' ); # Stuff you want to compress; in this case ONLY spaces/tabs
define( 'L', '/\A'.S.'/' ); # stuff on the Left edge will be trimmed
define( 'M', '/'.S.'/' ); # stuff in the Middle will be compressed
define( 'R', '/'.S.'\Z/' ); # stuff on the Right edge will be trimmed
define( 'T', ' ' ); # what we want the stuff compressed To
We are using \A and \Z escape characters to specify the beginning and end of the subject, instead of the typical ^ and $ which are line-oriented meta-characters. This is not so much because they are needed in this instance as much as "defensive" programming, should the value of S change to make them needed in the future.
Now for the secret sauce: we are going to take advantage of some special semantics of preg_replace, namely (emphasis added)
If there are fewer elements in the replacement array than in the pattern array, any extra patterns will be replaced by an empty string.
function trim_press( $data ){
return preg_replace( [ M, L, R ], [ T ], $data );
}
So instead of a pattern string and replacement string, we are using a pattern array and replacement array, which results in the extra patterns L and R being trimmed.
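For reference, here is the whole approach as one runnable sketch with a made-up input. The constants are repeated so the block is self-contained. Note that only the edges of the whole subject are trimmed, so a space that follows an embedded newline is compressed to one space but not removed.

```php
<?php
define('S', '[ \t]+');           // what to compress: spaces and tabs only
define('L', '/\A' . S . '/');    // left edge of the subject
define('M', '/' . S . '/');      // anywhere in the middle
define('R', '/' . S . '\Z/');    // right edge of the subject
define('T', ' ');                // what runs are compressed to

function trim_press($data) {
    // M is replaced by T; L and R have no replacement, so they become ''.
    return preg_replace([M, L, R], [T], $data);
}

var_dump(trim_press("  \tasdf \t asdf\n  asdf   "));
// string(15) "asdf asdf
//  asdf"
```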

In case you need to remove &nbsp; too:
$data = trim(preg_replace('/(?:\s|&nbsp;)+/', ' ', $data));

After much frustration I found this to be the best solution, as it also removes non-breaking spaces, which can be two bytes long in UTF-8:
$data = html_entity_decode(str_replace('&nbsp;',' ',htmlentities($data)));
$data = trim(preg_replace('/\h/', ' ', $data)); // replaces more space character types than \s
See billynoah

Just use this regex:
$str = trim(preg_replace('/\s\s+/', ' ', $str));
It replaces every run of two or more whitespace characters (tabs and spaces included) with one space.
Here the + sign means "one or more times", so the pattern \s\s+ matches wherever there are two or more whitespace characters in a row.
