PHP: Count number of spaces in a multiple space span - php

I'm scanning a form field entry ($text) for spaces and replacing the spaces with a blank spot using preg_replace.
$text=preg_replace('/\s/',' ',$text);
This works great, except when there are multiple consecutive spaces in a line. They are all treated as a blank.
I can use this if I know the amount of spaces there will be:
$text=preg_replace('/ {2,}/','**' ,$text);
However I will never be sure of how many spaces the input could be.
Sample Input 1: This is a test.
Sample Input 2: This is a test.
Sample Input 3: This is a test.
Using both preg_replace statements above I get:
Sample Output 1: This is a test.
Sample Output 2: This**is a test.
Sample Output 3: This**is a test.
How would I go about scanning the input for consecutive spaces, counting them and setting that count to a variable to place inside the preg_replace statement for multiple spaces?
Or is there another way of doing this that I am clearly missing?
*Note: Using &nbsp; for the replacement works to maintain the extra spaces, but I cannot replace every space with &nbsp;. When I do, it breaks the word-wrap in my output: the string never ends, so it wraps at arbitrary points inside words instead of before or after a word.

If you want to replace multiple spaces with a single space, you could use:
$my_result = preg_replace('!\s+!', ' ', $text);

You can use an alternation with two lookarounds to check if there's a whitespace before or after:
$text = preg_replace('~\s(?:(?=\s)|(?<=\s\s))~', '*', $text);
details:
\s # a whitespace
(?:
(?=\s) # followed by 1 whitespace
| # OR
(?<=\s\s) # preceded by 2 whitespaces (including the previous)
)
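A quick way to check the lookaround pattern above is to run it on a string with runs of different lengths. This is only a sketch: the function name and the sample string are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: marks every whitespace character that belongs to a
// run of two or more, and leaves isolated whitespace untouched.
function mark_repeated_spaces(string $text, string $mark = '*'): string
{
    return preg_replace('~\s(?:(?=\s)|(?<=\s\s))~', $mark, $text);
}

// Assumed sample: 2 spaces after "This", 1 after "is", 3 after "a".
// str_repeat() is used so the run lengths are explicit.
$sample = 'This' . str_repeat(' ', 2) . 'is a' . str_repeat(' ', 3) . 'test.';

echo mark_repeated_spaces($sample), "\n"; // This**is a***test.
```

The lookbehind is evaluated against the original subject, not the partially replaced string, which is why the last space of each run is still recognised.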

Use preg_replace_callback to count the found spaces.
$text = 'This  is a test.'; // two spaces between "This" and "is"
print preg_replace_callback('/ {1,}/',function($a){
return str_repeat('*',strlen($a[0]));
},$text);
Result: This**is*a*test.
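If the goal is to keep the visual width of a run while still letting the line wrap, one option is to have the callback emit a filler for all but one space in each run. The sketch below assumes HTML output and uses &nbsp; as the filler. The function name and the choice to keep the last space of each run as a normal, breakable space are assumptions, not something the answers above state.

```php
<?php
// Hypothetical helper, not from the original answers.
function preserve_space_runs(string $text, string $filler = '&nbsp;'): string
{
    // Only runs of two or more spaces are touched; single spaces stay as-is.
    return preg_replace_callback('/ {2,}/', function (array $m) use ($filler): string {
        // strlen() counts the spaces in this run; one real space is kept
        // so the browser still has a place to break the line.
        return str_repeat($filler, strlen($m[0]) - 1) . ' ';
    }, $text);
}

$sample = 'This' . str_repeat(' ', 2) . 'is a' . str_repeat(' ', 3) . 'test.';

echo preserve_space_runs($sample), "\n"; // This&nbsp; is a&nbsp;&nbsp; test.
```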

Related

Remove more than two returns and more than one space within php string

I manage to remove the spaces but I can't understand why it would remove my returns as well. I have a textarea in my form and I want to allow up to two returns maximum. Here is what I have been using so far.
$string = preg_replace('/\s\s+/', ' ', $string); // supposed to remove more than one consecutive space - but also deletes my returns ...
$string = preg_replace('/\n\n\n+/', '\n\n', $string); // using this one by itself does not do as expected and removes all returns ...
It seems the first line already gets rid of multiple spaces AND all returns, which is strange. Not sure that I am doing it right ...
Because \s also matches newline characters. I suggest you use \h, which matches any kind of horizontal whitespace.
$string = preg_replace('/\h\h+/', ' ', $string);
\s matches any whitespace character [\r\n\t\f ]
See the definition of \s: it includes \n.
\h matches any horizontal whitespace character (equal to [[:blank:]])
Use \h for horizontal whitespace.
For those of you who need it, this is how you collapse two or more consecutive line breaks from a textarea into one.
preg_replace('/\n\r(\n\r)+/', "\n\r", $str);
For the space issue, as it has been posted above, replace \s by \h
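Putting the pieces of this thread together, the asker's goal (single spaces, at most two consecutive returns) could be sketched as below. The function name is hypothetical. Normalising line endings first and adding the /u flag are assumptions on my part: /u treats the input as UTF-8, so that \h is not applied to individual bytes of multibyte characters.

```php
<?php
// Hypothetical helper, a sketch rather than the original posters' code.
function normalize_textarea(string $s): string
{
    // Normalise CRLF and lone CR to LF so one newline pattern is enough.
    $s = preg_replace('/\r\n?/', "\n", $s);

    // \h matches horizontal whitespace only, so newlines survive.
    // With /u, preg_replace() returns null on invalid UTF-8; in that
    // case this sketch keeps the string unchanged.
    $s = preg_replace('/\h{2,}/u', ' ', $s) ?? $s;

    // Allow at most two consecutive newlines.
    return preg_replace('/\n{3,}/', "\n\n", $s);
}

$raw = 'a' . str_repeat(' ', 2) . "b\r\n\r\n\r\n\r\nc\t\td";

var_dump(normalize_textarea($raw) === "a b\n\nc d"); // bool(true)
```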

Regex to remove ALL single characters from a string

I need a Regular Expression to remove ALL single characters from a string, not just single letters or numbers
The string is:
"A Future Ft Casino Karate Chop ( Prod By Metro )"
it should come out as:
"Future Ft Casino Karate Chop Prod By Metro"
The expression I am using at the moment (in PHP), correctly removes the single 'A' but leaves the single '(' and ')'
This is the code I am using:
$string = preg_replace('/\b\w\b\s?/', '', $string);
Try this:
(^| ).( |$)
Breakdown:
1. (^| ) -> Beginning of line or space
2. . -> Any character
3. ( |$) -> Space or End of line
Actual code:
$string = preg_replace('/(^| ).( |$)/', '$1', $string);
Note: I'm not familiar with the workings of PHP regex, so the code might need a slight tweak depending on how the actual regex needs declared.
As m.buettner pointed out, there will be a trailing white space here with this code. A trim would be needed to clear it out.
Edit: Arnis Juraga pointed out that this would not clear out multiple consecutive single characters: a b c would be filtered down to b. If this is an issue, use this regex:
(^| ).(( ).)*( |$)
The (( ).)* added to the middle will look for a space followed by any character, 0 or more times. The downside is that this will end up with double spaces where a series of single characters was located.
Meaning this:
The a b c dog
Will become this:
The dog
After performing the replacement to get single individual characters, you would need to use the following regex to locate the double spaces, then replace with a single space
( ){2}
A slightly more efficient version that does not require capturing would be using lookarounds. It's a bit less intuitive due to the multiple negative logic:
$string = preg_replace('/(?<!\S).(?!\S)\s*/', '', $input);
This will remove any character that is neither preceded nor followed by a non-whitespace character (so only those that are between whitespace or at the string boundaries). It will also include all trailing whitespace in the match, so as to leave only the preceding whitespace if there is any. The caveat is, that just like Nick's answer the ) at the end of the string will leave a trailing whitespace (because it is in front of the character). This can easily be solved by trimming the string.
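As a sketch of how the lookaround version behaves on the strings discussed in this thread, combined with the trim() that both answers recommend (the function name is an assumption for illustration):

```php
<?php
// Hypothetical wrapper around the lookaround pattern quoted above.
function remove_single_chars(string $input): string
{
    // Remove any character that has no non-whitespace neighbour on either
    // side, together with the whitespace that follows it.
    $result = preg_replace('/(?<!\S).(?!\S)\s*/', '', $input);

    // A single character at the very end leaves the space before it behind.
    return trim($result);
}

echo remove_single_chars('A Future Ft Casino Karate Chop ( Prod By Metro )'), "\n";
// Future Ft Casino Karate Chop Prod By Metro

echo remove_single_chars('The a b c dog'), "\n";
// The dog
```

Because each match also consumes the whitespace after the removed character, a series such as a b c collapses without leaving double spaces.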

Regex: remove non-alphanumeric chars, multiple whitespaces and trim() all together

I have a $text to strip off all non-alphanumeric chars, replace multiple white spaces and newline by single space and eliminate beginning and ending space.
This is my solution so far.
$text = '
some- text!!
for testing?
'; // $text to format
//strip off all non-alphanumeric chars
$text = preg_replace("/[^a-zA-Z0-9\s]/", "", $text);
//Replace multiple white spaces by single space
$text = preg_replace('/\s+/', ' ', $text);
//eliminate beginning and ending space
$finalText = trim($text);
/* result: $finalText ="some text for testing";
without non-alphanumeric chars, newline, extra spaces and trim()med */
Is it possible to combine/achieve all these in one regular expression? as I would get the desired result in one line as below
$finalText = preg_replace(some_reg_expression, $replaceby, $text);
thanks
Edit: clarified with a test string
Of course you can. That is very easy.
The re will look like:
((?<= )\s*)|[^a-zA-Z0-9\s]|(\s*$)|(^\s*)
I have no PHP at hand, I have used Perl (just to test the re and show that it works) (you can play with my code here):
$ cat test.txt
a b c d
a b c e f g fff f
$ cat 1.pl
while(<>) {
s/((?<= )\s*)|[^a-zA-Z0-9\s]|(\s*$)|(^\s*)//g;
print $_,"\n";
}
$ cat test.txt | perl 1.pl
a b c d
a b c e f g fff f
For PHP it will be the same.
What does the RE do?
((?<= )\s*) # all spaces that have at least one space before them
|
[^a-zA-Z0-9\s] # all non-alphanumeric characters
|
(\s*$) # all spaces at the end of string
|
(^\s*) # all spaces at the beginning of string
The only tricky part here is ((?<= )\s*), lookbehind assertion. You remove spaces if and only if the substring of spaces has a space before.
If you want to know how lookahead/lookbehind assertions work, please take a look at http://www.regular-expressions.info/lookaround.html.
Update from the discussion:
What happens when $text ='some ? ! ? text';?
Then the resulting string contains multiple spaces between "some" and "text".
This problem is not easy to solve, because it needs a positive lookbehind assertion of variable length, and that is not possible at the moment. One cannot simply check for spaces, because the preceding character may be a non-alphanumeric character that will itself be removed (for example, in " !" the "!" is removed, but the RE knows nothing about that). One would need something like (?<=[^a-zA-Z0-9\s]* )\s*, but that unfortunately will not work because PCRE does not support variable-length lookbehind assertions.
I do not think that you can achieve that with one regex. You would basically need to stick in an if else condition, which it is not possible through Regular Expressions alone.
You would basically need one regex to remove non-alphanumeric digits and another one to collapse the spaces, which is basically what you are already doing.
Check whether this is what you are looking for:
$patterns = array ('/[^a-zA-Z0-9\s]/','/\s+/');
$replace = array ("", ' ');
$finalText = trim( preg_replace($patterns, $replace, $text) );
It may need some modification; just let me know if this is what you want to do.
For your own sanity, you will want to keep regular expressions that you can still understand and edit later on :)
$text = trim(preg_replace(array(
"/[^a-zA-Z0-9\s]/", // remove all non-space, non-alphanumeric characters
'/\s{2,}/', // replace multiple white space occurrences with single
), array(
'',
' ',
), $originalText)); // trim last, so spaces exposed by the first pattern are removed too
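The two-pattern approach can be checked against both the question's test string and the 'some ? ! ? text' case that defeats the single-regex attempt. This is a minimal sketch. The function name is hypothetical, and it uses \s+ rather than \s{2,} for the second pattern so that a single newline also becomes a space, as the question asks.

```php
<?php
// Hypothetical helper combining the steps from the question.
function clean_text(string $text): string
{
    $patterns = [
        '/[^a-zA-Z0-9\s]/', // drop everything that is not alphanumeric or whitespace
        '/\s+/',            // collapse each whitespace run (including newlines) to one space
    ];
    $replacements = ['', ' '];

    // The patterns are applied in order; trim() runs last so that spaces
    // left at the edges by the first pattern are removed as well.
    return trim(preg_replace($patterns, $replacements, $text));
}

echo clean_text("\nsome- text!!\n\nfor testing?\n"), "\n"; // some text for testing
echo clean_text('some ? ! ? text'), "\n";                  // some text
```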
$text =~ s/([^a-zA-Z0-9\s].*?)//g;
Doesn't have to be any harder than this.

regex: delete white characters

I am trying to replace runs of more than one whitespace character in my string:
$content = preg_replace('/\s+/', " ", $content); //in some cases it doesn't work
but when i wrote
$content = preg_replace('/\s\s+/', " ", $content); //works fine
could somebody explain why?
because when i write /\s+/ it must match all with one or more white character, why it doesn't work?
Thanks
What is the minimum number of whitespace characters you want to match?
\s+ is equivalent to \s\s* -- one mandatory whitespace character followed by any number more of them.
\s\s+ is equivalent to \s\s\s* -- two mandatory whitespace characters followed by any number more (if this is what you want, it might be clearer as \s{2,}).
Also note that $content = preg_replace('/\s+/', " ", $content); will replace any single spaces in $content with a single space. In other words, if your string only contains single spaces, the result will be no change.
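The difference between the two patterns shows up in the replacement count rather than in the result. A small sketch, using an assumed sample string and the by-reference $count parameter of preg_replace():

```php
<?php
// Assumed sample with whitespace runs of length 1, 2 and 3.
$content = 'one two' . str_repeat(' ', 2) . "three\t\n four";

// \s+ also "replaces" the single space with a single space.
$plus = preg_replace('/\s+/', ' ', $content, -1, $countPlus);

// \s\s+ only touches runs of two or more whitespace characters.
$twoOrMore = preg_replace('/\s\s+/', ' ', $content, -1, $countTwo);

var_dump($plus === $twoOrMore);        // bool(true)
echo $countPlus, ' ', $countTwo, "\n"; // 3 2
```

Both calls produce the same string here; \s+ simply performs one extra, no-op replacement for the isolated space.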
I just wanted to add that the reason your /\s+/ worked sometimes and not others is that regular expressions are greedy by default: \s+ tries to match one or more whitespace characters, and as many as it can. I think that is where you got tripped up in finding a solution.
Sorry I'm not yet able to add comments, or I would have just added this comment to Daniel's answer, which is good.
Are you using the Ungreedy option (/U)? It doesn't say so in your code, but if so, it would explain why the first preg_replace() is replacing each single space with a single space (no change). In that case, the second preg_replace() would be replacing each double space with a single space. If you try the second one on a string of four spaces and the result is a double space, I would suspect ungreediness.
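The four-space experiment suggested above can be sketched as follows. The sample string is an assumption, and str_repeat() is used so the number of spaces is explicit.

```php
<?php
$four = 'a' . str_repeat(' ', 4) . 'b';

// Greedy (default): the whole run is one match.
$greedy = preg_replace('/\s\s+/', ' ', $four);

// Ungreedy (/U): each match stops after the minimum of two characters,
// so four spaces become two matches and therefore two spaces.
$ungreedy = preg_replace('/\s\s+/U', ' ', $four);

// Ungreedy \s+ matches one character at a time: four no-op replacements.
$single = preg_replace('/\s+/U', ' ', $four, -1, $count);

echo strlen($greedy), ' ', strlen($ungreedy), ' ', strlen($single), ' ', $count, "\n"; // 3 4 6 4
```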
try preg_replace("/([\s]{2,})/", " ", $text)

Matching duplicate whitespace with preg_replace

I'm writing a WordPress plugin, and one of the features is removing duplicate whitespace.
My code looks like this:
return preg_replace('/\s\s+/u', ' ', $text, -1, $count);
I don't understand why I need the u modifier. I've seen other plugins that use preg_replace and don't need to modify it for Unicode. I believe I have a default installation of WordPress.
Without the modifier, the code replaces all the spaces with Unicode replacement glyphs instead of spaces. With the u modifier, I don't get the glyphs, and it doesn't replace all the whitespace.
Each gap below contains 1-10 spaces. The regex only removes one space from each group.
Before:
This sentence has extra space. This doesn’t. Extra space, Lots of extra space.
After:
This sentence has extra space. This doesn’t. Extra space, Lots of extra space.
$count = 9
How can I make the regex replace the whole match with the one space?
Update: If I try this with regular php, it works fine
$new_text = preg_replace('/\s\s+/', ' ', $text, -1, $count);
It only breaks when I use it within the wordpress plugin.
I'm using this function in a filter:
function jje_test( $text ) {
$new_text = preg_replace('/\s\s+/', ' ', $text, -1, $count);
echo "Count: $count";
return $new_text;
}
add_filter('the_content', 'jje_test');
I have tried:
Removing all other filters on the_content
remove_all_filters('the_content');
Changing the priority of the filter added to the_content, earlier or later
All kinds of permutations of \s+, \s\s+, [ ]+ etc.
Even replacing all single spaces with an empty string, will not replace the spaces
This will replace all sequences of two or more spaces, tabs, and/or line breaks with a single space:
return preg_replace('/[\p{Z}\s]{2,}/u', ' ', $text);
You need the /u flag if $text holds text encoded as UTF-8. Even if there are no Unicode characters in your regex, PCRE has to interpret $text correctly.
I added \p{Z} to the character class because PCRE only matches ASCII characters when using shorthands such as \s, even when using /u. Adding \p{Z} makes sure all Unicode whitespace is matched. There might be other spaces such as non-breaking spaces in your string.
I'm not sure if using echo in a WordPress filter is a good idea.
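To see the effect of adding \p{Z}, the pattern above can be tried on a string that contains a no-break space. This is a sketch under assumptions: the function name and sample text are made up, and the "\u{...}" escape requires PHP 7.0 or later.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
function collapse_unicode_whitespace(string $text): string
{
    // With /u, preg_replace() returns null if $text is not valid UTF-8;
    // this sketch then hands back the input unchanged.
    return preg_replace('/[\p{Z}\s]{2,}/u', ' ', $text) ?? $text;
}

$nbsp = "\u{00A0}"; // U+00A0 NO-BREAK SPACE, two bytes in UTF-8

// A no-break space followed by a normal space, then a run of three spaces.
$text = 'Extra' . $nbsp . ' space,' . str_repeat(' ', 3) . 'lots.';

echo collapse_unicode_whitespace($text), "\n"; // Extra space, lots.
```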
The u modifier simply puts it into UTF-8 mode, which is useful if you need to do anything specific with characters that have a code point above 0x7f. You can still work on UTF-8 encoded strings without using that modifier, you just won't be able to specifically match or transform such characters easily.
There are some whitespace characters in Unicode that are above 0x7f. It's pretty rare to encounter them in most data. But you may see, for example, a non-breaking space character, which is Unicode U+00A0, or some rarer characters.
I don't know why using it would cause Unicode "replacement" glyphs to be output. I'd say it would be a problem elsewhere... what character encoding are you outputting your script as?
To answer jjeaton's follow-up question in the comments to my first reply, the following replaces each sequence of spaces, tabs, and/or line breaks with the first character in that sequence. Effectively, this deletes the second and following whitespace characters in each sequence of two or more whitespace characters. A run of spaces is replaced with a single space, a run of tabs is replaced with a single tab, etc. A run of a space and a tab (in that order) is replaced with a space, and a run of a tab and a space is replaced with a tab, etc.
return preg_replace('/([\p{Z}\s])[\p{Z}\s]+/u', '$1', $text);
This regex works by first matching one whitespace character and capturing it with a capturing group, followed by one or more whitespace characters. The replacement text simply reinserts the text matched by the first (and only) capturing group.
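The "keep the first character of each run" behaviour described above can be checked with a short sketch (the function name and sample strings are assumptions):

```php
<?php
// Hypothetical wrapper around the capturing-group pattern quoted above.
function keep_first_whitespace(string $text): string
{
    // $1 is whichever whitespace character started the run.
    return preg_replace('/([\p{Z}\s])[\p{Z}\s]+/u', '$1', $text) ?? $text;
}

// Tab, tab, space: the run is replaced by its first character, a tab.
var_dump(keep_first_whitespace("a\t\t b") === "a\tb"); // bool(true)

// Space, tab: the run is replaced by a space.
var_dump(keep_first_whitespace("a \tb") === 'a b');    // bool(true)
```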
Don't know about any modifiers, but this did the trick:
<?php
$text = ' Hi, my name is Andrés. ';
echo preg_replace(array('/^\s+/', '/\s+$/', '/\s{2,}/'), ' ', $text);
/*
Hi, my name is Andrés.
*/
?>
preg_replace('!\s+!', ' ', 'This sentence has extra space. This doesn’t. Extra space, Lots of extra space.');
