replacing spaces and newlines with commas - php

In my preg_replace regex here:
$string = preg_replace('~[^[:alnum:],]*,[^[:alnum:]]*~', ',', $string);
I've been trying to separate words from each other with commas, and it worked. But then I tried it on strings like
x
y
z
and
x y z
to replace the whitespace and newlines with commas, so I tried using [[:space:]] and [[:blank:]], but they seemed to handle spaces and not newlines.
How do I handle the newlines? I tried using my old replacement /[\s,]+/ for newlines and whitespace, but it still had no effect. I know I can use two calls like
$string = preg_replace('/[\s,]+/', ',', $string);
$string = preg_replace('~[^[:alnum:],]*,[^[:alnum:]]*~', ',', $string);
but I'd prefer merging them into one regex for performance.

Try the following:
$string = preg_replace('~[^[:alnum:],]*,[^[:alnum:]]*|[\s,]+~', ',', $string);
It will replace all spaces and newlines with a comma.
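A minimal sketch of the merged pattern applied to the inputs from the question, assuming `~` as the delimiter as in the question's own code. The helper name `to_comma_list` is hypothetical and only used for illustration.

```php
<?php
// Hypothetical helper; the pattern joins the two expressions from the question with "|".
function to_comma_list(string $string): string
{
    // Branch 1: a comma together with any non-alphanumeric characters around it.
    // Branch 2: any run of whitespace (newlines included) and commas.
    return preg_replace('~[^[:alnum:],]*,[^[:alnum:]]*|[\s,]+~', ',', $string);
}

echo to_comma_list("x\ny\nz"), "\n";   // x,y,z
echo to_comma_list("x y z"), "\n";     // x,y,z
echo to_comma_list("x , y ,z"), "\n";  // x,y,z
```

Note that the delimiter wraps the whole pattern exactly once; wrapping a `/.../` pattern in an extra pair of quotes turns the slashes into literal characters that must then appear in the input.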

Related

Normalize spaces in a string?

I need to normalize the spaces in a string:
Remove multiple adjacent spaces
Remove spaces at the beginning and end of the string
E.g. " my name is " => my name is
I tried
str_replace('  ', ' ', $str);
I also tried the answers to "php Replacing multiple spaces with a single space", but that didn't work either.
Replace any occurrence of 2 or more spaces with a single space, and trim:
$str = preg_replace('/ {2,}/', ' ', trim($input));
Note: using the whitespace character class \s here is a fairly bad idea since it will match linebreaks and other whitespace that you might not expect.
Use a regex
$text = preg_replace("~\\s{2,}~", " ", $text);
The \s approach strips away newlines too, and the / {2,}/ approach ignores tabs, and spaces at the beginning of a line right after a newline.
If you want to keep newlines and get a more accurate result, I'd suggest this impressive answer to a similar question, and this improvement on that answer. Following their notes, the answer to your question is:
$norm_str = preg_replace('/[^\S\r\n]+/', ' ', trim($str));
In short, this is taking advantage of double negation. Read the links to get an in-depth explanation of the trick.
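To make the double negation concrete: `[^\S\r\n]` reads as "not (non-whitespace, CR, or LF)", which is any whitespace except line breaks. A small sketch, with a hypothetical function name `normalize_spaces`:

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function normalize_spaces(string $str): string
{
    // Collapse every run of spaces/tabs (but not \r or \n) into one space.
    return preg_replace('/[^\S\r\n]+/', ' ', trim($str));
}

// Tabs and spaces are collapsed; the newline survives.
var_dump(normalize_spaces("\t my\t\tname\n\tis \t"));
// string(11) "my name\n is" (shown with \n for the real line break)
```

A single space can still remain next to a newline, because the pattern only collapses runs and does not remove them.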

Regex: remove non-alphanumeric chars, multiple whitespaces and trim() all together

I have a $text from which I want to strip all non-alphanumeric chars, replace multiple whitespace characters and newlines with a single space, and remove leading and trailing spaces.
This is my solution so far.
$text = '
some- text!!
for testing?
'; // $text to format
//strip off all non-alphanumeric chars
$text = preg_replace("/[^a-zA-Z0-9\s]/", "", $text);
//Replace multiple white spaces by single space
$text = preg_replace('/\s+/', ' ', $text);
//eliminate beginning and ending space
$finalText = trim($text);
/* result: $finalText ="some text for testing";
without non-alphanumeric chars, newline, extra spaces and trim()med */
Is it possible to combine all of these into one regular expression, so that I get the desired result in one line as below?
$finalText = preg_replace(some_reg_expression, $replaceby, $text);
thanks
Edit: clarified with a test string
Of course you can. That is very easy.
The regex will look like:
((?<= )\s*)|[^a-zA-Z0-9\s]|(\s*$)|(^\s*)
I have no PHP at hand, so I have used Perl (just to test the regex and show that it works); you can play with my code here:
$ cat test.txt
a b c d
a b c e f g fff f
$ cat 1.pl
while(<>) {
s/((?<= )\s*)|[^a-zA-Z0-9\s]|(\s*$)|(^\s*)//g;
print $_,"\n";
}
$ cat test.txt | perl 1.pl
a b c d
a b c e f g fff f
For PHP it will be the same.
What does the RE do?
((?<= )\s*) # all spaces that have at least one space before them
|
[^a-zA-Z0-9\s] # all non-alphanumeric characters
|
(\s*$) # all spaces at the end of string
|
(^\s*) # all spaces at the beginning of string
The only tricky part here is ((?<= )\s*), a lookbehind assertion. You remove whitespace if and only if the run of whitespace has a space before it.
If you want to know how lookahead/lookbehind assertions work, take a look at http://www.regular-expressions.info/lookaround.html.
Update from the discussion:
What happens when $text ='some ? ! ? text';?
Then the resulting string contains multiple spaces between "some" and "text".
It is not so easy to solve the problem, because one needs a positive lookbehind assertion of variable length, and that is not possible at the moment. One cannot simply check for spaces, because the preceding character may be not a space but a non-alphanumeric character that will be removed anyway (for example, in " !" the "!" sign will be removed, but the RE knows nothing about it). One needs something like (?<=[^a-zA-Z0-9\s]* )\s*, but that unfortunately will not work because PCRE does not support variable-length lookbehind assertions.
I do not think that you can achieve that with one regex. You would basically need to stick in an if-else condition, which is not possible through regular expressions alone.
You would basically need one regex to remove non-alphanumeric characters and another one to collapse the spaces, which is basically what you are already doing.
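The failure case from the discussion can be reproduced with a short sketch. The function names `clean_single_pass` and `clean_two_pass` are hypothetical; the patterns are the ones quoted in this thread.

```php
<?php
// Single-pattern attempt from the answer above, applied with an empty replacement.
function clean_single_pass(string $text): string
{
    return preg_replace('/((?<= )\s*)|[^a-zA-Z0-9\s]|(\s*$)|(^\s*)/', '', $text);
}

// Two-step version: strip disallowed characters first, then collapse whitespace.
function clean_two_pass(string $text): string
{
    $text = preg_replace('/[^a-zA-Z0-9\s]/', '', $text);
    $text = preg_replace('/\s+/', ' ', $text);
    return trim($text);
}

$text = 'some ? ! ? text';
var_dump(clean_single_pass($text)); // a run of spaces is left between the words
var_dump(clean_two_pass($text));    // string(9) "some text"
```

The single pass leaves the spaces behind because, when each space is examined, the character before it is still the punctuation mark that is about to be removed.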
Check whether this is what you are looking for:
$patterns = array ('/[^a-zA-Z0-9\s]/','/\s+/');
$replace = array ("", ' ');
$finalText = trim(preg_replace($patterns, $replace, $text));
It may need some modification; just let me know if this is what you want to do.
For your own sanity, you will want to keep regular expressions that you can still understand and edit later on :)
$text = preg_replace(array(
"/[^a-zA-Z0-9\s]/", // remove all non-space, non-alphanumeric characters
'/\s{2,}/', // replace multiple white space occurrences with single
), array(
'',
' ',
), trim($originalText));
$text =~ s/([^a-zA-Z0-9\s].*?)//g;
Doesn't have to be any harder than this.

How to remove multiple spaces and new lines from a string in PHP?

I have a form with a text area, and I need to remove any runs of multiple spaces and multiple newlines from the string entered there.
I have written this function to remove the multiple spaces
function fix_multi_spaces($string)
{
$reg_exp = '/\s+/';
return preg_replace($reg_exp," ",$string);
}
This function works well for spaces, but it also replaces newlines, changing them into a single space.
I need to change multiple spaces into 1 space and multiple newlines into 1 newline.
How can I do this?
Use
preg_replace('/(( )+|(\\n)+)/', '$2$3', $string);
This will work specifically for spaces and newlines; you will have to add other whitespace characters (such as \t for tabs) to the regex if you want to target them as well.
This regex works by matching either one or more spaces or one or more newlines and replacing the match with a space (but only if spaces were matched) and a newline (but only if newlines were matched).
Update: Turns out there's some regex functionality tailored for such cases which I didn't know about (many thanks to craniumonempty for the comment!). You can write the regex perhaps more appropriately as
preg_replace('/(?|( )+|(\\n)+)/', '$1', $string);
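A small sketch of the branch-reset version in use; `collapse_runs` is a hypothetical name. Inside `(?|...)` both alternatives capture into group 1, so `$1` is a space when spaces matched and a newline when newlines matched.

```php
<?php
// Hypothetical helper around the branch-reset pattern from the answer above.
function collapse_runs(string $string): string
{
    // Runs of spaces become one space; runs of newlines become one newline.
    return preg_replace('/(?|( )+|(\n)+)/', '$1', $string);
}

$input = "a" . str_repeat(' ', 3) . "b\n\n\nc";
var_dump(collapse_runs($input) === "a b\nc"); // bool(true)
```

As with the original pattern, tabs and `\r` are not covered here and would need their own alternatives.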
Note that \s in a regex matches all whitespace: spaces, newlines, tabs, etc.
If you would like to replace multiple spaces with one and multiple newlines with one, you would have to rewrite the function to call preg_replace twice, once replacing spaces and once replacing newlines.
You can use the following function to replace multiple spaces and line breaks with a single space:
function test($content_area){
//Newline and tab space to single space
$content_area = str_replace(array("\r\n", "\r", "\n", "\t"), ' ', $content_area);
// Multiple spaces to single space ( using regular expression)
$content_area = preg_replace('/ {2,}/', ' ', $content_area); // ereg_replace() was removed in PHP 7
return $content_area;
}

How to strip out extra asterisks in a string using preg_replace()

I know how to strip out extra spaces, dashes, and periods using preg_replace(), but I need to know what format below is correct for stripping out extra asterisks in a string.
These lines of code work for stripping out extra spaces, dashes, and periods:
// Strips out extra spaces
$string = preg_replace('/\s\s+/', ' ',$string);
// Strips out extra dashes
$string = preg_replace('/-+/', '-', $string);
// Strips out extra periods
$string = preg_replace('/\.+/', '.', $string);
Which of the following is correct for stripping out extra asterisks?
// Version 1: Strips out extra asterisks
$string = preg_replace('/\*+/', '*', $string);
// Version 2: Strips out extra asterisks
$string = preg_replace('/*+/', '*', $string);
Thank you in advance.
By the way, is there a list somewhere that shows all the characters that need to be escaped with a backslash when using PHP?
Try this:
$string = preg_replace('/\*{2,}/', '*', $string);
This will replace any instances of multiple asterisks next to one another with one asterisk.
Or, if you wanted to just get rid of all asterisks:
$string = preg_replace('/[\*]+/', '', $string);
It's worth noting that * is a special character in regular expressions; so, you must escape it with a backslash.
Also, here's a good regex reference:
http://www.regular-expressions.info/reference.html
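On the question of which characters need escaping: rather than memorizing the list, PHP's `preg_quote()` can escape a literal string for use inside a pattern. A sketch, assuming a single literal character is being collapsed; `collapse_repeats` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: collapse runs of one literal character into a single copy.
function collapse_repeats(string $string, string $char): string
{
    // preg_quote() backslash-escapes regex metacharacters; the second
    // argument also escapes the delimiter used around the pattern.
    $quoted = preg_quote($char, '/');
    return preg_replace('/(?:' . $quoted . '){2,}/', $char, $string);
}

var_dump(preg_quote('*'));                  // string(2) "\*"
var_dump(collapse_repeats('a***b', '*'));   // string(3) "a*b"
var_dump(collapse_repeats('a...b', '.'));   // string(3) "a.b"
```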
Here's how you could combine multiple character replacements into one regex:
$string = preg_replace('/(\*|\.){2,}/', '$1', $string);
This will replace asterisks as well as periods.
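One thing to watch with `(\*|\.){2,}` is that it also treats a mixed run such as `*.` as a repeat. A variant sketch, assuming the intent is to collapse only runs of the same character, uses a backreference instead; `collapse_same_char` is a hypothetical name.

```php
<?php
// Hypothetical helper: collapse runs of the same character from the set * . -
function collapse_same_char(string $string): string
{
    // ([*.\-]) captures one character; \1+ requires that same character again.
    return preg_replace('/([*.\-])\1+/', '$1', $string);
}

var_dump(collapse_same_char('a***b...c--d')); // string(7) "a*b.c-d"
var_dump(collapse_same_char('a*.b'));         // string(4) "a*.b" (mixed run left alone)
```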

Regular Expressions (php) - match blocks of non-alphanumeric characters

I need to modify a given string to contain only alphanumeric characters, dots (.) and commas.
If the string contains any character other than a-z, A-Z, 0-9 or a dot (.), it should be replaced with a comma. I'm using this:
$string = "dycloro 987 stackOVERflow !|,!!friday";
$newstring = preg_replace('/[^a-zA-Z0-9\.]/', ',', $string);
This returns
dycloro,987,stackOVERflow,,,,,,friday
But I need to get the following instead.
dycloro,987,stackOVERflow,friday
(Note the " !|,!!" part in $string is replaced with a single comma sign).
Ideally, I want to replace a block of disallowed characters with a single comma sign.
I figured out that
$newstring = preg_replace('/,{2,}/', ',', $newstring); replaces multiple comma signs with a single comma. But is there any way to do this in a faster, or better way ?
How do I do this in a single regular expression match ?
and is there any process time or memory difference in them ? This is regular expressions will be run against few megabytes of user input so I'm curious about it as well.
Thank you!
Just add a plus sign +, meaning "one or more of what I just mentioned", after the character class:
$string = "dycloro 987 stackOVERflow !|,!!friday";
$newstring = preg_replace('/[^a-zA-Z0-9\.]+/', ',', $string);
See http://www.php.net/manual/en/regexp.reference.repetition.php.
Try this one
$newstring = preg_replace('/[^a-zA-Z0-9\.]+/', ',', $string);
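A short sketch comparing the one-pass version with the two-step approach from the question on the sample input. Both function names are hypothetical.

```php
<?php
// One pass: each block of disallowed characters becomes a single comma.
function blocks_to_commas(string $string): string
{
    return preg_replace('/[^a-zA-Z0-9.]+/', ',', $string);
}

// Two passes, as in the question: replace each character, then merge commas.
function blocks_to_commas_two_step(string $string): string
{
    $string = preg_replace('/[^a-zA-Z0-9.]/', ',', $string);
    return preg_replace('/,{2,}/', ',', $string);
}

$string = "dycloro 987 stackOVERflow !|,!!friday";
echo blocks_to_commas($string), "\n";          // dycloro,987,stackOVERflow,friday
echo blocks_to_commas_two_step($string), "\n"; // dycloro,987,stackOVERflow,friday
```

The one-pass form scans the input once and avoids building the intermediate string, which is the likelier choice for large inputs, though actual timings would need to be measured.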
