I need to remove blank lines (with whitespace or absolutely blank) in PHP. I use this regular expression, but it does not work:
$str = ereg_replace('^[ \t]*$\r?\n', '', $str);
$str = preg_replace('^[ \t]*$\r?\n', '', $str);
I want text like this, where the lines are separated by blank or whitespace-only lines:
blahblah
blahblah
adsa
sad asdasd
to become:
blahblah
blahblah
adsa
sad asdasd
// New line is required to split non-blank lines
preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
The above regular expression says:
/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/
1st Capturing group (^[\r\n]*|[\r\n]+)
  1st Alternative: ^[\r\n]*
    ^ assert position at start of the string
    [\r\n]* match a single character present in the list below
      Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
      \r matches a carriage return (ASCII 13)
      \n matches a line-feed (newline) character (ASCII 10)
  2nd Alternative: [\r\n]+
    [\r\n]+ match a single character present in the list below
      Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
      \r matches a carriage return (ASCII 13)
      \n matches a line-feed (newline) character (ASCII 10)
[\s\t]* match a single character present in the list below
  Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
  \s match any white space character [\r\n\t\f ]
  \t matches a tab character (ASCII 9)
[\r\n]+ match a single character present in the list below
  Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
  \r matches a carriage return (ASCII 13)
  \n matches a line-feed (newline) character (ASCII 10)
Your ereg_replace() solution is wrong because the ereg/eregi functions are deprecated (and were removed entirely in PHP 7). Your preg_replace() won't even compile, but if you add delimiters and set multiline mode, it will work fine:
$str = preg_replace('/^[ \t]*[\r\n]+/m', '', $str);
The m modifier allows ^ to match the beginning of a logical line rather than just the beginning of the whole string. The start-of-line anchor is necessary because without it the regex would match the newline at the end of every line, not just the blank ones. You don't need the end-of-line anchor ($) because you're actively matching the newline characters, but it doesn't hurt.
The accepted answer gets the job done, but it's more complicated than it needs to be. The regex has to match either the beginning of the string (^[\r\n]*, multiline mode not set) or at least one newline ([\r\n]+), followed by at least one newline ([\r\n]+). So, in the special case of a string that starts with one or more blank lines, they'll be replaced with one blank line. I'm pretty sure that's not the desired outcome.
But most of the time it replaces two or more consecutive newlines, along with any horizontal whitespace (spaces or tabs) that lies between them, with one linefeed. That's the intent, anyway. The author seems to expect \s to match just the space character (\x20), when in fact it matches any whitespace character. That's a very common mistake. The actual list varies from one regex flavor to the next, but at minimum you can expect \s to match whatever [ \t\f\r\n] matches.
Actually, in PHP you have a better option:
$str = preg_replace('/^\h*\v+/m', '', $str);
\h matches any horizontal whitespace character, and \v matches vertical whitespace.
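A minimal sketch of the \h / \v approach in use. The helper name remove_blank_lines() and the sample text are assumptions made for illustration; only the pattern comes from the answer:

```php
<?php
// Hypothetical helper: drop every line that is empty or holds only
// horizontal whitespace, leaving all other lines untouched.
function remove_blank_lines(string $str): string
{
    // ^   start of a line (because of the m modifier)
    // \h* optional spaces or tabs
    // \v+ the line break(s) that end the blank line(s)
    // Fall back to the input if PCRE reports an error (null result).
    return preg_replace('/^\h*\v+/m', '', $str) ?? $str;
}

$sample = "blahblah\n\nblahblah\n \t\nadsa\n\n\nsad asdasd\n";
echo remove_blank_lines($sample);
// prints:
// blahblah
// blahblah
// adsa
// sad asdasd
```

Leading indentation on non-blank lines survives, because \h* is only consumed when a line break follows it directly.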
Just explode the lines of the text to an array, remove empty lines using array_filter and implode the array again.
$tmp = explode("\n", $str);
$tmp = array_filter($tmp);
$str = implode("\n", $tmp);
Or in one line:
$str = implode("\n", array_filter(explode("\n", $str)));
I haven't measured it, but this may be faster than preg_replace.
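One caveat with the snippet above: array_filter() without a callback drops every falsy value, so a line consisting of just "0" disappears, while a line holding only spaces is kept. A possible variant with an explicit callback is sketched here; the function name remove_blank_lines_filter() is made up for this example:

```php
<?php
// Hypothetical variant of the explode/array_filter idea with a callback.
function remove_blank_lines_filter(string $str): string
{
    // Split on any line break (\r\n, \n or \r) so Windows text works too.
    $lines = preg_split('/\R/', $str);

    // Keep a line only if something remains after trimming whitespace.
    $kept = array_filter($lines, function ($line) {
        return trim($line) !== '';
    });

    // Rejoin with "\n"; the original line-break style is not preserved.
    return implode("\n", $kept);
}

echo remove_blank_lines_filter("a\n\n0\n \t\nb");
// prints:
// a
// 0
// b
```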
The comment from Bythos from Jamie's link above worked for me:
/^\n+|^[\t\s]*\n+/m
I didn't want to strip all of the new lines, just the empty/whitespace ones. This does the trick!
There is no need to overcomplicate things. This can be achieved with a simple short regular expression:
$text = preg_replace("/(\R){2,}/", "$1", $text);
The (\R) matches any newline sequence.
The {2,} matches two or more occurrences.
The $1 uses the first backreference (the EOL sequence that was actually matched) as the replacement.
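A short check of what this pattern does and does not cover, as a sketch with made-up sample strings: it collapses runs of line breaks, but a line that contains spaces or tabs interrupts the run and is therefore left in place.

```php
<?php
$pattern = '/(\R){2,}/';

// A run of line breaks collapses to the last break matched in the run.
$collapsed = preg_replace($pattern, '$1', "a\r\n\r\n\r\nb");
var_dump($collapsed === "a\r\nb"); // bool(true)

// A line holding a single space breaks the run, so nothing is removed.
$unchanged = preg_replace($pattern, '$1', "a\n \nb");
var_dump($unchanged === "a\n \nb"); // bool(true)
```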
This was answered a long time ago, but the task can benefit greatly from preg_replace with a much simpler pattern:
$result = preg_replace('/\s*($|\n)/', '\1', $subject);
Pattern: Remove all white-space before a new-line -or- at the end of the string.
Longest match wins:
As the white-space \s has a greedy quantifier * and contains \n, consecutive empty lines are matched.
As \s contains \r as well, \r\n new-line sequences are supported; a single \r (without \n) is not.
And when $ matches the end of the buffer, the backreference \1 is empty, which handles trailing whitespace at the very end, too.
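To make the behaviour concrete, here is a small sketch with made-up input. Blank lines in the middle collapse to one "\n", trailing whitespace at the end vanishes, and indentation on non-blank lines is kept:

```php
<?php
$pattern = '/\s*($|\n)/';

// Blank lines, a whitespace-only line and trailing spaces are removed.
$result = preg_replace($pattern, '\1', "a\n\n  \nb  \n");
var_dump($result === "a\nb"); // bool(true)

// Leading whitespace of a non-blank line is not touched.
$indented = preg_replace($pattern, '\1', "a\n  b");
var_dump($indented === "a\n  b"); // bool(true)
```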
If leading (empty) lines need to be removed as well, they have to match while not capturing, too (this was not directly asked for but could be appropriate):
$result = preg_replace('/^(?:\s*\n)+|\s*($|\n)/', '\1', $subject);
# the added part is the leading alternative: ^(?:\s*\n)+
Pattern: Also remove all leading white-space (first line(s) are empty).
And if the new-line at the end of the buffer should be normalized differently (always a newline at the end instead of never), it needs to be appended to the result: $result . "\n".
This variant is portable to \r\n, \r and \n new-line sequences ((?>\r\n|\r|\n)) or \R:
$result = preg_replace('/^(?> |\t|\r\n|\r|\n)+|(?> |\t|\r\n|\r|\n)*($|(?>\r\n|\r|\n))/', '\1', $subject);
# or:
$result = preg_replace('/^(?:\s*\R)+|\s*($|\R)/', '\1', $subject);
Pattern: Support all new-line sequences.
The downside is that the new-lines are not normalized (e.g. any of the three converted to \n).
Therefore, it can make sense to normalize new-lines before removing:
$result = preg_replace(['/(?>\r\n|\n|\r)/', '/\s*($|\n)/'], ["\n", '\1'], $subject);
# or:
$result = preg_replace(['/\R/u', '/\s*($|\n)/'], ["\n", '\1'], $subject);
This also opens the opportunity to do some normalization apart from the line handling,
for example removing trailing white-space and fixing a missing new-line at the end of the file.
More advanced line normalization can follow, for example zero empty lines at the beginning and end, and otherwise no more than two consecutive empty lines:
$result = preg_replace(
['/[ \t]*($|\R)/u', '/^\n*|(\n)\n*$|(\n{3})\n+/'],
["\n" , '\1\2' ],
$subject
);
The second pattern already benefits from the first pattern's replacements.
The power of preg_replace here lies in choosing wisely which backreference(s) to use in the replacement.
Using multiple patterns can also greatly simplify things and keep the process maintainable.
Try this one:
$str = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\r\n", $str);
If you output this to a text file, it will give the same output in the simple Notepad, WordPad and also in text editors, for example Notepad++.
Use this:
$str = preg_replace('/^\s+\r?\n$/D', '', $str);
function trimblanklines($str) {
return preg_replace('`\A[ \t]*\r?\n|\r?\n[ \t]*\Z`','',$str);
}
This one only removes them from the beginning and end, not the middle (if anyone else was looking for this).
The accepted answer leaves an extra line-break at the end of the string. Using rtrim() will remove this final linebreak:
rtrim(preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string));
From this answer, the following works fine for me!
$str = "<html>
<body>";
echo str_replace(array("\r", "\n"), '', $str);
<?php
// Return only the lines that contain at least one byte above ASCII 32 (space).
function del_blanklines_in_array_q($ar){
    $strip = array();
    foreach($ar as $k => $v){
        $ll = strlen($v);
        // Scan the line from its end; keep it at the first non-blank byte.
        while($ll--){
            if(ord($v[$ll]) > 32){ // 32 (0x20) is the ASCII space
                $strip[] = $v;
                break;
            }
        }
    }
    return $strip;
}
function del_blanklines_in_file_q($in, $out){
    // $in: input filename, $out: output filename
    $strip = del_blanklines_in_array_q(file($in));
    file_put_contents($out, $strip);
}
$file = "file_name.txt";
$file_data = file_get_contents($file);
$file_data_after_remove_blank_line = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $file_data );
file_put_contents($file,$file_data_after_remove_blank_line);
nl2br(preg_replace('/^\v+/m', '', $r_msg))
Related
I managed to remove the spaces, but I can't understand why it removes my returns as well. I have a textarea in my form and I want to allow a maximum of two consecutive returns. Here is what I have been using so far.
$string = preg_replace('/\s\s+/', ' ', $string); // supposed to remove more than one consecutive space - but also deletes my returns ...
$string = preg_replace('/\n\n\n+/', '\n\n', $string); // using this one by itself does not do as expected and removes all returns ...
It seems the first line already gets rid of runs of spaces AND all returns, which is strange. Not sure that I am doing it right ...
Because \s will also match newline characters. So I suggest you use \h, which matches any kind of horizontal whitespace.
$string = preg_replace('/\h\h+/', ' ', $string);
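Putting both requirements from the question together could look like the sketch below. The function name tidy_textarea() is hypothetical, and normalizing line breaks to "\n" first is an assumption about what the form should store. Note the double-quoted "\n\n" in the replacement: the single-quoted '\n\n' from the question inserts literal backslash-n characters instead of line breaks.

```php
<?php
// Hypothetical helper: collapse runs of spaces/tabs and allow at most
// two consecutive line breaks.
function tidy_textarea(string $s): string
{
    // Normalize \r\n and \r to \n so the later patterns stay simple.
    $s = preg_replace('/\R/', "\n", $s);
    // Two or more horizontal whitespace characters become one space.
    $s = preg_replace('/\h\h+/', ' ', $s);
    // Three or more line breaks become exactly two.
    return preg_replace('/\n{3,}/', "\n\n", $s);
}

var_dump(tidy_textarea("a  \t b\r\n\r\n\r\n\r\nc") === "a b\n\nc"); // bool(true)
```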
\s match any white space character [\r\n\t\f ]
See the definition of \s. It includes \n. Compare:
\h matches any horizontal whitespace character (equal to [[:blank:]])
Use \h for horizontal whitespace.
For those of you who will need it, that's how you remove two carriage returns from a textarea.
preg_replace('/\n\r(\n\r)+/', "\n\r", $str);
For the space issue, as it has been posted above, replace \s by \h
We're scrubbing a ridiculous amount of data, and are finding many examples of otherwise clean data that are left with irrelevant punctuation at the beginning and end of the final string. Single and double quotes are fine, but leading/trailing dashes, commas, etc. need to be removed.
I've studied the answer at How can I remove all leading and trailing punctuation?, but am unable to find a way to accomplish the same in PHP.
- some text. (dash and period should be removed)
"Some Other Text". (period should be removed)
it's a matter of opinion (apostrophe should be kept)
/ some more text? (slash should be removed and question mark kept)
In short,
Certain punctuation occurring BEFORE the first AlphaNumeric character must be removed
Certain punctuation occurring AFTER the last AlphaNumeric character must be removed
How can I accomplish this with PHP? The few examples I've found surpass my RegEx/JS abilities.
This is an answer without regex.
You can use the function trim (or a combination of ltrim/rtrim) to specify all characters you want to remove. For your example:
$str = trim($str, " \t\n\r\0\x0B-.");
(As I suppose you also want to remove spacing and newlines at the beginning/end, I left the default mask in.)
See also rtrim and ltrim if you don't want to remove the same charlist at the beginning and the end of your strings.
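As a sketch, here is the trim() idea applied to the four sample strings from the question. The helper name trim_punctuation() and the exact set of extra characters (dash, period, comma, slash) are assumptions; adjust the mask to whatever punctuation should go. Keep in mind that ".." inside a trim() mask denotes a character range, so avoid two consecutive dots.

```php
<?php
// Hypothetical helper: strip default whitespace plus selected punctuation
// from both ends of the string; characters in the middle are never touched.
function trim_punctuation(string $str, string $extra = "-.,/"): string
{
    return trim($str, " \t\n\r\0\x0B" . $extra);
}

$samples = array(
    '- some text.',
    '"Some Other Text".',
    "it's a matter of opinion",
    '/ some more text?',
);
foreach ($samples as $s) {
    echo trim_punctuation($s), "\n";
}
// prints:
// some text
// "Some Other Text"
// it's a matter of opinion
// some more text?
```

Quotes and the question mark survive simply because they are not in the mask.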
You can modify the pattern to include more characters.
$array = array(
'- some text.',
'"Some Other Text".',
'it\'s a matter of opinion',
'/ some more text?'
);
foreach($array as $key => $string){
$array[$key] = preg_replace(array(
'/^[\.\-\/]*/',
'/[\.\-\/]*$/'
), array('', ''), $string);
}
print_r($array);
If the punctuation could be more than one character, you could do this
function trimFormatting($str){ // trim
$osl = 0;
$pat = '(<br>|,|\s+)';
while($osl!==strlen($str)){
$osl = strlen($str);
$str =preg_replace('/^'.$pat.'|'.$pat.'$/i','',$str);
}
return $str;
}
echo trimFormatting('<BR>,<BR>Hello<BR>World<BR>, <BR>');
// will give "Hello<BR>World"
The routine checks for "<BR>", "," and one or more whitespace characters ("\s+"), with "|" being the OR operator that joins the alternatives. It trims both at the start ("^") and at the end ("$") at the same time. It keeps looping until no more matches are trimmed off (i.e. there is no further reduction in string length).
Hello guys I currently have a problem with my preg_replace :
preg_replace('#[^a-zA-z\s]#', '', $string)
It keeps all alphabetic letters and white spaces but I want more than one white space to be reduced to only one. Any idea how this can be done ?
$output = preg_replace('!\s+!', ' ', $input);
From Regular Expression Basic Syntax Reference
\d, \w and \s
Shorthand character classes matching digits, word characters (letters, digits, and underscores), and whitespace (spaces, tabs, and line breaks). Can be used inside and outside character classes.
The character type \s stands for five different characters: horizontal tab (9), line feed (10), form feed (12), carriage return (13) and ordinary space (32). The following code will find every substring of $string which is composed entirely of \s. Only the first \s in the substring will be preserved. For example, if line feed, horizontal tab and ordinary space occur immediately after one another in a substring, line feed alone will remain after the replacement is done.
$string = preg_replace('#(\s)\s+#', '\1', $string);
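A small sketch of that behaviour with a made-up input string: each run of two or more whitespace characters shrinks to its first character, whatever that character is.

```php
<?php
$input = "one\n\t two   three";

// "\n\t " keeps only the line feed; "   " keeps only one space.
$output = preg_replace('#(\s)\s+#', '\1', $input);

var_dump($output === "one\ntwo three"); // bool(true)
```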
preg_replace(array('#\s+#', '#[^a-zA-Z\s]#'), array(' ', ''), $string);
Note, though, that it will replace every run of whitespace with a single space. If you want to collapse consecutive whitespace differently (like two newlines into only one newline), you have to figure out the logic for that, because \s+ will match "\n \n \n" (5 whitespace characters in a row) as a whole.
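Two details are worth a sketch here. First, the range A-z in the question's pattern also covers the characters [ \ ] ^ _ and the backtick, so those are not stripped; A-Z is probably what was meant. Second, the order matters: stripping characters after collapsing whitespace can leave double spaces behind. The helper name letters_and_single_spaces() is hypothetical.

```php
<?php
// Hypothetical helper: keep only ASCII letters and whitespace, then
// collapse each run of whitespace to a single space.
function letters_and_single_spaces(string $s): string
{
    // Strip first, so gaps left by removed characters are collapsed too.
    $s = preg_replace('/[^a-zA-Z\s]/', '', $s);
    return preg_replace('/\s+/', ' ', $s);
}

echo letters_and_single_spaces("Hello,   World_1 \n 2 ok"), "\n";
// prints: Hello World ok

// With A-z, underscore and caret slip through unchanged.
var_dump(preg_replace('#[^a-zA-z\s]#', '', 'a_b^c') === 'a_b^c'); // bool(true)
```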
try using trim instead
<?php
$something = " Error";
echo $something."\n";
echo "------"."\n";
echo trim($something);
?>
output
Error
------
Error
The question is old and is missing some details. Let's assume the OP wanted to reduce every run of consecutive horizontal whitespace to a single space.
Example:
"\t\t \t \t" => " "
"\t\t \t\t" => "\t \t"
One possible solution would be simply to use the generic character type \h, which stands for horizontal whitespace:
preg_replace('/\h+/', ' ', $text)
I have a string like this: "{\i1}You were happy?{\i0}"
and I want to remove all "{...}" so that just the text "You were happy?" is left.
I have tried some regex patterns, but I did not get it to work.
One of my attempts:
$text = preg_replace("/{.*}/", "\\0", $text);
You can use this for replacing all the text between {}
$result = preg_replace('/\{[^}]*\}/', '', $subject);
Explanation
"
\{ # Match the character “{” literally
[^}] # Match any character that is NOT a “}”
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\} # Match the character “}” literally
"
First you should make the matches ungreedy. Apply a ? after *.
The opening { does not need to be escaped in this very example, but I'd do it anyway.
And then you are using \0 as replacement pattern. That will reinsert whatever the regex matched. So nothing would be removed in the end - which I heard is not what you want.
$text = preg_replace("/\{.*?}/", "", $text);
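The difference between the greedy and the ungreedy form can be checked with the string from the question. This is only an illustrative sketch; the variable names are made up:

```php
<?php
$text = '{\i1}You were happy?{\i0}';

// Greedy: .* runs from the first "{" to the LAST "}", so everything goes.
$greedy = preg_replace('/\{.*\}/', '', $text);

// Ungreedy: .*? stops at the first "}" after each "{".
$lazy = preg_replace('/\{.*?\}/', '', $text);

// A negated class gives the same result without relying on laziness.
$negated = preg_replace('/\{[^}]*\}/', '', $text);

var_dump($greedy);  // string(0) ""
var_dump($lazy);    // string(15) "You were happy?"
var_dump($negated); // string(15) "You were happy?"
```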
I'm writing a trimming function that takes a string and finds the first newline \n character after the 500th character and returns a string up to the newline. Basically, if there are \n at indices of 200, 400, and 600, I want the function to return the first 600 characters of the string (not including the \n).
I tried:
$output = preg_replace('/([^%]{500}[^\n]+?)[^%]*/','$1',$output);
I used the percent sign because I couldn't find a character class that just encompassed "everything". Dot didn't do it because it excludes newlines. Unfortunately, my function fails miserably. Any help or guidance would be appreciated.
Personally I would avoid regex and use simple string functions:
// $str is the original string
$nl = strpos( $str, "\n", 500 ); // finds first \n starting from char 500
$sub = substr( $str, 0, $nl );
$final = str_replace( "\n", ' ', $sub );
You might need to check for \r\n as well - i.e. normalize first using str_replace( "\r\n", "\n", $str ).
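The snippet above assumes the string is longer than 500 characters and that a newline exists after that point. A defensive sketch of the same strpos/substr idea follows; the function name trim_at_newline() and the choice to return the whole string when no cut point exists are assumptions, and the earlier newlines are left in place here:

```php
<?php
// Hypothetical helper: cut $str at the first "\n" found at or after
// offset $min; return $str unchanged if there is no such newline.
function trim_at_newline(string $str, int $min = 500): string
{
    // Short strings have nothing to cut (and an offset beyond the end
    // of the string would make strpos() fail).
    if (strlen($str) <= $min) {
        return $str;
    }
    $nl = strpos($str, "\n", $min);
    // strpos() returns false when nothing is found, so compare strictly.
    return $nl === false ? $str : substr($str, 0, $nl);
}

// Newlines at indices 200, 400 and 600, as in the question.
$str = str_repeat('a', 200) . "\n" . str_repeat('b', 199) . "\n"
     . str_repeat('c', 199) . "\n" . 'tail';
echo strlen(trim_at_newline($str)), "\n"; // prints 600
```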
You can add the s (DOTALL) modifier to make . match newlines, then just make the second bit ungreedy. I've also made it match everything if the string is under 500 characters and anchored it to the start:
preg_match('/^.{500}[^\n]+|^.{0,500}$/s', $output, $matches);
$output = $matches[0];
use
'/(.{500,}?)(?=\n)/s'
as pattern
The /s at the end makes the dot match newlines, and {500,} means "match 500 or more", with the question mark making it match as few as possible. The (?=\n) is a positive lookahead, which means the whole matched string has to be followed by a \n, but the lookahead doesn't consume anything. So it checks that the 500+ character string is followed by a newline, but doesn't include the newline in the match (or the replacement, for that matter).
Though the lookahead thingy is a little fancy in this case, I guess
'/(.{500,}?)\n/s'
would do just as well. I just like lookaheads :)
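For completeness, a sketch of how the lookahead pattern could be used with preg_match. The wrapper name cut_at_newline_regex() and the fallback of returning the input when nothing matches are assumptions, not part of the answer:

```php
<?php
// Hypothetical wrapper around the lookahead pattern from the answer.
function cut_at_newline_regex(string $s): string
{
    // Group 1 holds at least 500 characters, extended lazily until the
    // next character is a newline; the newline itself is not included.
    if (preg_match('/(.{500,}?)(?=\n)/s', $s, $m) === 1) {
        return $m[1];
    }
    return $s; // no newline at or after offset 500
}

// Newlines at indices 200, 400 and 600.
$long = str_repeat('a', 200) . "\n" . str_repeat('b', 199) . "\n"
      . str_repeat('c', 199) . "\n" . 'tail';
echo strlen(cut_at_newline_regex($long)), "\n"; // prints 600
```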