Reliably split user-submitted textarea value on newlines - php

The string input comes from textarea where users are supposed to enter every single item on a new line.
When processing the form, it is easy to explode the textarea input into an array of single items like this:
$arr = explode("\n", $textareaInput);
It works fine but I am worried about it not working correctly in different systems (I can currently only test in Windows). I know newlines are represented as \r\n or as just \r across different platforms. Will the above line of code also work correctly under Linux, Solaris, BSD or other OS?

You can use preg_split to do that.
$arr = preg_split('/[\r\n]+/', $textareaInput);
It splits it on any combination of the \r or \n characters. You can also use \s to include any white-space char.
Edit
It occurred to me that, while the previous code works fine, it also removes empty lines. If you want to preserve the empty lines, you may want to try this instead:
$arr = preg_split('/(\r\n|[\r\n])/', $textareaInput);
It basically starts by looking for the Windows version \r\n, and if that fails it looks for either the old Mac version \r or the Unix version \n.
For example:
<?php
$text = "Windows\r\n\r\nMac\r\rUnix\n\nDone!";
$arr = preg_split('/(\r\n|[\r\n])/', $text);
print_r($arr);
?>
Prints:
Array
(
[0] => Windows
[1] =>
[2] => Mac
[3] =>
[4] => Unix
[5] =>
[6] => Done!
)

'\r' by itself as a line terminator is an old convention that's not really used anymore (not since OSX which is Unix based).
Your explode will be fine. Just trim off the '\r' in each resulting element for the Windows users.
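As a minimal sketch of that suggestion (the sample input and the $lines variable are made up for illustration), splitting on "\n" and then stripping a leftover "\r" from each element could look like this:

```php
<?php
// Hypothetical input mixing Windows (\r\n) and Unix (\n) line endings.
$textareaInput = "apple\r\nbanana\ncherry\r\n";

// Split on \n, then remove the trailing \r that \r\n endings leave behind.
$lines = array_map(
    function ($line) {
        return rtrim($line, "\r");
    },
    explode("\n", $textareaInput)
);

print_r($lines);
```

Note that a trailing newline in the input still produces a final empty element, so you may want to trim the whole string first.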

$arr = preg_split( "/[\n\r]+/", $textareaInput );

You can normalize the input:
<?php
$foo = strtr($foo, array(
"\r\n" => "\n",
"\r" => "\n",
"\n" => "\n",
));
?>
Alternatively, you can explode with regular expressions:
<?php
$foo = preg_split ("/[\r\n]+/", $foo);
?>

The following code should do the job:
<?php
$split = preg_split('/[\r\n]+/', $src);
foreach ($split as $k => $string) {
    // Trim each line and drop the ones that end up empty.
    $split[$k] = trim($string);
    // Compare against '' because empty() would also discard a line containing just "0".
    if ($split[$k] === '') {
        unset($split[$k]);
    }
}
$join = implode('', $split);
?>
to get a string with the newlines completely stripped. It won't work correctly with JS though :(

The system-agnostic technique with regex involves the \R escape sequence.
PHP Documentation on Escape Sequences
It really is as simple as calling preg_split('~\R~', $textareaInput).
\R - line break: matches \n, \r and \r\n
Normalizing the input is a waste of time and effort if you are just going to explode on the replacement characters anyhow.
If you are worried about multiple consecutive newline characters in the string, you can just add the + quantifier after \R.
If you want to trim whitespace characters from both sides of the strings in the resultant array, you can use ~\s*\R\s*~ (whitespace at the very start and end of the input is not next to a line break, so trim() the whole string as well).
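The three variants above can be compared side by side. This is only a sketch; the sample string and variable names are assumptions chosen for illustration:

```php
<?php
// Hypothetical input: CRLF, a blank line, a lone CR, a lone LF, plus stray spaces.
$textareaInput = "  one \r\n\r\ntwo\rthree  \nfour";

// Plain \R: one element per line, empty lines preserved.
$all = preg_split('~\R~', $textareaInput);

// \R+: consecutive line breaks collapse, so empty lines disappear.
$collapsed = preg_split('~\R+~', $textareaInput);

// \s*\R\s*: also swallows whitespace around each break; trim() covers both ends of the input.
$trimmed = preg_split('~\s*\R\s*~', trim($textareaInput));

print_r($all);
print_r($collapsed);
print_r($trimmed);
```

Because \s also matches line breaks, the last pattern collapses blank lines just like \R+ does.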

Related

Explode text into array as per paragraph

I have the following text:
$test = 'Test This is first line
Test:123
This is Test';
I want to explode this string to an array of paragraphs. I wrote the following code but it is not working:
$array = explode('\n\n', $test);
Any idea what I'm missing here?
You might be on Windows which uses \r\n instead of \n. You could use a regex to make it universal with preg_split():
$array = preg_split('#(\r\n?|\n)+#', $test);
Pattern explanation:
( : start matching group 1
\r\n?|\n : match \r\n, \r or \n
) : end matching group 1
+ : repeat one or more times
If you want to split by 2 newlines, then replace + by {2,}.
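A minimal sketch of the {2,} variant, assuming the paragraphs in $test are separated by blank lines (the sample string below is made up, with mixed line endings to show that both are handled):

```php
<?php
// Hypothetical input: paragraphs separated by a blank line, with mixed line endings.
$test = "Test This is first line\r\n\r\nTest:123\n\nThis is Test";

// Split only where two or more line breaks occur in a row.
$array = preg_split('#(\r\n?|\n){2,}#', $test);

print_r($array);
```

A single line break inside a paragraph is left untouched by this pattern.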
Update: you might use:
$array = preg_split('#\R+#', $test);
This extensive answer covers the meaning of \R. Note that this is only supported in PCRE/perl. So in a sense, it's less cross-flavour compatible.
Your code
$array = explode('\n\n', $test);
should have \n\n enclosed in double quotes:
$array = explode("\n\n", $test);
Using single quotes, it looks through the variable $test for a literal \n\n (a backslash followed by n, twice). With double quotes, the escape sequences are evaluated, so it looks for two line feed characters.
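The difference is easy to verify. In this small sketch (sample string and variable names are illustrative only), the single-quoted delimiter is four characters long and never matches, while the double-quoted one is two characters long and splits as intended:

```php
<?php
$single = '\n\n'; // backslash, n, backslash, n
$double = "\n\n"; // two line feed characters

var_dump(strlen($single)); // int(4)
var_dump(strlen($double)); // int(2)

// Hypothetical sample text with one blank line between paragraphs.
$test = "Paragraph 1\n\nParagraph 2";

$withSingle = explode('\n\n', $test); // delimiter not found: one element
$withDouble = explode("\n\n", $test); // two elements

var_dump(count($withSingle), count($withDouble));
```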
Also, note that the end of line depends on the host operating system. Windows uses \r\n instead of \n. You can get the end of line for the operating system by using the predefined constant PHP_EOL.
Try double quotes
$array = explode("\n\n", $test);
Have you tried this?
$array = explode("\n", $test);
The easiest way to get this text into an array like you describe would be:
preg_match_all('/.+/',$string, $array);
Since /./ matches any char, except for line terminators, and the + is greedy, it'll match as many chars as possible, until a new-line is encountered.
Using preg_match_all ensures this is repeated for each line, too. When I tried this, the output looked like this:
array (
0 =>
array (
0 => '$test = \'Test This is first line',
1 => 'Test:123',
2 => 'This is Test\';',
),
)
Also note that line-feeds are different, depending on the environment (\n for *NIX systems, compared to \r\n for windows, or in some cases a simple \r). Perhaps you might want to try explode(PHP_EOL, $text);, too
You need to use double quotes in your code, so that the \n\n is actually evaluated as two newline characters. Look below:
'Paragraph 1\n\nParagraph 2' =
Paragraph 1\n\nParagraph 2
Whereas:
"Paragraph 1\n\nParagraph 2" =
Paragraph 1
Paragraph 2
Also, Windows systems use \r\n\r\n instead of \n\n. You can detect which line endings the system is using with:
PHP_EOL
So, to split on a blank line between paragraphs, your final code would be:
$paragraphs = explode(PHP_EOL . PHP_EOL, $text);

character separator for newline textarea

So evidently:
\n = LF (Line Feed) // Used as a new line character in Unix
\r = CR (Carriage Return) // Used as a new line character in classic Mac OS
\r\n = CR + LF // Used as a new line character in Windows
(char)13 = \r = CR // Same as \r; (char)10 = \n = LF
But then I also heard that when an HTML textarea is submitted and parsed by a PHP script, all newlines are converted to \r\n regardless of the platform.
Is this true and can I rely on it, or am I completely mistaken?
I.e., if I want to explode() on newlines, can I use "\r\n" as the delimiter regardless of whether the user is on a Mac, a PC, etc.?
All newlines should be converted to \r\n by the spec.
So you could indeed do a simple explode("\r\n", $theContent) no matter the platform used.
P.S.
\r is only used on old(er) Macs. Nowadays Macs also use the *nix style line breaks (\n).
You could try preg_split, which will use a regular expression to split up the string. Within this regular expression you can match on all 3 new line variants.
$ArrayOfResults = preg_split( '/\r\n|\r|\n/', $YourStringToExplode );
It depends on what you want to achieve. If you are doing this eventually to display / format it as HTML, you can as well use the nl2br() function or possibly use str_replace like this:
$val = str_replace( array("\r\n","\r","\n"), '<br />', $val ); // "\r\n" must come first, or each Windows line break becomes two <br /> tags
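In case you just want an array of all lines, keep in mind that explode() accepts only a single delimiter, so you would first have to account for all 3 variants ("\r\n", "\r", "\n"), for example by normalizing them to "\n" before exploding.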
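One way to do that, sketched with a made-up sample string and variable names, is to normalize the line endings with str_replace() and then explode on "\n":

```php
<?php
// Hypothetical input containing all three line-ending variants.
$val = "first\r\nsecond\rthird\nfourth";

// Replace \r\n before the lone \r so a Windows line break is not counted twice.
$normalized = str_replace(array("\r\n", "\r"), "\n", $val);

$lines = explode("\n", $normalized);

print_r($lines);
```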

Turning multi-line string into multi-element array using regular expressions in PHP

I need to split the following string and put each new line into a new array element.
this is line a.(EOL chars = '\r\n' or '\n')
(EOL chars)
this is line b.(EOL chars)
this is line c.(EOL chars)
this is the last line d.(OPTIONAL EOL chars)
(Note that the last line might not have any EOL characters present. The string also sometimes contains only 1 line, which is by definition the last one.)
The following rules must be followed:
Empty lines (like the second line) should be discarded and not put
into the array.
EOL chars should not be included, because otherwise
my string comparisons fail.
So this should result in the following array:
[0] => "this is line a."
[1] => "this is line b."
[2] => "this is line c."
[3] => "this is the last line d."
I tried doing the following:
$matches = array();
preg_match_all('/^(.*)$/m', $str, $matches);
return $matches[1];
$matches[1] indeed contains each new line, but:
Empty lines are included as well
It seems that a '\r' character gets smuggled in anyway at the end of the strings in the array. I suspect this has something to do with the regex range '.' which includes everything except '\n'.
Anyway, I've been playing around with '\R' and whatnot, but I just can't find a good regex pattern that follows the two rules I outlined above. Any help please?
Just use preg_split() to split on the regular expression:
// Split on \n, \r is optional..
// The last element won't need an EOL.
$array = preg_split("/\r?\n/", $string);
Note, you might also want to trim($string) if there is a trailing newline, so you don't end up with an extra empty array element.
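To also satisfy the rule about discarding empty lines, one option is the PREG_SPLIT_NO_EMPTY flag, which drops empty pieces (including the one caused by a trailing EOL). A minimal sketch, with sample strings invented to mirror the question:

```php
<?php
// Hypothetical input mirroring the question: mixed EOLs and one empty line.
$string = "this is line a.\r\n\r\nthis is line b.\nthis is line c.\r\nthis is the last line d.";

// -1 means no limit; PREG_SPLIT_NO_EMPTY discards empty elements.
$array = preg_split("/\r?\n/", $string, -1, PREG_SPLIT_NO_EMPTY);

// The same call copes with an optional trailing EOL.
$withTrailingEol = preg_split("/\r?\n/", $string . "\r\n", -1, PREG_SPLIT_NO_EMPTY);

print_r($array);
```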
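If the text comes from a file rather than a string, there is a function just for this: file(), which accepts the FILE_IGNORE_NEW_LINES and FILE_SKIP_EMPTY_LINES flags.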
I think preg_split would be the way to go... You can use an appropriate regexp to use any EOL character as separator.
Something like the following (the regexp needs to be a bit more elaborate):
$array = preg_split('/[\n\r]+/', $string);
Hope that helps,
Use preg_split function:
$array = preg_split('/[\r\n]+/', $string);

Problem Replacing Literal String \r\n With Line Break in PHP

I have a text file that has the literal string \r\n in it. I want to replace this with an actual line break (\n).
I know that the regex /\\r\\n/ should match it (I have tested it in Reggy), but I cannot get it to work in PHP.
I have tried the following variations:
preg_replace("/\\\\r\\\\n/", "\n", $line);
preg_replace("/\\\\[r]\\\\[n]/", "\n", $line);
preg_replace("/[\\\\][r][\\\\][n]/", "\n", $line);
preg_replace("/[\\\\]r[\\\\]n/", "\n", $line);
If I just try to replace the backslash, it works properly. As soon as I add an r, it finds no matches.
The file I am reading is encoded as UTF-16.
Edit:
I have also already tried using str_replace().
I now believe that the problem here is the character encoding of the file. I tried the following, and it did work:
$testString = "\\r\\n";
echo preg_replace("/\\\\r\\\\n/", "\n", $testString);
but it does not work on lines I am reading in from my file.
Save yourself the effort of figuring out the regex and try str_replace() instead:
str_replace('\r\n', "\n", $string);
Save yourself the effort of figuring out the regex and the escaping within double quotes:
$fixed = str_replace('\r\n', "\n", $line);
For what it is worth, preg_replace("/\\\\r\\\\n/", "\n", $line); should be fine. As a demonstration:
var_dump(preg_replace("/\\\\r\\\\n/", "NL", 'Cake is yummy\r\n\r\n'));
Gives: string(17) "Cake is yummyNLNL"
Also fine is: '/\\\r\\\n/' and '/\\\\r\\\\n/'
Important - if the above doesn't work, are you even sure literal \r\n is what you're trying to match?..
UTF-16 is the problem. If you're just working with the raw bytes, then you can use the full byte sequences for replacing:
$out = str_replace("\x00\x5c\x00\x72\x00\x5c\x00\x6e", "\x00\x0a", $in);
This assumes big-endian UTF-16, else swap the zero bytes to come after the non zeros:
$out = str_replace("\x5c\x00\x72\x00\x5c\x00\x6e\x00", "\x0a\x00", $in);
If that doesn't work, please post a byte-dump of your input file so we can see what it actually contains.
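An alternative to matching raw bytes is to convert the text to UTF-8 first and then do the ordinary replacement. This is only a sketch: it assumes the mbstring extension is available and that the data is little-endian UTF-16 without a byte order mark, and the sample value is made up:

```php
<?php
// Hypothetical sample: the text a\r\nb (literal backslashes) encoded as UTF-16LE.
$in = mb_convert_encoding('a\r\nb', 'UTF-16LE', 'UTF-8');

// Convert to UTF-8 so the single-byte search string can match.
$utf8 = mb_convert_encoding($in, 'UTF-8', 'UTF-16LE');

// Replace the literal four-character text \r\n with a real line feed.
$out = str_replace('\r\n', "\n", $utf8);

var_dump($out);
```

If the rest of the program expects UTF-16, the result can be converted back with another mb_convert_encoding() call.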
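$result = preg_replace('/\\\\r\\\\n/', "\n", $subject);
The regex above matches the literal four-character text \r\n (backslash, r, backslash, n) and replaces it with an actual line feed. Note the double quotes around the replacement: in single quotes, '\n' would insert a literal backslash followed by n.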
References:
Difference between CR LF, LF and CR line break types?
Right way to escape backslash [ \ ] in PHP regex?
Regex Explanation
I always keep searching for this topic, and I always come back to a personal line I wrote.
It looks neat and it's based on RegEx:
"/[\n\r]/"
PHP
preg_replace("/[\n\r]/",'\n', $string )
or
preg_replace("/[\n\r]/",$replaceStr, $string )

explode error \r\n and \n in windows and linux server

I have used the explode function to split a textarea's contents into an array, line by line. When I run this code on my localhost (WAMPserver 2.1), it works perfectly with this code:
$arr=explode("\r\n",$getdata);
When I upload it to my Linux server, I have to change the above code every time to:
$arr=explode("\n",$getdata);
What would be a permanent solution? Which code will work on both servers?
Thank you
The constant PHP_EOL contains the platform-dependent linefeed, so you can try this:
$arr = explode(PHP_EOL, $getdata);
But even better is to normalize the text, because you never know what OS your visitors use. This is one way to normalize to only use \n as linefeed (but also see Alex's answer, since his regex will handle all types of linefeeds):
$getdata = str_replace("\r\n", "\n", $getdata);
$arr = explode("\n", $getdata);
As far as I know the best way to split a string by newlines is preg_split and \R:
preg_split('~\R~', $str);
\R matches any Unicode Newline Sequence, i.e. not only LF, CR, CRLF, but also more exotic ones like VT, FF, NEL, LS and PS.
If that behavior isn't wanted (why?), you could specify the BSR_ANYCRLF option:
preg_split('~(*BSR_ANYCRLF)\R~', $str);
This will match the "classic" newline sequences only.
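The difference between the two patterns only shows up with the less common separators. A small sketch, using a made-up string that contains a vertical tab (VT, "\x0B") next to ordinary line breaks:

```php
<?php
// Hypothetical input: LF, then VT, then CRLF as separators.
$str = "a\nb\x0Bc\r\nd";

// Default \R treats the vertical tab as a line break too.
$any = preg_split('~\R~', $str);

// With (*BSR_ANYCRLF), only CR, LF and CRLF count as line breaks.
$classic = preg_split('~(*BSR_ANYCRLF)\R~', $str);

var_dump(count($any), count($classic));
```

This assumes the PCRE library behind PHP uses its default setting for \R, which is the Unicode newline set described above.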
Well, the best approach would be to normalize your input data to just use \n, like this:
$input = preg_replace('~\r[\n]?~', "\n", $input);
Since:
Unix uses \n.
Windows uses \r\n.
(Old) Mac OS uses \r.
Nonetheless, exploding by \n should get you the best results (if you don't normalize).
The PHP_EOL constant contains the character sequence of the host operating system's newline.
$arr=explode(PHP_EOL,$getdata);
You could use preg_split() which will allow it to work regardless:
$arr = preg_split('/\r?\n/', $getdata);
