character separator for newline textarea - php

So evidently:
\n = LF (Line Feed) // Used as a new line character in Unix
\r = CR (Carriage Return) // Used as a new line character in classic Mac OS
\r\n = CR + LF // Used as a new line character in Windows
(char)13 = \r = CR // Same as \r
But then I also heard that when an HTML textarea is submitted and parsed by a PHP script, all new lines are converted to \r\n regardless of the platform.
Is this true, and can I rely on it, or am I completely mistaken?
I.e., if I want to explode() on new lines, can I use "\r\n" as the delimiter regardless of whether the user is on a Mac, a PC, etc.?

Per the HTML spec, all newlines in a submitted textarea value should be converted to \r\n.
So you could indeed do a simple explode("\r\n", $theContent) no matter the platform used.
P.S.
\r is only used on old(er) Macs. Nowadays Macs also use the *nix style line breaks (\n).

You could try preg_split, which will use a regular expression to split up the string. Within this regular expression you can match on all 3 new line variants.
$ArrayOfResults = preg_split( '/\r\n|\r|\n/', $YourStringToExplode );
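As a quick check of the pattern above, here is a minimal sketch with a made-up sample string that mixes all three line endings ($sample and $lines are names chosen purely for illustration):

```php
<?php
// Hypothetical input mixing Windows (\r\n), old Mac (\r) and Unix (\n) endings.
$sample = "one\r\ntwo\rthree\nfour";

// "\r\n" is listed first so a CRLF pair counts as a single separator.
$lines = preg_split('/\r\n|\r|\n/', $sample);

print_r($lines);
```

Because the alternation tries "\r\n" before the single characters, no empty element appears between "one" and "two".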

It depends on what you want to achieve. If you are doing this to display or format the text as HTML, you can just as well use the nl2br() function, or use str_replace() like this:
$val = str_replace( array("\r\n","\r","\n"), '<br />', $val );
Note that "\r\n" comes first in the array; otherwise a Windows line break would be replaced twice.
If you just want an array of all lines, split on all 3 variants ("\r\n", "\r", "\n"). Since explode() accepts only a single delimiter, either normalize the line endings first or use preg_split().
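A small sketch of why the order of the search strings matters for str_replace(), which works through the search array from left to right. The sample value and the variable names $html and $doubled are invented for this illustration:

```php
<?php
$val = "first\r\nsecond\nthird";

// "\r\n" comes first, so a CRLF pair is replaced as one unit.
$html = str_replace(array("\r\n", "\r", "\n"), '<br />', $val);

// With "\n" and "\r" first, the CRLF pair turns into two <br /> tags.
$doubled = str_replace(array("\n", "\r", "\r\n"), '<br />', $val);

echo $html, "\n", $doubled, "\n";
```

nl2br() avoids the question altogether, but it keeps the original newline characters and only inserts a <br /> before each of them.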

Related

Removing newlines in php

Following is the syntax for preg_replace() function in php:
$new_string = preg_replace($pattern_to_match, $replacement_string, $original_string);
If a text file has both Windows (\r\n) and Linux (\n) end-of-line (EOL) characters,
then which of the following is the correct order of applying preg_replace() to get rid of all end-of-line characters?
Remove CR+LF first:
$string = preg_replace('|\r\n|','',$string);
$string = preg_replace('|\n|','',$string);
Remove plain LF first:
$string = preg_replace('|\n|','',$string);
$string = preg_replace('|\r\n|','',$string);
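I would recommend using the following (covers Windows, Unix and Mac EOL characters):
$string = preg_replace('/\r|\n/m','',$string);
The m (multiline) modifier is harmless here but not actually required, since the pattern contains no ^ or $ anchors.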
which of the following is the correct order of applying preg_replace to get rid of all end of line characters?
$string = preg_replace("!\r|\n!m",'',$string);
Using the power of regular expressions, you could specify something like
'|\r?\n|'
which specifically means an optional '\r' followed by '\n', and so matches the end of a line under both Linux and Windows.
EDIT:
Using the built-in function trim() would achieve the same result even more simply, but only if the newline characters are located at the beginning or end of the string.
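For simply deleting every CR and LF, no regular expression (and no ordering question) is needed. A minimal sketch with invented sample data, contrasting it with trim(), which only touches the ends of the string:

```php
<?php
$string = "line one\r\nline two\nline three\r";

// Remove every CR and LF wherever it occurs.
$stripped = str_replace(array("\r", "\n"), '', $string);

// trim() only strips the listed characters from both ends; inner newlines stay.
$trimmed = trim($string, "\r\n");

var_dump($stripped, $trimmed);
```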

Problem Replacing Literal String \r\n With Line Break in PHP

I have a text file that has the literal string \r\n in it. I want to replace this with an actual line break (\n).
I know that the regex /\\r\\n/ should match it (I have tested it in Reggy), but I cannot get it to work in PHP.
I have tried the following variations:
preg_replace("/\\\\r\\\\n/", "\n", $line);
preg_replace("/\\\\[r]\\\\[n]/", "\n", $line);
preg_replace("/[\\\\][r][\\\\][n]/", "\n", $line);
preg_replace("/[\\\\]r[\\\\]n/", "\n", $line);
If I just try to replace the backslash, it works properly. As soon as I add an r, it finds no matches.
The file I am reading is encoded as UTF-16.
Edit:
I have also already tried using str_replace().
I now believe that the problem here is the character encoding of the file. I tried the following, and it did work:
$testString = "\\r\\n";
echo preg_replace("/\\\\r\\\\n/", "\n", $testString);
but it does not work on lines I am reading in from my file.
Save yourself the effort of figuring out the regex and try str_replace() instead:
str_replace('\r\n', "\n", $string);
Save yourself the effort of figuring out the regex and the escaping within double quotes:
$fixed = str_replace('\r\n', "\n", $line);
For what it is worth, preg_replace("/\\\\r\\\\n/", "\n", $line); should be fine. As a demonstration:
var_dump(preg_replace("/\\\\r\\\\n/", "NL", 'Cake is yummy\r\n\r\n'));
Gives: string(17) "Cake is yummyNLNL"
Also fine is: '/\\\r\\\n/' and '/\\\\r\\\\n/'
Important - if the above doesn't work, are you even sure literal \r\n is what you're trying to match?..
UTF-16 is the problem. If you're just working with the raw bytes, then you can use the full byte sequences for replacing:
$out = str_replace("\x00\x5c\x00\x72\x00\x5c\x00\x6e", "\x00\x0a", $in);
This assumes big-endian UTF-16; for little-endian, the zero bytes come after the non-zero ones:
$out = str_replace("\x5c\x00\x72\x00\x5c\x00\x6e\x00", "\x0a\x00", $in);
If that doesn't work, please post a byte-dump of your input file so we can see what it actually contains.
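An alternative to matching raw byte sequences is to convert the data to UTF-8 first and then use the plain str_replace() call. This is only a sketch: it assumes the mbstring extension is available and that the file is little-endian UTF-16 without a byte order mark. $in is built by hand here to stand in for a line read from the file.

```php
<?php
// Stand-in for a line from the file: the text  a\r\nb  (literal backslashes) as UTF-16LE.
$in = mb_convert_encoding('a\r\nb', 'UTF-16LE', 'UTF-8');

// Convert to UTF-8 so ordinary string functions see one byte per ASCII character.
$utf8 = mb_convert_encoding($in, 'UTF-8', 'UTF-16LE');

// Single quotes keep \r\n literal (four characters); double quotes make "\n" a real LF.
$fixed = str_replace('\r\n', "\n", $utf8);

var_dump($fixed);
```

If the result has to be written back as UTF-16, the same function can convert it again in the other direction.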
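$result = preg_replace('/\\\\r\\\\n/', "\n", $subject);
The regex above replaces the literal four-character sequence \r\n (backslash, r, backslash, n) with a real line feed. The replacement must be in double quotes; '\n' in single quotes would insert a backslash followed by n.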
References:
Difference between CR LF, LF and CR line break types?
Right way to escape backslash [ \ ] in PHP regex?
Regex Explanation
I keep searching for this topic, and I always come back to a line of my own.
It looks neat and it's based on regex:
"/[\n\r]/"
PHP
preg_replace("/[\n\r]/",'\n', $string )
or
preg_replace("/[\n\r]/",$replaceStr, $string )

explode error \r\n and \n in windows and linux server

I have used the explode() function to split a textarea's contents into an array of lines. When I run this on my localhost (WAMPserver 2.1), it works perfectly with this code:
$arr=explode("\r\n",$getdata);
When I upload it to my Linux server, I need to change the above code every time to:
$arr=explode("\n",$getdata);
What would be a permanent solution? Which common code will work on both servers?
Thank you
The constant PHP_EOL contains the platform-dependent line ending, so you can try this:
$arr = explode(PHP_EOL, $getdata);
But even better is to normalize the text, because you never know what OS your visitors use. This is one way to normalize to only use \n as the line ending (but also see Alex's answer, since his regex will handle all types of line endings):
$getdata = str_replace("\r\n", "\n", $getdata);
$arr = explode("\n", $getdata);
As far as I know the best way to split a string by newlines is preg_split and \R:
preg_split('~\R~', $str);
\R matches any Unicode Newline Sequence, i.e. not only LF, CR, CRLF, but also more exotic ones like VT, FF, NEL, LS and PS.
If that behavior isn't wanted (why?), you could specify the BSR_ANYCRLF option:
preg_split('~(*BSR_ANYCRLF)\R~', $str);
This will match the "classic" newline sequences only.
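A short sketch of the difference between the two patterns, assuming PCRE's default \R setting. The sample string is made up and contains a vertical tab (\x0B) alongside the classic line endings:

```php
<?php
// Sample with CRLF, a vertical tab (\x0B) and a lone LF.
$str = "a\r\nb\x0Bc\nd";

// Default \R: the vertical tab counts as a line break too.
$any = preg_split('~\R~', $str);

// With BSR_ANYCRLF only CR, LF and CRLF are treated as breaks.
$classic = preg_split('~(*BSR_ANYCRLF)\R~', $str);

print_r($any);
print_r($classic);
```

In both cases the CRLF pair is consumed as a single separator, so no empty element shows up between "a" and "b".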
Well, the best approach would be to normalize your input data to just use \n, like this:
$input = preg_replace('~\r[\n]?~', "\n", $input);
Since:
Unix uses \n.
Windows uses \r\n.
(Old) Mac OS uses \r.
Nonetheless, exploding by \n should get you the best results (if you don't normalize).
The PHP_EOL constant contains the character sequence of the host operating system's newline.
$arr=explode(PHP_EOL,$getdata);
You could use preg_split() which will allow it to work regardless:
$arr = preg_split('/\r?\n/', $getdata);
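Keep in mind that PHP_EOL describes the server PHP runs on, not the browser that submitted the text, which is why the same form data behaves differently on WAMP and on Linux. A minimal sketch with a hypothetical CRLF textarea value, comparing a plain explode("\n") with the pattern above ($naive is a name invented here):

```php
<?php
// Hypothetical textarea value as a browser would typically send it (CRLF).
$getdata = "alpha\r\nbeta\r\ngamma";

// Splitting on "\n" alone leaves a trailing "\r" on every line but the last.
$naive = explode("\n", $getdata);

// An optional "\r" before "\n" handles both CRLF and bare LF input.
$arr = preg_split('/\r?\n/', $getdata);

var_dump($naive[0] === "alpha\r");
print_r($arr);
```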

how to manipulate line breaks in php?

I wanted to know if there's a way to manipulate line breaks within PHP. Like, to explicitly tell what kind of line break to select (LF, CRLF...) for using in an explode() function for instance.
it would be something like that:
$rows = explode('<LF>', $list);
//<LF> here would be the line break
anyone can help? thanks (:
LF and CR are just abbreviations for the characters with the code point 0x0A (LINE FEED) and 0x0D (CARRIAGE RETURN) in ASCII. You can either write them literally or use appropriate escape sequences:
"\x0A" "\n" // LF
"\x0D" "\r" // CR
Remember to use double quotes, since single-quoted strings only recognize the escape sequences \\ and \'.
CRLF would then just be the concatenation of both characters. So:
$rows = explode("\r\n", $list);
If you want to split at both CR and LF you can do a split using a regular expression:
$rows = preg_split("/[\r\n]/", $list);
And to skip empty lines (i.e. to treat any run of consecutive line break characters as a single separator):
$rows = preg_split("/[\r\n]+/", $list);
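To make the difference between the two patterns concrete, here is a small sketch on an invented CRLF-separated list that contains one blank line ($single and $runs are illustrative names):

```php
<?php
$list = "a\r\nb\r\n\r\nc";

// Without "+", each CR and each LF is its own separator, so every CRLF yields an empty string.
$single = preg_split("/[\r\n]/", $list);

// With "+", any run of CR/LF characters is one separator; blank lines disappear.
$runs = preg_split("/[\r\n]+/", $list);

print_r($single);
print_r($runs);
```

So the version without the quantifier is only a good fit when the input uses single-character line endings.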
Some possibilities I can think of, depending on your needs:
Pick an EOL style and specify the exact character(s): "\r\n"
Choose the EOL of the platform PHP runs on and use the PHP_EOL constant
Use regular expressions: preg_split('/[\r\n]+/', ...)
Use a function that can autodetect line endings: file()
Normalize the input string before exploding:
$text = strtr($text, array(
"\r\n" => PHP_EOL,
"\r" => PHP_EOL,
"\n" => PHP_EOL,
));

Reliably split user-submitted textarea value on newlines

The string input comes from textarea where users are supposed to enter every single item on a new line.
When processing the form, it is easy to explode the textarea input into an array of single items like this:
$arr = explode("\n", $textareaInput);
It works fine but I am worried about it not working correctly in different systems (I can currently only test in Windows). I know newlines are represented as \r\n or as just \r across different platforms. Will the above line of code also work correctly under Linux, Solaris, BSD or other OS?
You can use preg_split to do that.
$arr = preg_split('/[\r\n]+/', $textareaInput);
It splits it on any combination of the \r or \n characters. You can also use \s to include any white-space char.
Edit
It occurred to me, that while the previous code works fine, it also removes empty lines. If you want to preserve the empty lines, you may want to try this instead:
$arr = preg_split('/(\r\n|[\r\n])/', $textareaInput);
It basically starts by looking for the Windows version \r\n, and if that fails it looks for either the old Mac version \r or the Unix version \n.
For example:
<?php
$text = "Windows\r\n\r\nMac\r\rUnix\n\nDone!";
$arr = preg_split('/(\r\n|[\r\n])/', $text);
print_r($arr);
?>
Prints:
Array
(
[0] => Windows
[1] =>
[2] => Mac
[3] =>
[4] => Unix
[5] =>
[6] => Done!
)
'\r' by itself as a line terminator is an old convention that's not really used anymore (not since OSX which is Unix based).
Your explode will be fine. Just trim off the '\r' in each resulting element for the Windows users.
$arr = preg_split( "/[\n\r]+/", $textareaInput );
You can normalize the input:
<?php
$foo = strtr($foo, array(
"\r\n" => "\n",
"\r" => "\n",
"\n" => "\n",
));
?>
Alternatively, you can explode with regular expressions:
<?php
$foo = preg_split ("/[\r\n]+/", $foo);
?>
The following code should do the job:
<?php
$split = preg_split('/[\r\n]+/', $src);
foreach ($split as $k => $string) {
    $split[$k] = trim($string);
    // Compare against '' explicitly; empty() would also discard a line containing just "0".
    if ($split[$k] === '') {
        unset($split[$k]);
    }
}
ksort($split);
$join = implode('', $split);
?>
This gives a string with the newlines completely stripped. It won't work correctly with JS though :(
The system-agnostic regex technique involves the \R escape sequence.
PHP Documentation on Escape Sequences
It really is as simple as calling preg_split('~\R~', $textareaInput).
\R - line break: matches \n, \r and \r\n
Normalizing the input is a waste of time and effort if you are just going to explode on the replacement characters anyhow.
If you are worried about multiple consecutive newline characters in the string, you can just add the + quantifier after \R.
If you want to trim whitespace characters from both sides of the strings in the resultant array, you can use ~\s*\R\s*~
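A minimal sketch of that last pattern on a made-up textarea value. The outer trim() call and the PREG_SPLIT_NO_EMPTY flag are additions chosen for this illustration: the pattern itself only removes whitespace around the line breaks, not at the very start or end of the input.

```php
<?php
$textareaInput = "  apple \r\n\r\n banana\n  cherry  ";

// trim() cleans the outer ends; \s*\R\s* swallows whitespace around each break,
// which also collapses blank lines because \s matches newline characters as well.
$items = preg_split('~\s*\R\s*~', trim($textareaInput), -1, PREG_SPLIT_NO_EMPTY);

print_r($items);
```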
