explode() error with \r\n and \n on Windows and Linux servers - PHP

I have used the explode function to split a textarea's contents into an array, one element per line. When I run this code on my localhost (WAMPserver 2.1), it works perfectly:
$arr=explode("\r\n",$getdata);
When I upload it to my Linux server, I have to change the code above every time to:
$arr=explode("\n",$getdata);
What would be a permanent solution? Which code will work on both servers?
Thank you

The constant PHP_EOL contains the platform-dependent linefeed, so you can try this:
$arr = explode(PHP_EOL, $getdata);
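Keep in mind that PHP_EOL describes the server PHP runs on, not the browser that submitted the form. A minimal sketch of the consequence, assuming the textarea value arrives with "\r\n" line breaks (as browsers send them) and the script runs on Linux, where PHP_EOL is "\n"; the sample value is made up:

```php
<?php
// Hypothetical textarea value as a browser submits it (CRLF line breaks).
$getdata = "first\r\nsecond";

// On a Linux server PHP_EOL is "\n", so explode(PHP_EOL, ...) behaves like this:
$arr = explode("\n", $getdata);

// Every element except the last keeps a trailing carriage return.
var_dump($arr[0] === "first\r"); // bool(true)
```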
But even better is to normalize the text, because you never know what OS your visitors use. This is one way to normalize the text so it uses only \n as the line feed (but also see Alex's answer, since his regex will handle all types of line feeds):
$getdata = str_replace("\r\n", "\n", $getdata);
$arr = explode("\n", $getdata);

As far as I know the best way to split a string by newlines is preg_split and \R:
preg_split('~\R~', $str);
\R matches any Unicode Newline Sequence, i.e. not only LF, CR, CRLF, but also more exotic ones like VT, FF, NEL, LS and PS.
If that behavior isn't wanted (why?), you could specify the BSR_ANYCRLF option:
preg_split('~(*BSR_ANYCRLF)\R~', $str);
This will match the "classic" newline sequences only.
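To see the difference between the two forms, a small sketch using a form feed (\f) as the "exotic" line break. It assumes PHP's PCRE is built with its default Unicode behaviour for \R, which is what the answer above describes; the sample string is made up:

```php
<?php
// A form feed (\f) sits between the two words.
$str = "page one\fpage two";

// Default \R treats the form feed as a newline sequence and splits on it.
$unicode = preg_split('~\R~', $str);

// With (*BSR_ANYCRLF), \R only matches \r, \n and \r\n, so nothing is split.
$classic = preg_split('~(*BSR_ANYCRLF)\R~', $str);
```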

Well, the best approach would be to normalize your input data to just use \n, like this:
$input = preg_replace('~\r[\n]?~', "\n", $input);
Since:
Unix uses \n.
Windows uses \r\n.
(Old) Mac OS uses \r.
Nonetheless, exploding by \n should get you the best results (if you don't normalize).
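A short sketch comparing this regex with the plain str_replace("\r\n", "\n", ...) normalization from the first answer, on a made-up string that mixes all three conventions. Only the regex also catches a lone \r:

```php
<?php
// Hypothetical input mixing Windows, old Mac and Unix line endings.
$input = "win\r\nmac\runix\nend";

// Replacing only "\r\n" leaves the lone "\r" in place.
$partial = str_replace("\r\n", "\n", $input);

// \r followed by an optional \n covers both "\r\n" and a lone "\r".
$normalized = preg_replace('~\r[\n]?~', "\n", $input);

// After normalization, a single explode on "\n" is enough.
$lines = explode("\n", $normalized);
```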

The PHP_EOL constant contains the character sequence of the host operating system's newline.
$arr=explode(PHP_EOL,$getdata);

You could use preg_split(), which works regardless of whether the lines end in \r\n or \n:
$arr = preg_split('/\r?\n/', $getdata);
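A quick check of what \r?\n does and does not cover, using made-up sample strings: Windows and Unix text split the same way, while old Mac-style text (a lone \r) is left in one piece:

```php
<?php
$windows = "a\r\nb\r\nc";
$unix = "a\nb\nc";
$oldMac = "a\rb\rc";

// \r?\n matches "\r\n" or "\n", so Windows and Unix input split identically.
$fromWindows = preg_split('/\r?\n/', $windows);
$fromUnix = preg_split('/\r?\n/', $unix);

// A lone "\r" is not matched, so old Mac-style text stays in one piece.
$fromOldMac = preg_split('/\r?\n/', $oldMac);
```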

Related

Replace all kind of dashes

I have an Excel document which I import into MySQL using a library.
But some of the texts in the document contain dashes which I thought I had replaced, but apparently not all of them.
-, –, - <-all of these are different.
Is there any way I could replace all kinds of dashes with this one: -
The main problem is that I don't know all of the dashes that exist in computers.
Just use regex with unicode modifier u and a character class:
$output = preg_replace('#\p{Pd}#u', '-', $input);
From the manual: Pd = Dash punctuation
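A sketch of the \p{Pd} approach on a made-up UTF-8 string. It assumes the imported text is valid UTF-8 (the u modifier requires it) and PHP 7.0+ for the "\u{...}" escapes:

```php
<?php
// Sample text with an en dash, an em dash, a Unicode hyphen and a minus sign.
$input = "2010\u{2013}2012, a pause\u{2014}then more, co\u{2010}operate, 5 \u{2212} 3";

// \p{Pd} is the Unicode "dash punctuation" category; the u modifier makes
// PCRE treat the subject as UTF-8 instead of raw bytes.
$output = preg_replace('#\p{Pd}#u', '-', $input);
```

Two things worth knowing: the minus sign (U+2212) belongs to the math-symbol category Sm, not Pd, so it is left alone; add it to a class such as '#[\p{Pd}\x{2212}]#u' if you want it replaced too. And with the u modifier, preg_replace() returns null when the input is not valid UTF-8.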
How about:
$string = str_replace(array('-','–','-','—', ...), '-', $string);
Use the above code and see if it works. If you're still seeing some dashes not being replaced, you can just add them into the array, and it'll work.
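If you prefer the explicit str_replace list, writing the dashes as code-point escapes avoids confusing look-alike characters in your editor. The list below is an illustrative selection of common Unicode dashes, not a complete one, and the sample string is made up:

```php
<?php
// Spell out the dashes by code point so the list does not depend on how
// the source file or editor renders look-alike characters.
$dashes = [
    "\u{2010}", // HYPHEN
    "\u{2011}", // NON-BREAKING HYPHEN
    "\u{2012}", // FIGURE DASH
    "\u{2013}", // EN DASH
    "\u{2014}", // EM DASH
    "\u{2015}", // HORIZONTAL BAR
];

$string = "pages 10\u{2013}12 \u{2014} see above";
$string = str_replace($dashes, '-', $string);
```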

character separator for newline textarea

So evidently:
\n = LF (Line Feed) // Used as a new line character in Unix
\r = CR (Carriage Return) // Used as a new line character in (classic) Mac OS
\r\n = CR + LF // Used as a new line character in Windows
(char)13 = \r = CR // Same as \r
But then I also heard that, for an HTML textarea, when the form is submitted and parsed by a PHP script, all newlines are converted to \r\n regardless of the platform.
Is this true, and can I rely on it, or am I completely mistaken?
I.e., if I want to explode() on newlines, can I use "\r\n" as the delimiter regardless of whether the user is on a Mac, a PC, etc.?
By the spec, all newlines in a submitted textarea should be converted to \r\n.
So you could indeed do a simple explode("\r\n", $theContent) no matter the platform used.
P.S.
\r is only used on old(er) Macs. Nowadays Macs also use the *nix style line breaks (\n).
You could try preg_split, which will use a regular expression to split up the string. Within this regular expression you can match on all 3 new line variants.
$ArrayOfResults = preg_split( '/\r\n|\r|\n/', $YourStringToExplode );
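The order of the alternatives matters here. A minimal sketch on a made-up string, comparing this pattern with one that tries \r before \r\n:

```php
<?php
$text = "one\r\ntwo\rthree\nfour";

// "\r\n" comes first in the alternation, so a Windows line break is
// consumed as a single separator.
$good = preg_split('/\r\n|\r|\n/', $text);

// With "\r" tried first, "\r\n" is seen as two separators and an
// empty element appears between them.
$bad = preg_split('/\r|\r\n|\n/', $text);
```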
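It depends on what you want to achieve. If you are eventually going to display or format the text as HTML, you can also use the nl2br() function, or use str_replace like this:
$val = str_replace( array("\r\n","\r","\n"), '<br />', $val );
(List "\r\n" first: str_replace applies the search strings in order, so if "\n" came first it would split each "\r\n" and produce two <br /> tags.)
If you just want an array of all lines, I would suggest accounting for all three sequences ("\r\n", "\r", "\n"); since explode() accepts only a single delimiter, normalize them first or use preg_split.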
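A sketch of how the search order in str_replace changes the result, next to what nl2br() produces, on a made-up two-line string:

```php
<?php
$val = "line one\r\nline two";

// Longest sequence first: "\r\n" becomes one <br />.
$right = str_replace(["\r\n", "\r", "\n"], '<br />', $val);

// "\n" first: it splits "\r\n" before that entry is ever tried, and the
// leftover "\r" turns into a second <br />.
$wrong = str_replace(["\n", "\r", "\r\n"], '<br />', $val);

// nl2br() inserts <br /> before each line break and keeps the break itself.
$html = nl2br($val);
```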

Problem Replacing Literal String \r\n With Line Break in PHP

I have a text file that has the literal string \r\n in it. I want to replace this with an actual line break (\n).
I know that the regex /\\r\\n/ should match it (I have tested it in Reggy), but I cannot get it to work in PHP.
I have tried the following variations:
preg_replace("/\\\\r\\\\n/", "\n", $line);
preg_replace("/\\\\[r]\\\\[n]/", "\n", $line);
preg_replace("/[\\\\][r][\\\\][n]/", "\n", $line);
preg_replace("/[\\\\]r[\\\\]n/", "\n", $line);
If I just try to replace the backslash, it works properly. As soon as I add an r, it finds no matches.
The file I am reading is encoded as UTF-16.
Edit:
I have also already tried using str_replace().
I now believe that the problem here is the character encoding of the file. I tried the following, and it did work:
$testString = "\\r\\n";
echo preg_replace("/\\\\r\\\\n/", "\n", $testString);
but it does not work on lines I am reading in from my file.
Save yourself the effort of figuring out the regex and try str_replace() instead:
str_replace('\r\n', "\n", $string);
Save yourself the effort of figuring out the regex and the escaping within double quotes:
$fixed = str_replace('\r\n', "\n", $line);
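The key is the quoting: single quotes keep the backslashes as literal characters, double quotes turn \n into a real line feed. A sketch with a made-up line standing in for one read from the file:

```php
<?php
// Single quotes: '\r\n' is four characters (backslash, r, backslash, n).
$literal = '\r\n';

// Double quotes: "\n" is one real line feed character.
$newline = "\n";

// Hypothetical line read from the file, containing the literal text \r\n.
$line = 'first\r\nsecond';
$fixed = str_replace($literal, $newline, $line);
```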
For what it is worth, preg_replace("/\\\\r\\\\n/", "\n", $line); should be fine. As a demonstration:
var_dump(preg_replace("/\\\\r\\\\n/", "NL", 'Cake is yummy\r\n\r\n'));
Gives: string(17) "Cake is yummyNLNL"
Also fine is: '/\\\r\\\n/' and '/\\\\r\\\\n/'
Important - if the above doesn't work, are you even sure literal \r\n is what you're trying to match?
UTF-16 is the problem. If you're just working with the raw bytes, then you can use the full sequences for replacing:
$out = str_replace("\x00\x5c\x00\x72\x00\x5c\x00\x6e", "\x00\x0a", $in);
This assumes big-endian UTF-16, else swap the zero bytes to come after the non zeros:
$out = str_replace("\x5c\x00\x72\x00\x5c\x00\x6e\x00", "\x0a\x00", $in);
If that doesn't work, please post a byte-dump of your input file so we can see what it actually contains.
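To illustrate why the byte-oriented regex finds nothing in UTF-16, here is a sketch that builds the literal text a\r\nb by hand as big-endian UTF-16 (the sample bytes are made up for the demonstration):

```php
<?php
// The text `a\r\nb` (literal backslashes) encoded as UTF-16BE by hand:
// every character is two bytes, high byte first.
$in = "\x00\x61" . "\x00\x5c\x00\x72\x00\x5c\x00\x6e" . "\x00\x62";

// The regex finds nothing: a 0x00 byte sits between each backslash
// and the letter that follows it.
$regexHits = preg_match('/\\\\r\\\\n/', $in);

// Replacing the full UTF-16BE byte sequence works.
$out = str_replace("\x00\x5c\x00\x72\x00\x5c\x00\x6e", "\x00\x0a", $in);
```

If the mbstring extension is available, another option is to convert the text first with mb_convert_encoding($in, 'UTF-8', 'UTF-16BE') and then use the ordinary str_replace('\r\n', "\n", ...).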
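$result = preg_replace('/\\\\r\\\\n/', "\n", $subject);
The regex above replaces the literal text \r\n (a written-out Windows line break) with an actual Linux line break (\n). The replacement must be in double quotes; '\n' in single quotes would insert a backslash followed by n.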
References:
Difference between CR LF, LF and CR line break types?
Right way to escape backslash [ \ ] in PHP regex?
Regex Explanation
I always keep searching for this topic, and I always come back to a personal line I wrote.
It looks neat and it's based on RegEx:
"/[\n\r]/"
PHP
preg_replace("/[\n\r]/",'\n', $string )
or
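preg_replace("/[\n\r]/",$replaceStr, $string )
Note that this pattern matches real CR and LF characters, not the literal text \r\n, so a Windows \r\n is replaced twice; also, '\n' in single quotes is a backslash followed by n, not a line break.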

newline question

I want to detect a carriage return or a newline character when a user enters data into a textarea. What is the best way to handle this? I've tried str_replace with escape characters but carriage returns and newlines are not detected.
OK, say I type the following into a textarea:
The summer was hot this year
but next year is supposed to be cooler.
I want to detect the CRs. In this case, there is one.
Newlines could be \r, \r\n, or \n, depending on the client.
$input = preg_replace('/\r\n?/',"\n",$input);
will standardize all of your newlines to "\n" regardless of where they came from.
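Once everything is standardized, detecting line breaks becomes a check for a single character. A sketch using the sample text from the question; the variable names are illustrative:

```php
<?php
// Hypothetical textarea value; the line break may arrive as \r\n, \r or \n.
$input = "The summer was hot this year\r\nbut next year is supposed to be cooler.";

// Standardize every line break to "\n" first.
$input = preg_replace('/\r\n?/', "\n", $input);

// Detecting and counting line breaks is then a single-character check.
$hasBreak = strpos($input, "\n") !== false;
$breakCount = substr_count($input, "\n");
```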
You can do it like this with str_replace:
function replace_newline($string) {
return (string)str_replace(array("\r", "\r\n", "\n"), '', $string);
}
There are several ways a newline can be stored.
Some systems use only "\n", some "\r", and some both, "\r\n". You need to check for both "\r" and "\n".
Try the following. It's always worked a charm for me.
You need to replace both \n and \r, because Linux and Windows systems use different characters for newlines.
$input = str_replace(array("\n","\r"),'',$input);
Or check for chr(10) (the \n character) and replace on that.
Have you tried preg_replace? It can be used for regex replacements, so you can replace \n, \r, or any combination you require, although I believe str_replace should also work fine.
function replace_newlines($string) {
return preg_replace('/\r\n|\r|\n/', '', $string);
}

Reliably split user-submitted textarea value on newlines

The string input comes from a textarea where users are supposed to enter every single item on a new line.
When processing the form, it is easy to explode the textarea input into an array of single items like this:
$arr = explode("\n", $textareaInput);
It works fine but I am worried about it not working correctly in different systems (I can currently only test in Windows). I know newlines are represented as \r\n or as just \r across different platforms. Will the above line of code also work correctly under Linux, Solaris, BSD or other OS?
You can use preg_split to do that.
$arr = preg_split('/[\r\n]+/', $textareaInput);
It splits it on any combination of the \r or \n characters. You can also use \s to include any white-space char.
Edit
It occurred to me that, while the previous code works fine, it also removes empty lines. If you want to preserve the empty lines, you may want to try this instead:
$arr = preg_split('/(\r\n|[\r\n])/', $textareaInput);
It basically starts by looking for the Windows version \r\n, and if that fails it looks for either the old Mac version \r or the Unix version \n.
For example:
<?php
$text = "Windows\r\n\r\nMac\r\rUnix\n\nDone!";
$arr = preg_split('/(\r\n|[\r\n])/', $text);
print_r($arr);
?>
Prints:
Array
(
[0] => Windows
[1] =>
[2] => Mac
[3] =>
[4] => Unix
[5] =>
[6] => Done!
)
'\r' by itself as a line terminator is an old convention that's not really used anymore (not since OS X, which is Unix-based).
Your explode will be fine. Just trim off the '\r' in each resulting element for the Windows users.
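A sketch of the explode-then-trim approach described above, on a made-up Windows-style value. rtrim() with "\r" as the character list removes only trailing carriage returns, so other whitespace in each item is left untouched:

```php
<?php
$textareaInput = "apple\r\nbanana\r\ncherry";

// Split on "\n", then strip the "\r" that Windows-style input leaves
// at the end of each element.
$arr = array_map(
    function ($line) {
        return rtrim($line, "\r");
    },
    explode("\n", $textareaInput)
);
```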
$arr = preg_split( "/[\n\r]+/", $textareaInput );
You can normalize the input:
<?php
$foo = strtr($foo, array(
"\r\n" => "\n",
"\r" => "\n",
"\n" => "\n",
));
?>
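This works because strtr() with an array tries the longest keys first, so "\r\n" is turned into a single "\n" no matter how the array is ordered (the "\n" => "\n" entry is a no-op). A sketch on a made-up mixed string, deliberately listing "\r" before "\r\n":

```php
<?php
$foo = "win\r\nmac\runix\nend";

// Even with "\r" listed first, strtr() matches the longer "\r\n" key first.
$normalized = strtr($foo, [
    "\r" => "\n",
    "\r\n" => "\n",
]);
```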
Alternatively, you can explode with regular expressions:
<?php
$foo = preg_split ("/[\r\n]+/", $foo);
?>
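The following code should do the job:
<?php
// Split on runs of CR/LF characters.
$split = preg_split('/[\r\n]+/', $src);
foreach ($split as $k => $string) {
    $split[$k] = trim($string);
    // Compare with '' instead of using empty(), which would also drop a line containing "0".
    if ($split[$k] === '') {
        unset($split[$k]);
    }
}
// Join the remaining pieces with no separator.
$join = implode('', $split);
?>
to get a string with the newlines completely stripped. It won't work correctly with JS, though :(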
The system-agnostic regex technique involves the \R escape sequence.
PHP Documentation on Escape Sequences
It really is as simple as calling preg_split('~\R~', $textareaInput).
\R - line break: matches \n, \r and \r\n
Normalizing the input is a waste of time and effort if you are just going to explode on the replacement characters anyhow.
If you are worried about multiple consecutive newline characters in the string, you can just add the + quantifier after \R.
If you want to trim whitespace characters from both sides of the strings in the resultant array, you can use ~\s*\R\s*~
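A sketch comparing the three patterns mentioned above on a made-up value that contains a blank line and stray spaces around a line break:

```php
<?php
$textareaInput = "apple\r\n\r\nbanana \n cherry";

// Plain \R: one split per line break, so the blank line survives as "".
$lines = preg_split('~\R~', $textareaInput);

// \R+: consecutive line breaks count as one separator.
$collapsed = preg_split('~\R+~', $textareaInput);

// \s*\R\s*: also swallows whitespace (including further line breaks)
// on both sides of each break.
$trimmed = preg_split('~\s*\R\s*~', $textareaInput);
```

Note that \s*\R\s* only trims around line breaks; leading whitespace on the first line and trailing whitespace on the last line are kept.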
