I wanted to know if there's a way to manipulate line breaks within PHP. Like, to explicitly tell what kind of line break to select (LF, CRLF...) for using in an explode() function for instance.
It would be something like this:
$rows = explode('<LF>', $list);
//<LF> here would be the line break
Can anyone help? Thanks (:
LF and CR are just abbreviations for the characters with the code point 0x0A (LINE FEED) and 0x0D (CARRIAGE RETURN) in ASCII. You can either write them literally or use appropriate escape sequences:
"\x0A" "\n" // LF
"\x0D" "\r" // CR
Remember to use double quotes, since single-quoted strings only recognize the escape sequences \\ and \'.
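A quick way to see the quoting difference is to compare string lengths. This is only a minimal sketch; the variable names are made up for illustration.

```php
<?php
// Single quotes: the backslash and the letter n stay two separate characters.
$single = '\n';

// Double quotes: the escape sequence becomes one LF character (0x0A).
$double = "\n";

var_dump(strlen($single));    // int(2)
var_dump(strlen($double));    // int(1)
var_dump($double === "\x0A"); // bool(true)
```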
CRLF would then just be the concatenation of both characters. So:
$rows = explode("\r\n", $list);
If you want to split at both CR and LF you can do a split using a regular expression:
$rows = preg_split("/[\r\n]/", $list);
And to skip empty lines (i.e. sequences of more than one line break character):
$rows = preg_split("/[\r\n]+/", $list);
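To make the difference between the two patterns concrete, here is a small sketch run against CRLF input. The sample string is an assumption chosen for illustration.

```php
<?php
$list = "a\r\nb\r\n\r\nc";

// Without +, every CR and every LF is its own delimiter, so each CRLF
// pair leaves an empty string between its two characters.
$each = preg_split("/[\r\n]/", $list);

// With +, a whole run of CR/LF characters counts as a single delimiter.
$runs = preg_split("/[\r\n]+/", $list);

print_r($each); // 7 elements, 4 of them empty strings
print_r($runs); // a, b, c
```

So on Windows-style input the first pattern yields empty elements even when there are no blank lines.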
Some possibilities I can think of, depending on your needs:
Pick an EOL style and specify the exact character(s): "\r\n"
Choose the EOL of the platform PHP runs on and use the PHP_EOL constant
Use regular expressions: preg_split('/[\r\n]+/', ...)
Use a function that can autodetect line endings: file()
Normalize the input string before exploding:
$text = strtr($text, array(
"\r\n" => PHP_EOL,
"\r" => PHP_EOL,
"\n" => PHP_EOL,
));
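The normalize-then-explode option could be wrapped up roughly as follows. This is a sketch, not a definitive implementation: split_lines() is a hypothetical helper name, and it normalizes to "\n" rather than PHP_EOL so the result does not depend on the server platform.

```php
<?php
// Hypothetical helper: normalize all three EOL styles to LF, then split.
function split_lines(string $text): array
{
    // strtr() with an array tries the longest key first, so "\r\n" is
    // replaced as a unit instead of as a separate CR and LF.
    $normalized = strtr($text, array("\r\n" => "\n", "\r" => "\n"));

    return explode("\n", $normalized);
}

$rows = split_lines("one\r\ntwo\rthree\nfour");
print_r($rows); // one, two, three, four
```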
My script reads a file containing a replacement string and then makes a preg_replace of spaces in some text with the replacement. The idea is that the replacement file should contain any valid regex replacement.
When the replacement file contains a simple string like e.g. "xyz", it works fine. But when it contains "\n", I would like to treat it as a new line, but it doesn't work. The spaces in text are replaced literally by "\n". Here is the script:
$c = file_get_contents('replacements.txt');
$s = preg_replace('/ /', $c, 'some text');
file_put_contents('output.txt', $s);
The output.txt contains "some\ntext" when viewed in text editor.
So I added a simple if statement:
if ($c == '\n') {
$c = "\n";
}
And now it works. But is there a more general way to deal with this problem, i.e. get the replacement string from file interpreted as a real regex replacement? Because in the future it might be a more complicated replacement.
You may have indeed similar issues with other escape sequences, like \t, \r, \x10, ... etc.
I would suggest this solution, to turn the string into a version that has these characters interpreted.
$c = json_decode('"'.str_replace('"', '\"', $c).'"');
... then the replace will work as intended.
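JSON string syntax has no \x escape, and a raw line break in the file (such as a trailing newline) would make the JSON invalid, so another option worth considering is PHP's built-in stripcslashes(), which interprets C-style escapes such as \n, \t, \r and hexadecimal \xHH. A minimal sketch, with the file contents replaced by a literal for illustration. Since stripcslashes() also interprets octal escapes, a replacement that uses \1-style backreferences would be altered; $1-style references are unaffected.

```php
<?php
// Stand-in for file_get_contents('replacements.txt'):
// a backslash followed by n, i.e. two characters.
$c = '\n';

// Interpret C-style escape sequences (\n, \t, \r, \xHH, ...).
$c = stripcslashes($c);

$s = preg_replace('/ /', $c, 'some text');
var_dump($s === "some\ntext"); // bool(true)
```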
I have the following text:
$test = 'Test This is first line
Test:123
This is Test';
I want to explode this string to an array of paragraphs. I wrote the following code but it is not working:
$array = explode('\n\n', $test);
Any idea what I'm missing here?
You might be on Windows which uses \r\n instead of \n. You could use a regex to make it universal with preg_split():
$array = preg_split('#(\r\n?|\n)+#', $test);
Pattern explanation:
( : start matching group 1
\r\n?|\n : match \r\n, \r or \n
) : end matching group 1
+ : repeat one or more times
If you want to split by 2 newlines, then replace + by {2,}.
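As a sketch of the {2,} variant, assuming input that mixes Windows and Unix line endings:

```php
<?php
$test = "Test This is first line\r\n\r\nTest:123\n\nThis is Test";

// {2,} requires at least two consecutive line breaks, i.e. a blank line,
// so single line breaks inside a paragraph would be left alone.
$array = preg_split('#(\r\n?|\n){2,}#', $test);

print_r($array); // three paragraphs
```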
Update: you might use:
$array = preg_split('#\R+#', $test);
This extensive answer covers the meaning of \R. Note that this is only supported in PCRE/perl. So in a sense, it's less cross-flavour compatible.
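A short sketch of \R in action, using a made-up string that mixes all three line-ending styles:

```php
<?php
$test = "one\r\ntwo\rthree\n\nfour";

// \R matches \r\n as a unit as well as a lone \r or \n;
// the + collapses consecutive line breaks into one delimiter.
$array = preg_split('#\R+#', $test);

print_r($array); // one, two, three, four
```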
Your code
$array = explode('\n\n', $test);
should have \n\n enclosed in double quotes:
$array = explode("\n\n", $test);
Using single quotes, it looks through the variable $test for a literal \n\n (backslash, n, backslash, n). With double quotes, it looks for the evaluated value of \n\n, which is two line feed characters.
Also, note that the end of line depends on the host operating system. Windows uses \r\n instead of \n. You can get the end of line for the operating system by using the predefined constant PHP_EOL.
Try double quotes
$array = explode("\n\n", $test);
Did you try this?
$array = explode("\n", $test);
The easiest way to get this text into an array like you describe would be:
preg_match_all('/.+/',$string, $array);
Since /./ matches any char, except for line terminators, and the + is greedy, it'll match as many chars as possible, until a new-line is encountered.
Using preg_match_all ensures this is repeated for each line, too. When I tried this, the output looked like this:
array (
0 =>
array (
0 => '$test = \'Test This is first line',
1 => 'Test:123',
2 => 'This is Test\';',
),
)
Also note that line breaks differ depending on the environment (\n for *NIX systems, compared to \r\n for Windows, or in some cases a simple \r). Perhaps you might want to try explode(PHP_EOL, $text); too.
You need to use double quotes in your code, so that the \n\n is actually evaluated as two newline characters. Look below:
'Paragraph 1\n\nParagraph 2' =
Paragraph 1\n\nParagraph 2
Whereas:
"Paragraph 1\n\nParagraph 2" =
Paragraph 1
Paragraph 2
Also, Windows systems use \r\n\r\n instead of \n\n. The line ending of the platform PHP is running on (not necessarily that of the input text) is available as:
PHP_EOL
So, to split on blank lines, your final code would be:
$paragraphs = explode(PHP_EOL . PHP_EOL, $text);
So evidently:
\n = LF (Line Feed) // Used as a new line character in Unix
\r = CR (Carriage Return) // Used as a new line character in classic Mac OS
\r\n = CR + LF // Used as a new line character in Windows
(char)13 = \r = CR // Same as \r
but then I also heard that for HTML textarea, when it's submitted and parsed by a php script, all new lines are converted to \r\n regardless of the platform
is this true and can I rely on this or am I completely mistaken?
ie. if I wanna do explode() based on a new line, can I use '\r\n' as the delimiter regardless of whether or not the user is using mac, pc, etc
Per the HTML spec, all newlines in submitted form data should be converted to \r\n.
So you could indeed do a simple explode("\r\n", $theContent) no matter the platform used.
P.S.
\r is only used on old(er) Macs. Nowadays Macs also use the *nix style line breaks (\n).
You could try preg_split, which will use a regular expression to split up the string. Within this regular expression you can match on all 3 new line variants.
$ArrayOfResults = preg_split( '/\r\n|\r|\n/', $YourStringToExplode );
It depends on what you want to achieve. If you are doing this eventually to display / format it as HTML, you can as well use the nl2br() function or possibly use str_replace like this:
$val = str_replace( array("\r\n","\r","\n"), '<br />', $val ); // "\r\n" must come first, or each CRLF becomes two <br />
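In case you just want an array of all lines, I would suggest handling all 3 sequences ("\r\n","\r","\n"). Since explode() only accepts a single delimiter, that means either normalizing the string first or using preg_split().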
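The order of the search strings in str_replace() matters, because the replacements are applied one after another. A small sketch comparing nl2br() with both orderings (the sample value is an assumption):

```php
<?php
$val = "first\r\nsecond\nthird";

// nl2br() inserts <br /> before each line break and keeps the break itself.
$withBreaks = nl2br($val);

// CRLF listed first: each line break becomes exactly one <br />.
$good = str_replace(array("\r\n", "\r", "\n"), '<br />', $val);

// CRLF listed last: the \n and \r of a CRLF pair are replaced separately.
$doubled = str_replace(array("\n", "\r", "\r\n"), '<br />', $val);

echo $good, PHP_EOL;    // first<br />second<br />third
echo $doubled, PHP_EOL; // first<br /><br />second<br />third
```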
I have a text file that has the literal string \r\n in it. I want to replace this with an actual line break (\n).
I know that the regex /\\r\\n/ should match it (I have tested it in Reggy), but I cannot get it to work in PHP.
I have tried the following variations:
preg_replace("/\\\\r\\\\n/", "\n", $line);
preg_replace("/\\\\[r]\\\\[n]/", "\n", $line);
preg_replace("/[\\\\][r][\\\\][n]/", "\n", $line);
preg_replace("/[\\\\]r[\\\\]n/", "\n", $line);
If I just try to replace the backslash, it works properly. As soon as I add an r, it finds no matches.
The file I am reading is encoded as UTF-16.
Edit:
I have also already tried using str_replace().
I now believe that the problem here is the character encoding of the file. I tried the following, and it did work:
$testString = "\\r\\n";
echo preg_replace("/\\\\r\\\\n/", "\n", $testString);
but it does not work on lines I am reading in from my file.
Save yourself the effort of figuring out the regex and try str_replace() instead:
str_replace('\r\n', "\n", $string);
Save yourself the effort of figuring out the regex and the escaping within double quotes:
$fixed = str_replace('\r\n', "\n", $line);
For what it is worth, preg_replace("/\\\\r\\\\n/", "\n", $line); should be fine. As a demonstration:
var_dump(preg_replace("/\\\\r\\\\n/", "NL", 'Cake is yummy\r\n\r\n'));
Gives: string(17) "Cake is yummyNLNL"
Also fine is: '/\\\r\\\n/' and '/\\\\r\\\\n/'
Important - if the above doesn't work, are you even sure literal \r\n is what you're trying to match?..
UTF-16 is the problem. If you're just working with the raw bytes, then you can use the full byte sequences for replacing:
$out = str_replace("\x00\x5c\x00\x72\x00\x5c\x00\x6e", "\x00\x0a", $in);
This assumes big-endian UTF-16, else swap the zero bytes to come after the non zeros:
$out = str_replace("\x5c\x00\x72\x00\x5c\x00\x6e\x00", "\x0a\x00", $in);
If that doesn't work, please post a byte-dump of your input file so we can see what it actually contains.
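Another possible approach is to convert the data to UTF-8 first and then run the ordinary replacement. This is a sketch under two assumptions: the iconv extension is available, and the data is little-endian UTF-16 without a byte order mark. The input is simulated here rather than read from a file.

```php
<?php
// Simulated file content: the literal text a\r\nb encoded as UTF-16LE
// (assumed byte order, no BOM).
$in = iconv('UTF-8', 'UTF-16LE', 'a\r\nb');

// Convert to UTF-8 so the single-byte search string can match.
$utf8 = iconv('UTF-16LE', 'UTF-8', $in);

// Replace the literal backslash-r-backslash-n text with a real line feed.
$out = str_replace('\r\n', "\n", $utf8);

var_dump($out === "a\nb"); // bool(true)
```

If the result has to be written back as UTF-16, it can be converted again with the encodings swapped.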
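$result = preg_replace('/\\\\r\\\\n/', "\n", $subject);
The regex above matches the literal four-character text \r\n (backslash, r, backslash, n) and replaces it with an actual line feed. The replacement must be in double quotes; in single quotes, '\n' would insert a backslash followed by n.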
I always keep searching for this topic, and I always come back to a personal line I wrote.
It looks neat and it's based on RegEx:
"/[\n\r]/"
PHP
preg_replace("/[\n\r]/",'\n', $string )
or
preg_replace("/[\n\r]/",$replaceStr, $string )
The string input comes from textarea where users are supposed to enter every single item on a new line.
When processing the form, it is easy to explode the textarea input into an array of single items like this:
$arr = explode("\n", $textareaInput);
It works fine, but I am worried about it not working correctly on different systems (I can currently only test on Windows). I know newlines are represented as \r\n, \n, or just \r across different platforms. Will the above line of code also work correctly under Linux, Solaris, BSD or other OS?
You can use preg_split to do that.
$arr = preg_split('/[\r\n]+/', $textareaInput);
It splits it on any combination of the \r or \n characters. You can also use \s to include any white-space char.
Edit
It occurred to me, that while the previous code works fine, it also removes empty lines. If you want to preserve the empty lines, you may want to try this instead:
$arr = preg_split('/(\r\n|[\r\n])/', $textareaInput);
It basically starts by looking for the Windows version \r\n, and if that fails it looks for either the old Mac version \r or the Unix version \n.
For example:
<?php
$text = "Windows\r\n\r\nMac\r\rUnix\n\nDone!";
$arr = preg_split('/(\r\n|[\r\n])/', $text);
print_r($arr);
?>
Prints:
Array
(
[0] => Windows
[1] =>
[2] => Mac
[3] =>
[4] => Unix
[5] =>
[6] => Done!
)
'\r' by itself as a line terminator is an old convention that's not really used anymore (not since OSX which is Unix based).
Your explode will be fine. Just trim off the '\r' in each resulting element for the Windows users.
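A minimal sketch of that explode-and-trim idea, with an assumed sample input:

```php
<?php
$textareaInput = "apple\r\nbanana\r\ncherry";

// Split on LF, then strip the CR that Windows-style input leaves
// at the end of each element.
$arr = array_map(
    function ($line) {
        return rtrim($line, "\r");
    },
    explode("\n", $textareaInput)
);

print_r($arr); // apple, banana, cherry
```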
$arr = preg_split( "/[\n\r]+/", $textareaInput );
You can normalize the input:
<?php
$foo = strtr($foo, array(
"\r\n" => "\n",
"\r" => "\n",
"\n" => "\n",
));
?>
Alternatively, you can explode with regular expressions:
<?php
$foo = preg_split ("/[\r\n]+/", $foo);
?>
The following code should do the job
<?php
$split = preg_split('/[\r\n]+/', $src);
foreach ($split as $k=>$string) {
$split[$k] = trim($string);
if ($split[$k] === '') // empty() would also drop a line containing just "0"
unset($split[$k]);
}
ksort($split);
$join = implode('', $split);
?>
to get a string with the newlines completely stripped. It won't work correctly with JS though :(
The system-agnostic technique with regex involves the \R escape sequence.
PHP Documentation on Escape Sequences
It really is as simple as calling preg_split('~\R~', $textareaInput).
\R - line break: matches \n, \r and \r\n
Normalizing the input is a waste of time and effort if you are just going to explode on the replacement characters anyhow.
If you are worried about multiple consecutive newline characters in the string, you can just add the + quantifier after \R.
If you want to trim whitespace characters from both sides of the strings in the resultant array, you can use ~\s*\R\s*~
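As a sketch of the trimming pattern, assuming a made-up input with stray spaces and a blank line:

```php
<?php
$textareaInput = "apple  \r\n  banana\n\ncherry";

// \s* on both sides swallows spaces, tabs and any extra line breaks
// around the line break that \R matches.
$arr = preg_split('~\s*\R\s*~', $textareaInput);

print_r($arr); // apple, banana, cherry
```

Because \s also matches line break characters, this pattern collapses blank lines as well. Whitespace at the very start or end of the whole string is not removed; calling trim() on the input first would handle that.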