Explode text into array as per paragraph - php

I have the following text:
$test = 'Test This is first line
Test:123
This is Test';
I want to explode this string to an array of paragraphs. I wrote the following code but it is not working:
$array = explode('\n\n', $test);
Any idea what I'm missing here?

You might be on Windows which uses \r\n instead of \n. You could use a regex to make it universal with preg_split():
$array = preg_split('#(\r\n?|\n)+#', $test);
Pattern explanation:
( : start matching group 1
\r\n?|\n : match \r\n, \r or \n
) : end matching group 1
+ : repeat one or more times
If you want to split by 2 newlines, then replace + by {2,}.
Update: you might use:
$array = preg_split('#\R+#', $test);
This extensive answer covers the meaning of \R. Note that this is only supported in PCRE/perl. So in a sense, it's less cross-flavour compatible.
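As a minimal sketch of the paragraph split described above, assuming paragraphs are separated by one or more blank lines and that the sample string below (with deliberately mixed line endings) stands in for the original $test:

```php
<?php
// Hypothetical sample: the first gap uses Windows line endings, the second Unix ones.
$test = "Test This is first line\r\n\r\nTest:123\n\nThis is Test";

// \R matches \r\n, \r or \n as a single line break; {2,} requires at least two in a row,
// so a lone line break inside a paragraph does not split it.
$paragraphs = preg_split('/\R{2,}/', $test);

print_r($paragraphs);
```

Because \R treats \r\n as one unit, a single Windows line ending is not mistaken for two line breaks.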

Your code
$array = explode('\n\n', $test);
should have \n\n enclosed in double quotes:
$array = explode("\n\n", $test);
With single quotes, explode() searches $test for the literal four characters \n\n (backslash, n, backslash, n). With double quotes, PHP evaluates the escape sequences first, so it searches for two newline (line feed) characters.
Also, note that the end-of-line sequence depends on the operating system. Windows uses \r\n instead of \n. The predefined constant PHP_EOL holds the end-of-line sequence of the system PHP is running on, which is not necessarily the one used in the text you are splitting.
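The difference between the two quoting styles can be checked directly. This is a small sketch; the sample string is an assumption that uses plain \n line endings:

```php
<?php
// Single quotes keep the backslash and the letter n as separate characters.
var_dump(strlen('\n\n')); // int(4)
// Double quotes turn each \n into one line feed character (decimal 10).
var_dump(strlen("\n\n")); // int(2)
var_dump(ord("\n"));      // int(10)

$test = "Test This is first line\n\nTest:123\n\nThis is Test";

$single = explode('\n\n', $test); // delimiter never found: one element
$double = explode("\n\n", $test); // three paragraphs

var_dump(count($single), count($double)); // int(1) int(3)
```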

Try double quotes
$array = explode("\n\n", $test);

Have you tried this?
$array = explode("\n", $test);

The easiest way to get this text into an array like you describe would be:
preg_match_all('/.+/',$string, $array);
Since /./ matches any char, except for line terminators, and the + is greedy, it'll match as many chars as possible, until a new-line is encountered.
Using preg_match_all ensures this is repeated for each line, too. When I tried this, the output looked like this:
array (
0 =>
array (
0 => '$test = \'Test This is first line',
1 => 'Test:123',
2 => 'This is Test\';',
),
)
Also note that line endings differ depending on the environment (\n on *NIX systems, \r\n on Windows, or in some cases a lone \r). You might want to try explode(PHP_EOL, $text); too.

You need to use double quotes in your code so that \n\n is actually evaluated as two newline characters. Compare:
'Paragraph 1\n\nParagraph 2' =
Paragraph 1\n\nParagraph 2
Whereas:
"Paragraph 1\n\nParagraph 2" =
Paragraph 1
Paragraph 2
Also, Windows systems use \r\n\r\n instead of \n\n. The line ending of the system PHP is running on is available in the constant:
PHP_EOL
So, to split on paragraphs (two consecutive line endings), your final code would be:
$paragraphs = explode(PHP_EOL . PHP_EOL, $text);

Related

How to remove every second occurrence within a string?

Basically, I have a string that I need to search through and remove every SECOND occurrence within it.
Here is what my string looks like ($s):
question1,answer1,answer2,answer3,answer4
question2,answer1,answer2,answer3,answer4
question3,answer1,answer2,answer3,answer4
Here is what my code currently looks like:
$toRemove = array("\n");
$finalString = str_replace($toRemove, "", $s);
As you can see, the lines in my string $s are separated by two \n characters each. I would like to search through the string and replace only every SECOND \n so that it ends up being:
question1,answer1,answer2,answer3,answer4
question2,answer1,answer2,answer3,answer4
question3,answer1,answer2,answer3,answer4
Is this possible? If so, how can I do it?
In your specific case, you may want to just replace two newlines with one newline:
$string = str_replace("\n\n", "\n", $string);
More complicated regex solutions could collapse any number of consecutive newlines:
preg_replace("/\n+/", "\n", "foo\n\nbar\n\n\n\n\nblee\nnope");
Adam's answer is correct for UNIX-like systems, but on Windows you can have different line endings. My regex is a little bit rusty, but I think this should work for both UNIX and Windows. Note that the replacement must be in double quotes, otherwise a literal backslash and n are inserted.
$string = preg_replace('/(?:\r\n|\r|\n){2}/', "\n", $string); // Replace exactly 2 line endings
$string = preg_replace('/[\n\r]+/', "\n", $string); // Replace 1 or more line endings
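A short sketch of the collapsing approach, using \R so that Unix and Windows endings are handled alike. The sample data is hypothetical and shortened from the question, and note that the result always uses \n regardless of the original line endings:

```php
<?php
// Hypothetical sample: records separated by a blank line, with mixed line endings.
$s = "question1,answer1\n\nquestion2,answer1\r\n\r\nquestion3,answer1";

// \R matches \r\n, \r or \n as one line break; {2,} matches runs of two or more,
// and each run is replaced by a single \n.
$collapsed = preg_replace('/\R{2,}/', "\n", $s);

echo $collapsed;
```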

regular expressions for one sentence per line

I am trying to take a textarea value and run it through a regular expression to split it into lines.
So if someone writes a line, presses Enter, then writes another line and presses Enter, I will have an array with one line per array value.
The expression I've come up with so far is:
(.+?)\n|\G(.*)
And this is how I use it (from a website I use to test expressions, http://myregextester.com/):
$sourcestring="
this is a sentense yeaa
interesting sentense
yet another sentese
";
preg_match_all('/(.+?)\n|\G(.*)/',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
However, there is one element in the array that is always empty, and I am trying to find a way to get rid of it.
Thanks in advance.
You don't need a regex for this, just use explode(), like so:
$lines = explode( "\n", trim( $input));
Now each line of the user's $input will be a single array entry in $lines.
This will do and get rid of the empty lines in the beginning and end of the array
explode("\n", trim($sourcestring));
See example: http://viper-7.com/pNqtvV
There are various types of newlines. In HTML form context you'll typically receive CR LF for line endings. A dumb explode will do, but a regex will catch all variations if you use \R. Thus \r\n and \n or \r and others will be processed by:
$lines = preg_split(':\R:', $text);
preg_split() is the regex equivalent of PHP's explode(), so you don't need to use preg_match_all.
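Combining the suggestions above, a minimal sketch might trim the input first and then split on \R. The sample string mirrors the one in the question, with one line ending changed to \r\n as an assumption about what a form might submit:

```php
<?php
// Leading and trailing newlines, as in the question's $sourcestring.
$sourcestring = "\nthis is a sentense yeaa\r\ninteresting sentense\nyet another sentese\n";

// trim() removes the surrounding newlines so no empty first or last element is produced;
// \R then splits on \r\n, \r or \n alike.
$lines = preg_split('/\R/', trim($sourcestring));

print_r($lines);
```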

Turning multi-line string into multi-element array using regular expressions in PHP

I need to split the following string and put each new line into a new array element.
this is line a.(EOL chars = '\r\n' or '\n')
(EOL chars)
this is line b.(EOL chars)
this is line c.(EOL chars)
this is the last line d.(OPTIONAL EOL chars)
(Note that the last line might not have any EOL characters present. The string also sometimes contains only 1 line, which is by definition the last one.)
The following rules must be followed:
Empty lines (like the second line) should be discarded and not put
into the array.
EOL chars should not be included, because otherwise
my string comparisons fail.
So this should result in the following array:
[0] => "this is line a."
[1] => "this is line b."
[2] => "this is line c."
[3] => "this is the last line d."
I tried doing the following:
$matches = array();
preg_match_all('/^(.*)$/m', $str, $matches);
return $matches[1];
$matches[1] indeed contains each new line, but:
Empty lines are included as well
It seems that a '\r' character gets smuggled in anyway at the end of the strings in the array. I suspect this has something to do with the regex range '.' which includes everything except '\n'.
Anyway, I've been playing around with '\R' and whatnot, but I just can't find a good regex pattern that follows the two rules I outlined above. Any help please?
Just use preg_split() to split on the regular expression:
// Split on \n, \r is optional..
// The last element won't need an EOL.
$array = preg_split("/\r?\n/", $string);
Note, you might also want to trim($string) if there is a trailing newline, so you don't end up with an extra empty array element.
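There is a function just for this, file(), although it reads the lines from a file rather than from a string; its FILE_IGNORE_NEW_LINES and FILE_SKIP_EMPTY_LINES flags cover the two rules.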
I think preg_split would be the way to go... You can use an appropriate regexp to use any EOL character as separator.
Something like the following (the regexp needs to be a bit more elaborate):
$array = preg_split('/[\n\r]+/', $string);
Hope that helps,
Use preg_split function:
$array = preg_split('/[\r\n]+/', $string);
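To satisfy both rules from the question (no empty lines, no EOL characters in the results), one possible sketch uses \R+ together with the PREG_SPLIT_NO_EMPTY flag. The input below is a hypothetical string built from the example lines:

```php
<?php
// Mixed \r\n and \n endings, one empty line, and no EOL after the last line.
$str = "this is line a.\r\n\r\nthis is line b.\nthis is line c.\r\nthis is the last line d.";

// \R+ consumes any run of line breaks; PREG_SPLIT_NO_EMPTY drops the empty piece
// that a leading or trailing line break would otherwise produce.
$lines = preg_split('/\R+/', $str, -1, PREG_SPLIT_NO_EMPTY);

print_r($lines);
```

Lines that contain only spaces or tabs are not treated as empty here; they would need an extra trim() and filter step.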

split function not working on new lines

Hi, I have a simple split script that splits a string containing many newline characters on each newline. Here is the code:
<?php
$string = $argv[1]; // CASE 1: COMMAND LINE ARGUMENT.
echo "String is : $string\n";
//$string = 'Hello Shakir\nOkay Shakir\nHow are you ?'; //CASE2: SINGLE QUOTE.
$string = "Hello Shakir\nOkay Shakir\nHow are you ?"; //CASE 3: DOUBLE QUOTE.
$lines = array();
$lines = split("\n", $string);
foreach ( $lines as $line ) {
echo "line is : $line\n";
//var_dump($line);
}
?>
It works fine when I use CASE 3 in the code, but it doesn't work when I use either CASE 1 or CASE 2. Can anybody please shed some light on this?
This is how I run it on command line(linux machine) :
php my_script.php "Hello Shakir\nOkay Shakir\nHow are you ?"
In this case when I print $argv[1], it prints the entire string but it treats it same as CASE2 (with single quotes).
UPDATE :
Many of you have said what the cause of the issue is and not the answer to it. However, knowing the cause helped me fix it. So the answer is :
ANSWER :
Instead of using \n in double quotes ("\n"), use single quotes ('\n') :
$lines = split('\n', $string);
OR
$lines = explode('\n', $string);
However, split also counts '\' as a character and I don't know why, whereas explode gives the correct result. Since split is deprecated, I haven't researched this further.
Thank you to all who let me know that split is deprecated.
CASE1 and CASE2 will not work for a simple reason, because \n is evaluated as a literal \n and not a newline character.
Only CASE3, with the double quotes, will evaluate \n as a newline.
Also, the function split() is deprecated. Try using explode() instead.
That's because '\n' in single quotes does not produce a newline and is not interpreted as one. The same applies on the command line: the argument you pass contains no real newlines.
Also, split() is deprecated, use explode();
The big difference between single quotes '' and double quotes "" is automatic replacement inside strings.
Using "" enables variable interpolation and escape sequences, while '' supports neither.
$name = 'Mathieu';
$case1 = "Hi this is $name speaking\nPleased to meet you!";
echo $case1;
//Will result in
Hi this is Mathieu speaking
Pleased to meet you!
While using single quotes will yield:
$name = 'Mathieu';
$case1 = 'Hi this is $name speaking\nPleased to meet you!';
echo $case1;
//Will result in
Hi this is $name speaking\nPleased to meet you!
Commonly used escape sequences include:
\n Line feed (dec 10)
\r Carriage return (dec 13)
\t Tab (dec 9)
Regarding your question about line feeds, note that \n, \r and \r\n are used in different combinations depending on the OS. Text coming from Windows usually contains \r\n, while Linux uses only \n. Classic Mac OS (before OS X) used only \r.
Single-quoted strings are not supposed to expand escape sequences like \n.
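For the command-line case, one possible approach is to convert the literal two-character sequence into a real newline before splitting. This is a sketch in which $arg stands in for $argv[1] so the snippet runs on its own:

```php
<?php
// Stand-in for $argv[1]: the shell passes a backslash followed by "n", not a newline.
$arg = 'Hello Shakir\nOkay Shakir\nHow are you ?';

// Replace the literal backslash-n (single quotes) with a real line feed (double quotes).
$string = str_replace('\n', "\n", $arg);

// explode() replaces the deprecated split().
$lines = explode("\n", $string);

foreach ($lines as $line) {
    echo "line is : $line\n";
}
```

After the conversion, the same explode("\n", ...) call works for all three cases in the question.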

Reliably split user-submitted textarea value on newlines

The string input comes from textarea where users are supposed to enter every single item on a new line.
When processing the form, it is easy to explode the textarea input into an array of single items like this:
$arr = explode("\n", $textareaInput);
It works fine but I am worried about it not working correctly in different systems (I can currently only test in Windows). I know newlines are represented as \r\n or as just \r across different platforms. Will the above line of code also work correctly under Linux, Solaris, BSD or other OS?
You can use preg_split to do that.
$arr = preg_split('/[\r\n]+/', $textareaInput);
It splits it on any combination of the \r or \n characters. You can also use \s to include any white-space char.
Edit
It occurred to me, that while the previous code works fine, it also removes empty lines. If you want to preserve the empty lines, you may want to try this instead:
$arr = preg_split('/(\r\n|[\r\n])/', $textareaInput);
It basically starts by looking for the Windows version \r\n, and if that fails it looks for either the old Mac version \r or the Unix version \n.
For example:
<?php
$text = "Windows\r\n\r\nMac\r\rUnix\n\nDone!";
$arr = preg_split('/(\r\n|[\r\n])/', $text);
print_r($arr);
?>
Prints:
Array
(
[0] => Windows
[1] =>
[2] => Mac
[3] =>
[4] => Unix
[5] =>
[6] => Done!
)
'\r' by itself as a line terminator is an old convention that's not really used anymore (not since OSX which is Unix based).
Your explode will be fine. Just trim off the '\r' in each resulting element for the Windows users.
$arr = preg_split( "/[\n\r]+/", $textareaInput );
You can normalize the input:
<?php
$foo = strtr($foo, array(
"\r\n" => "\n",
"\r" => "\n",
"\n" => "\n",
));
?>
Alternatively, you can explode with regular expressions:
<?php
$foo = preg_split ("/[\r\n]+/", $foo);
?>
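A sketch of the normalize-then-explode route, reusing the mixed test string from the earlier answer. It uses str_replace() rather than strtr(), which is a swapped-in choice for brevity; the order of the search array matters, since \r\n must be replaced before a lone \r:

```php
<?php
$foo = "Windows\r\n\r\nMac\r\rUnix\n\nDone!";

// First turn every \r\n into \n, then any remaining lone \r into \n.
$normalized = str_replace(["\r\n", "\r"], "\n", $foo);

// With a single kind of line ending left, a plain explode() is enough
// and empty lines are preserved as empty strings.
$lines = explode("\n", $normalized);

print_r($lines);
```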
The following code should do the job
<?php
$split = preg_split('/[\r\n]+/', $src);
foreach ($split as $k=>$string) {
$split[$k] = trim($string);
if (empty($split[$k]))
unset($split[$k]);
}
ksort($split);
$join = implode('', $split);
?>
to get a string with newlines completely stripped. It won't work correctly with JS though :(
The system-agnostic technique with regex involves the \R escape sequence.
PHP Documentation on Escape Sequences
It really is as simple as calling preg_split('~\R~', $textareaInput).
\R - line break: matches \n, \r and \r\n
Normalizing the input is a waste of time and effort if you are just going to explode on the replacement characters anyhow.
If you are worried about multiple consecutive newline characters in the string, you can just add the + quantifier after \R.
If you want to trim whitespace characters from both sides of the strings in the resultant array, you can use ~\s*\R\s*~
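A brief sketch of the trimming variant. The sample textarea value is hypothetical, and the outer trim() is an added assumption, needed because the pattern only strips whitespace next to a line break:

```php
<?php
// Hypothetical textarea value with stray spaces, mixed endings and an empty line.
$textareaInput = "  apple \r\n banana\n\n cherry  ";

// \s*\R\s* swallows the line break plus any whitespace around it, which also absorbs
// consecutive line breaks; trim() handles the very start and end of the input.
$items = preg_split('~\s*\R\s*~', trim($textareaInput));

print_r($items);
```

Since \s also matches line breaks, empty lines disappear with this pattern; use plain \R if they need to be preserved.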
