Is a new line = \n OR \r\n? - php

I've seen many developers use different methods to split a string by new lines, but I'm confused about which is correct: \r\n or only \n?

\n is used for Unix systems (including Linux, and OSX).
\r\n is mainly used on Windows.
\r is used on really old Macs.
The PHP_EOL constant is used instead of these characters for portability between platforms.

The given answer is far from complete. In fact, it is so far from complete that it tends to lead the reader to believe that this answer is OS dependent when it isn't. It also isn't something which is programming language dependent (as some commenters have suggested). I'm going to add more information in order to make this clearer. First, let's give the list of current new line variations (as in, what they've been since 1999):
\r\n is only used on Windows Notepad (fixed as of 2018), the DOS command line (PowerShell handles \n only just fine; as well as most modern versions of DOS-era command line applications [such as more]), most of the Windows API written before ~2000, and some (older) Windows apps (mostly because they use the Windows API).
YMMV for .ps1, .bat, and .cmd scripts.
Oddly enough, many text-based Internet protocols that run over TCP/IP (including HTTP itself) use \r\n, as noted by Raatje. There are definitely times that this matters for you as a PHP programmer.
\n is used for all other systems, applications and the content of webpages/email.
You'll notice that I've put most Windows apps in the \n group which may be slightly controversial but before you disagree with this statement, please grab a UNIX formatted text file and try it in 10 web friendly Windows applications of your choice (which aren't listed in my exceptions above). What percentage of them handled it just fine? You'll find that they (practically) all implement auto detection of line endings or just use \n because, while Windows may use \r\n, the Internet and most other OSes just use \n. Therefore, it is best practice for applications to use \n alone if you want your output to be Internet friendly.
PHP also defines a newline constant called PHP_EOL. This constant is set to the OS-specific newline string for the machine PHP is running on (\r\n for Windows and \n for everything else). This constant is not very useful for webpages and should be avoided for HTML output or for writing most text to files. It becomes VERY useful when we move to command line output from PHP applications because it will allow your application to output to a terminal window in a consistent manner across all supported OSes.
If you want your PHP applications to work from any server they are placed on, the two biggest things to remember are that you should always just use \n unless it is terminal output (in which case you use PHP_EOL) and you should also ALWAYS use / for your path separator (not \). The third thing to look out for is drive letters in path strings, if allowed, which may be tricky depending on what you're doing.
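A minimal sketch of that rule, assuming a hypothetical helper named newlineFor() (it is not part of PHP): it returns PHP_EOL for terminal output and a bare \n for everything else. The file name report.txt is only an illustration.

```php
<?php
// Hypothetical helper: choose the line ending by output target.
function newlineFor(bool $isTerminal): string
{
    // PHP_EOL is the OS-specific newline of the machine PHP runs on.
    return $isTerminal ? PHP_EOL : "\n";
}

// PHP_SAPI is 'cli' when the script is run from the command line.
$isTerminal = (PHP_SAPI === 'cli');

// Text destined for a file or a web page always gets a bare \n.
$report = implode(newlineFor(false), ['first line', 'second line']);

// Forward slashes work as path separators on Windows as well.
$target = __DIR__ . '/report.txt';

echo 'Would write ' . strlen($report) . ' bytes to ' . $target . newlineFor($isTerminal);
```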
The even longer explanation:
An application may choose to use whatever line endings it likes regardless of the default OS line ending style. If I want my text editor to print a newline every time it encounters a period that is no harder than using the \n to represent a newline because I'm interpreting the text as I display it anyway. IOW, I'm fiddling around with measuring the width of each character so it knows where to display the next so it is very simple to add a statement saying that if the current char is a period then perform a newline action (or if it is a \n then display a period).
Aside from the null terminator, no character code is sacred and when you write a text editor or viewer you are in charge of translating the bits in your file into glyphs (or carriage returns) on the screen. The only thing that distinguishes a control character such as the newline from other characters is that most font sets don't include them (meaning they don't have a visual representation available).
That being said, if you are working at a higher level of abstraction then you probably aren't making your own textbox controls. If this is the case then you're stuck with whatever line ending that control makes available to you. Even in this case it is a simple (and fairly quick) matter to automatically detect the line ending style of any string and adjust accordingly.
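As a rough illustration of such auto-detection, the sketch below counts each style and picks the most frequent one. The function names detectLineEnding() and normalizeLineEndings() are made up for this example, and the tie-breaking rules are an assumption rather than any standard.

```php
<?php
// Hypothetical helper: guess the dominant line ending of a string.
function detectLineEnding(string $text): string
{
    $crlf = substr_count($text, "\r\n");
    // Bare \n and bare \r are those that are not part of a \r\n pair.
    $lf = substr_count($text, "\n") - $crlf;
    $cr = substr_count($text, "\r") - $crlf;

    if ($crlf > 0 && $crlf >= $lf && $crlf >= $cr) {
        return "\r\n";
    }
    if ($cr > $lf) {
        return "\r";
    }
    // Default when \n dominates or there are no line breaks at all.
    return "\n";
}

// Hypothetical helper: convert every line ending to the requested one.
function normalizeLineEndings(string $text, string $to = "\n"): string
{
    // \r\n is listed first so a Windows pair is not converted twice.
    return preg_replace('/\r\n|\r|\n/', $to, $text);
}
```

Normalizing first and then splitting on \n avoids having to trim stray \r characters from each line afterwards.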

If you are programming in PHP, it is useful to split lines by \n and then trim() each line (provided you don't care about whitespace) to give you a "clean" line regardless.
foreach (explode("\n", $data) as $line)
{
    // trim() also removes a trailing \r left over from \r\n line endings
    $line = trim($line);
    // ...
}

For PHP, \n should work for you!
http://php.net/manual/en/language.types.string.php

Related

Is there such a thing as an "intelligent" wordwrap() PHP function/library for beautifully typesetting plaintext?

I'm trying to properly "wordwrap" a given string into English plaintext. Take this example string:
This here is an example of what I'm talking about. Notice how I just talk nonsense on and on for no reason other than to push the 80-character line limit. And this is some more text, etc.
Please note: in the following examples, I have added underscores to visualize what I'm talking about. Naturally, the underscores are not added in reality. They are only here to make it clear what is happening.
If I simply blindly add a linebreak after each 80 chars, I get:
This here is an example of what I'm talking about. Notice how I just talk nonsen
se on and on for no reason other than to push the 80-character line limit. And t
his is some more text, etc._____________________________________________________
If I use the built-in wordwrap() function with 80 chars, I get:
This here is an example of what I'm talking about. Notice how I just talk_______
nonsense on and on for no reason other than to push the 80-character line limit.
And this is some more text, etc.________________________________________________
Neither of those looks good or resembles a proper book or magazine, where either software or humans (depending on the publication's age) have beautifully typeset the text, like this:
This here is an example of what I'm talking about. Notice how I just talk nonse-
nse on and on for no reason other than to push the 80-character line limit. And_
this is some more text, etc.____________________________________________________
Notice how "nonsense" has been neither hard-cut nor fully dropped onto the next line. Instead, it has fully utilized the line minus one character for the dash, continuing on the next line. (As for the "And" at the end of the second line, it does have a whitespace after it, but only because there is only one more character left on that line.)
The rules for doing this kind of "intelligent wordwrapping" are language-specific, locale-specific (I think) and very complex. As such, it would be madness for me to attempt to code in all the rules manually.
I strongly suspect that there is some kind of mature, popular PHP library for doing precisely this, and I further suspect that it supports all kinds of languages/locales. However, I have been unable to find it myself.
It is not a requirement that it has to support "all kinds of languages/locales", but it would be nice. English with either US or UK locale would be sufficient for me to be happy at the moment.
I hope that I've been crystal-clear about what I'm asking!
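For reference, the two behaviours described in the question can be reproduced with built-in functions only. This is just a sketch of the starting point, not a solution: neither call hyphenates anything.

```php
<?php
$text = "This here is an example of what I'm talking about. Notice how I just talk "
      . "nonsense on and on for no reason other than to push the 80-character line "
      . "limit. And this is some more text, etc.";

// Blind cut every 80 characters: words are split wherever the limit falls.
$hardCut = implode("\n", str_split($text, 80));

// Built-in wordwrap(): breaks only at spaces, so a line may end well short of 80.
$wrapped = wordwrap($text, 80, "\n", true);

echo $hardCut, "\n\n", $wrapped, "\n";
```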

special symbols in filename doesn't display correct in mPDF

I have code:
$mpdf = new mPDF();
$mpdf->WriteHTML('some html text');
return $mpdf->Output("123!##$%^&*()_+<><?:}{P}" . '.pdf', 'I');
But when I save the document, the filename shows ----- in place of the symbols <>?:.
Can it be fixed?
First of all, this question has nothing to do with PDF generation. You want to create a file system object with a name that includes characters that have a special meaning in some shells:
< is the input redirection operator
> is the output redirection operator
? is the any character wildcard
: is the Windows drive letter separator
And you want to accomplish it through an additional layer you don't have control over (I assume a web browser).
Some file systems (not all) treat object names as raw byte strings and do not impose any condition. I recall being able to create files on an old Unix box that contained a * character and a line feed, after I read a book that explained such a thing was possible. However, a file name goes through several software layers, many of which actually need to understand the name, and some of them will possibly impose additional restrictions on top of those of the file system itself. So, even if you manage to create the file, you might not be able to read it back later.
For this reason, the browser actively removes problematic characters. In some cases, it might be overzealous (: is safe on Unix) but it just tries to prevent potential issues (e.g. the Unix file is emailed or copied to a Windows share) and there's nothing you can do on the server to avoid that.
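Since the browser's substitution cannot be turned off, one option is to choose a predictable name on the server before handing it to Output(). The sketch below is an assumption about what a "safe" set looks like (the characters Windows rejects plus control characters); safeDownloadName() is a hypothetical helper, not part of mPDF.

```php
<?php
// Hypothetical helper: replace characters that shells or Windows reject in names.
function safeDownloadName(string $name, string $replacement = '-'): string
{
    // Matches < > : " / \ | ? * and the ASCII control characters.
    return preg_replace('/[<>:"\/\\\\|?*\x00-\x1F]/', $replacement, $name);
}

echo safeDownloadName('123!##$%^&*()_+<><?:}{P}') . '.pdf', "\n";
```

The result could then be passed as the first argument of the Output() call shown in the question.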

Simplest way to convert subscript numbers

We get book titles from different sources (library systems), possibly with different encodings but mostly UTF-8. These strings are shown on the web and exported to EndNote and RefWorks. RefWorks (a Windows citation system) does not accept any encoding other than ANSI.
In the RIS/RefWorks export, activating the line
$smarty = iconv("UTF-8", "Windows-1252", $smarty);
suddenly cuts off everything after the first subscript character (the rectangles) in strings such as this example:
Diphosphen-komplexes (CO) 5CrPhPPPhCr(CO) 5
These characters are also not printed correctly in HTML, but that output is acceptable because nothing is cut off. With UTF-8 as the export file encoding nothing is cut off either, but the Windows software cannot read UTF-8.
The simplest solution would be to convert any subscript number to a regular number. Everything would work quite well then. But I could not find any simple way to do this. Working with hex codes is the only thing I could imagine. This solution would also be preferred for use in our Solr index.
Does anybody know a better solution?
The example string contains Private Use code points such as U+E5F8. By definition, no standard assigns any meaning to them; their use is purely by private agreements. It is thus impossible to convert them to anything, or to do anything with them, without knowing or inferring the private agreements involved. Some systems use Private Use code points to represent some symbols that are assigned to those points in some special font. Knowing what that font is and inspecting it may thus help to find out the agreement.
The conversion would need to be coded separately, in an ad hoc manner, since there is an ad hoc agreement involved.
“ANSI”, which here means windows-1252, does not contain any subscript characters. In the context of a chemical formula, replacing subscript digits by normal digits does not change the meaning, and the formula is understandable, though it looks unprofessional.
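A minimal sketch of such a replacement, assuming PHP 7 or later (for the "\u{...}" escapes) and UTF-8 input. subscriptsToDigits() is a hypothetical helper; it only knows the standard Unicode subscript digits U+2080 to U+2089. Any Private Use code points have to be supplied by the caller through the $extraMap parameter, because their meaning depends on the private agreement.

```php
<?php
// Hypothetical helper: replace Unicode subscript digits with plain digits.
function subscriptsToDigits(string $utf8, array $extraMap = []): string
{
    // U+2080..U+2089 are SUBSCRIPT ZERO..SUBSCRIPT NINE.
    $map = [
        "\u{2080}" => '0', "\u{2081}" => '1', "\u{2082}" => '2',
        "\u{2083}" => '3', "\u{2084}" => '4', "\u{2085}" => '5',
        "\u{2086}" => '6', "\u{2087}" => '7', "\u{2088}" => '8',
        "\u{2089}" => '9',
    ];
    // Entries in $extraMap (e.g. Private Use code points) take precedence.
    return strtr($utf8, $extraMap + $map);
}

echo subscriptsToDigits("(CO)\u{2085}CrPhPPPhCr(CO)\u{2085}"), "\n";
```

The converted string can then go through the iconv() call from the question.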
When converting to HTML format (or other rich text format), you can use normal digits wrapped in elements that cause subscript rendering (or otherwise style them). HTML has the sub element for this, but its implementations differ between browsers and tend to be of poor quality, so a better approach is to generate <span class=sub>...</span> and use CSS to set the vertical position and font size.

How to enable special characters "☠" in PHP?

My old page worked perfectly on Linux, but after moving the server from Unix to Windows the characters no longer work.
My old skull character "☠" (9760) is shown as a box containing the hex digits 26 and 20.
The box with 26 and 20 indicates lack of glyphs for the character U+2620 (decimal code 9760). This is one of the recommended ways of dealing with undisplayable characters according to the HTML 4.01 spec. So it indicates that the character has been properly recognized by the browser, it just cannot display it.
It sounds very odd that the OS of the server would affect this. It would be interesting to see the URLs of two versions that demonstrate such an effect. But changing browser or client computer may surely have an effect.
This is not entirely a browser problem, though. “Undisplayable” is relative, because browsers (especially IE) may fail to render a character, even though there is a glyph for it in available fonts. Therefore, you may wish to use a font-family setting with a suitable list of alternatives; to check out font coverage, see
http://www.fileformat.info/info/unicode/char/2620/fontsupport.htm

Go back up a line in a linux console?

I know I can go back the line and overwrite its contents with \r.
Now how can I go up into the previous line to change that?
Or is there even a way to print to a specific cursor location in the console window?
My goal is to create some self-refreshing multiline console app with PHP.
Use ANSI escape codes to move the cursor. For example: Esc [ 1 F. To put the Escape character in a string you'll need to specify its value numerically, for example "\x1B[1F"
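A small sketch of a self-refreshing two-line display built on that sequence, assuming a terminal that honours ANSI escape codes. The helper names cursorUp() and redraw() are invented for this example; ESC [ n F moves the cursor to the start of the line n lines up, and ESC [ 2 K erases the current line.

```php
<?php
// Hypothetical helper: ANSI "cursor previous line" sequence, moving up $n lines.
function cursorUp(int $n = 1): string
{
    return "\x1B[{$n}F";
}

// Hypothetical helper: build the output that redraws a block of lines
// over the block printed previously.
function redraw(array $lines, int $previousCount): string
{
    $out = $previousCount > 0 ? cursorUp($previousCount) : '';
    foreach ($lines as $line) {
        // Erase the old content of the line before rewriting it.
        $out .= "\x1B[2K" . $line . "\n";
    }
    return $out;
}

$previous = 0;
for ($i = 1; $i <= 3; $i++) {
    $lines = ["Tick: $i", 'Time: ' . date('H:i:s')];
    echo redraw($lines, $previous);
    $previous = count($lines);
    usleep(200000); // short pause so the refresh is visible
}
```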
As sujoy suggests, you can use PHP ncurses for a more abstract way to move the cursor.
Whilst most "consoles" allow ANSI escape codes, other sorts of terminal use different character sequences. ncurses provides a standardised API that is terminal independent. Have a quick look at /etc/termcap (and then man terminfo) if you are interested.
Update: Lars Wirzenius' answer has a useful summary of the background. Some years ago I also wrote a short article on terminals.
The Linux virtual consoles emulate an old-time display terminal, although not perfectly. See Wikipedia on VT-100 for an example of the hardware.
These terminals read data from a serial port and displayed it on the screen. They also looked for special bytes in the input stream from the serial port and acted upon them in other ways. For example, the newline character ('\n', byte value 10) would go to the beginning of the next line, and the carriage return character ('\r', byte value 13) would go to the beginning of the current line.
More interestingly, an ASCII ESC byte (27) would start a command sequence which could do almost anything to the cursor or display. One such sequence might move the cursor to the top left of the screen, another to a given row and column. A third one might clear the screen, and a fourth one might make text be displayed in reverse colors.
Every manufacturer of terminals would invent their own command sequences (and they didn't always start with ESC either), and then change them depending on what they could make new versions of their hardware do. If a manufacturer added colors or simple graphics, those resulted in new sequences.
Adapting every application to every terminal and every change to the command sequences would have been a big task. Compare it with adapting every web application to a new browser version.
As usual, the solution is to add a layer of abstraction. In Unix, the initial abstraction was called termcap, and consisted of the file /etc/termcap, and a library to read the file. The file would specify the actual command sequences to send for each logical operation for each terminal model. So a vt102 terminal model would map the operation "clear the screen" to the \033[2J. This allowed application programmers to think in terms of the logical operations, which was much simpler.
Of course, not simple enough... The termcap library was not as good as it might have been, so two other libraries were developed: curses provided a higher abstraction level, including user input, and terminfo made the terminal definitions and their use by programmers easier.
In modern times, ncurses is a free re-implementation of curses and terminfo has pretty much replaced termcap completely. Also, ANSI has defined some "standard" sequences, based on the Digital terminals, and almost every terminal emulator uses those, at least mostly, and the Linux virtual console is one of them. Very few people have actual physical terminals anymore.
For what you're trying to do, ncurses or the tput command may be most useful. Or you may decide that just clearing the whole screen (see clear(1)) and writing output then is easiest.
My goal is to create some self-refreshing multiline console app with PHP
For what you are trying to achieve ncurses is the way to go.
You should read about ncurses. In the shell, you can go one line up with:
tput cuu1
See man terminfo for more options.
But executing a shell command to move the cursor around is quite desperate.
You just use the up and down arrows on the keyboard to scroll through console history, but there is also the history command. Find out more using man history.
