I know I can go back to the start of the line and overwrite its contents with \r.
Now how can I go up into the previous line to change that?
Or is there even a way to print to a specific cursor location in the console window?
My goal is to create some self-refreshing multiline console app with PHP.
Use ANSI escape codes to move the cursor. For example, Esc [ 1 F moves the cursor to the start of the previous line. To put the Escape character in a string you'll need to specify its value numerically, for example "\x1B[1F".
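A minimal sketch of how this could be used for a self-refreshing multiline display, assuming the terminal honours the standard ANSI sequences. The helper name redrawBlock() is hypothetical; it only builds the bytes, so the caller decides when to echo them.

```php
<?php
// Hypothetical helper: build the bytes that redraw a block of lines in place.
function redrawBlock(array $lines, int $previousCount): string
{
    $out = '';
    if ($previousCount > 0) {
        // ESC [ n F: move the cursor to the start of the line n rows up.
        $out .= "\x1B[{$previousCount}F";
    }
    foreach ($lines as $line) {
        // ESC [ 2 K: erase the whole current line before writing the new text.
        $out .= "\x1B[2K" . $line . "\n";
    }
    return $out;
}
```

Echoing redrawBlock($lines, 0) for the first frame and redrawBlock($lines, count($lines)) for each later frame should rewrite the same rows. Erasing each line first avoids leftovers when a new line is shorter than the old one.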
As sujoy suggests, you can use PHP ncurses for a more abstract way to move the cursor.
Whilst most "consoles" allow ANSI escape codes, other sorts of terminal use different character sequences. ncurses provides a standardised API that is terminal independent. Have a quick look at /etc/termcap (and then man terminfo) if you are interested.
Update: Lars Wirzenius' answer has a useful summary of the background. Some years ago I also wrote a short article on terminals.
The Linux virtual consoles emulate an old-time display terminal, although not perfectly. See Wikipedia on VT-100 for an example of the hardware.
These terminals read data from a serial port, and displayed it on the screen. They also looked for special bytes in the input stream from the serial port and acted upon them in other ways. For example, the newline character ('\n', byte value 10) would go to the beginning of the next line, and the carriage return character ('\r', byte value 13) would go to the beginning of the current line.
More interestingly, an ASCII ESC byte (27) would start a command sequence which could do almost anything to the cursor or display. One such sequence might move the cursor to the top left of the screen, another to a given row and column. A third one might clear the screen, and a fourth one might make text be displayed in reverse colors.
Every manufacturer of terminals would invent their own command sequences (and they didn't always start with ESC either), and then change them depending on what they could make new versions of their hardware do. If a manufacturer added colors or simple graphics, those resulted in new sequences.
Adapting every application to every terminal and every change to the command sequences would have been a big task. Compare it with adapting every web application to a new browser version.
As usual, the solution is to add a layer of abstraction. In Unix, the initial abstraction was called termcap, and consisted of the file /etc/termcap, and a library to read the file. The file would specify the actual command sequences to send for each logical operation for each terminal model. So a vt102 terminal model would map the operation "clear the screen" to the sequence \033[2J. This allowed application programmers to think in terms of the logical operations, which was much simpler.
Of course, not simple enough... The termcap library was not as good as it might have been, so two other libraries were developed: curses provided a higher abstraction level, including user input, and terminfo made the terminal definitions and their use by programmers easier.
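The idea of mapping logical operations to per-terminal byte sequences can be sketched as a small lookup table. This is only an illustration of the concept, not how termcap is implemented. The vt102 clear-screen sequence is the one quoted above; the other entries are standard VT52 and ANSI sequences added as sample data, and the names TERMINAL_SEQUENCES and sequenceFor() are hypothetical.

```php
<?php
// Illustrative capability table in the spirit of termcap:
// terminal model => logical operation => bytes to send.
const TERMINAL_SEQUENCES = [
    'vt102' => ['clear_screen' => "\033[2J", 'cursor_up' => "\033[A"],
    'vt52'  => ['clear_screen' => "\033H\033J", 'cursor_up' => "\033A"],
];

// Hypothetical lookup: the application names the operation, the table supplies the bytes.
function sequenceFor(string $terminal, string $operation): string
{
    $sequence = TERMINAL_SEQUENCES[$terminal][$operation] ?? null;
    if ($sequence === null) {
        throw new InvalidArgumentException("No '$operation' sequence known for '$terminal'");
    }
    return $sequence;
}
```

An application would then write echo sequenceFor($model, 'clear_screen') and never mention a concrete escape sequence itself.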
In modern times, ncurses is a free re-implementation of curses and terminfo has pretty much replaced termcap completely. Also, ANSI has defined some "standard" sequences, based on the Digital terminals, and almost every terminal emulator uses those, at least mostly, and the Linux virtual console is one of them. Very few people have actual physical terminals anymore.
For what you're trying to do, ncurses or the tput command may be most useful. Or you may decide that just clearing the whole screen (see clear(1)) and writing output then is easiest.
My goal is to create some self-refreshing multiline console app with PHP
For what you are trying to achieve ncurses is the way to go.
You should read about ncurses. In the shell, you can go one line up with:
tput cuu1
See man terminfo for more options.
But executing a shell command every time you want to move the cursor is a rather desperate measure.
You can just use the up and down arrows on the keyboard to scroll through the console history, but there is also the history command. Find out more using man history.
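If you do go the tput route from PHP, one way to limit the cost is to ask tput for each capability only once and reuse the returned bytes. A rough sketch, assuming tput is installed and $TERM is set; the function name terminalSequence() and the fallback argument are hypothetical:

```php
<?php
// Ask tput for a terminfo capability once, then reuse the cached bytes.
function terminalSequence(string $capability, string $fallback): string
{
    static $cache = [];
    if (!array_key_exists($capability, $cache)) {
        // tput prints the sequence for the terminal named in $TERM.
        // On failure it prints nothing on stdout, so the fallback is used.
        $output = shell_exec('tput ' . escapeshellarg($capability) . ' 2>/dev/null');
        $cache[$capability] = (is_string($output) && $output !== '') ? $output : $fallback;
    }
    return $cache[$capability];
}
```

For example, echo terminalSequence('cuu1', "\x1B[A") would move one line up, falling back to the ANSI sequence when tput cannot answer.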
I'm trying to properly "wordwrap" a given string into English plaintext. Take this example string:
This here is an example of what I'm talking about. Notice how I just talk nonsense on and on for no reason other than to push the 80-character line limit. And this is some more text, etc.
Please note: in the following examples, I have added underscores to visualize what I'm talking about. Naturally, the underscores are not added in reality. They are only here to make it clear what is happening.
If I simply blindly add a linebreak after each 80 chars, I get:
This here is an example of what I'm talking about. Notice how I just talk nonsen
se on and on for no reason other than to push the 80-character line limit. And t
his is some more text, etc._____________________________________________________
If I use the built-in wordwrap() function with 80 chars, I get:
This here is an example of what I'm talking about. Notice how I just talk_______
nonsense on and on for no reason other than to push the 80-character line limit.
And this is some more text, etc.________________________________________________
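For reference, the two naive approaches above can be reproduced roughly like this. It is a sketch using the example string and the built-in str_split() and wordwrap() functions; the variable names are only for illustration.

```php
<?php
$text = "This here is an example of what I'm talking about. Notice how I just talk nonsense on and on for no reason other than to push the 80-character line limit. And this is some more text, etc.";

// Hard cut: split into fixed 80-character chunks, ignoring word boundaries.
$hardCut = implode("\n", str_split($text, 80));

// Word wrap: break only at spaces, so a word that does not fit moves to the next line.
$wordWrapped = wordwrap($text, 80, "\n", true);
```

Neither call knows anything about hyphenation, which is why "nonsense" is either cut mid-word or pushed whole onto the second line.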
Neither of those looks good or resembles a proper book or magazine, which (depending on its age) was beautifully typeset by either software or humans, like this:
This here is an example of what I'm talking about. Notice how I just talk nonse-
nse on and on for no reason other than to push the 80-character line limit. And_
this is some more text, etc.____________________________________________________
Notice how "nonsense" has neither been hard-cut nor fully dropped onto the next line. Instead, it has fully utilized the line minus one character for the dash, continuing on the next line. (As for the "And" at the end of the second line, it does have a whitespace after it, but only because there is only one more character left on that line.)
The rules for doing this kind of "intelligent wordwrapping" are language-specific, locale-specific (I think) and very complex. As such, it would be madness for me to attempt to code in all the rules manually.
I strongly suspect that there is some kind of mature, popular PHP library for doing precisely this, and I further suspect that it supports all kinds of languages/locales. However, I have been unable to find it myself.
It is not a requirement that it has to support "all kinds of languages/locales", but it would be nice. English with either US or UK locale would be sufficient for me to be happy at the moment.
I hope that I've been crystal-clear about what I'm asking!
When using popen in php, is there a way to preserve the colored output a program might generate? Is there maybe a way I can tell the shell to print all color escape sequences, instead of resolving them?
That depends on the program you are calling. Usually, if a program supports coloured output, it would ask the OS, "am I running on a terminal?" If yes, then it outputs colour codes. If not, it won't. If you run that program through popen(), then the OS would say "no, you're not running on a terminal" and the program would choose not to output the colour codes (because they would be confusing in the captured output).
Some programs may have an option to force the output of colour codes even if output is not being written to a terminal. However, that is not something you can force externally if the program doesn't already have a way to do it.
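Seen from the called program's side, the decision described above might look like the following sketch. stream_isatty() is PHP's built-in terminal check (PHP 7.2+); the function names and the idea of a "force colour" option are hypothetical stand-ins for whatever the real program offers.

```php
<?php
// Decide whether to emit colour codes, the way many command-line programs do.
function shouldUseColor($stream, bool $forceColor = false): bool
{
    // A hypothetical "always use colour" option would set $forceColor.
    return $forceColor || stream_isatty($stream);
}

function colorize(string $text, bool $useColor): string
{
    // ESC [ 31 m switches to red, ESC [ 0 m resets all attributes.
    return $useColor ? "\x1B[31m" . $text . "\x1B[0m" : $text;
}
```

When such a program is started through popen(), its output is a pipe, so the terminal check fails and only an explicit option can bring the colour codes back.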
I'm a Vim user who (due to my Python background) has grown accustomed to spaces instead of tabs. I'm working on a PHP project with another developer who uses Windows IDEs and really wants to use tabs. There really doesn't seem to be any hard and fast style guide for PHP that prefers a specific style, so I'm kind of stuck needing to deal with the tabs.
Is there some way, either through Git or Vim, that I can work with tabbed code and convert it to spaces while I'm editing it? Then perhaps on git add/commit it could be converted back to tabs?
The one really important thing is that the tracked content be standardized. You and the other developer are just going to have to agree on something there. Whichever of you wants to do something besides the agreed-upon standard may end up with mixed results. It's not always possible to cleanly convert back and forth.
I would really recommend just dealing with it. I have my own opinion about tabs and spaces, and I've worked on code using each. The best thing to do is just to set up your editor to match the standardized style, and go with it. In vim, turn off expandtab, set tabstop to what you like.
If you still want to try, there are two primary ways:
Use autocommands in Vim to convert on read/write. You'd probably need BufReadPost, BufWritePre, and BufWritePost. (For writing, you convert to the standard, let Vim write it, then convert back to the way you like to edit.) Make sure tabstop is set the way you like, then something like this (untested):
set tabstop=4
autocmd BufReadPost * set expandtab | retab
autocmd BufWritePre * set noexpandtab | retab!
autocmd BufWritePost * set expandtab | retab
The * is the filepattern that this will apply to; you may have to mess with that, or only add the autocommands for files within a certain directory, to make sure this doesn't happen for everything you edit. Note that this is dangerous; it will for example replace literal tab characters inside strings.
Use Git's smudge/clean filters. You can read about them in man gitattributes, or in Pro Git. You could use those to convert for editing, then back to the standard for committing. If there's never any weird indentation, it could be as simple as changing leading tabs to some number of spaces, and leading spaces to a fraction of that number of tabs. Do it with sed/perl/indent, whatever you're comfortable with.
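Since this is a PHP project, the conversion itself could also be written in PHP. The following is a rough sketch of the "leading whitespace only" conversion described above, assuming a tab width of 4. The function names are hypothetical, and a real filter script would read STDIN and write the converted text to STDOUT.

```php
<?php
// Convert each run of leading tabs to $width spaces per tab.
function leadingTabsToSpaces(string $text, int $width = 4): string
{
    return preg_replace_callback('/^\t+/m', function (array $m) use ($width): string {
        return str_repeat(' ', strlen($m[0]) * $width);
    }, $text);
}

// Convert each run of leading spaces to tabs, keeping any remainder as spaces.
function leadingSpacesToTabs(string $text, int $width = 4): string
{
    return preg_replace_callback('/^ +/m', function (array $m) use ($width): string {
        $count = strlen($m[0]);
        return str_repeat("\t", intdiv($count, $width)) . str_repeat(' ', $count % $width);
    }, $text);
}
```

Because only whitespace at the start of a line is touched, tab characters inside string literals are left alone, which avoids the danger mentioned for the retab approach.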
I've seen many developers use different methods to split a string by new lines, but I'm confused about which is correct: \r\n or \n only?
\n is used for Unix systems (including Linux, and OSX).
\r\n is mainly used on Windows.
\r is used on really old Macs.
PHP_EOL constant is used instead of these characters for portability between platforms.
The given answer is far from complete. In fact, it is so far from complete that it tends to lead the reader to believe that this answer is OS dependent when it isn't. It also isn't something which is programming language dependent (as some commentators have suggested). I'm going to add more information in order to make this more clear. First, let's give the list of current new line variations (as in, what they've been since 1999):
\r\n is only used on Windows Notepad (fixed as of 2018), the DOS command line (PowerShell handles \n only just fine; as well as most modern versions of DOS-era command line applications [such as more]), most of the Windows API written before ~2000, and some (older) Windows apps (mostly because they use the Windows API).
YMMV for .ps1, .bat, and .cmd scripts.
Oddly enough, many text-based Internet protocols that run over TCP/IP (including HTTP itself) use \r\n, as noted by Raatje. There are definitely times that this matters for you as a PHP programmer.
\n is used for all other systems, applications and the content of webpages/email.
You'll notice that I've put most Windows apps in the \n group which may be slightly controversial but before you disagree with this statement, please grab a UNIX formatted text file and try it in 10 web friendly Windows applications of your choice (which aren't listed in my exceptions above). What percentage of them handled it just fine? You'll find that they (practically) all implement auto detection of line endings or just use \n because, while Windows may use \r\n, the Internet and most other OSes just use \n. Therefore, it is best practice for applications to use \n alone if you want your output to be Internet friendly.
PHP also defines a newline constant called PHP_EOL. This constant is set to the OS-specific newline string for the machine PHP is running on (\r\n for Windows and \n for everything else). This constant is not very useful for webpages and should be avoided for HTML output or for writing most text to files. It becomes VERY useful when we move to command line output from PHP applications because it will allow your application to output to a terminal window in a consistent manner across all supported OSes.
If you want your PHP applications to work from any server they are placed on, the two biggest things to remember are that you should always just use \n unless it is terminal output (in which case you use PHP_EOL) and you should also ALWAYS use / for your path separator (not \). The third thing to look out for is drive letters in path strings, if allowed, which may be tricky depending on what you're doing.
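To make the three cases concrete, here is a small sketch: \r\n for an HTTP request, \n for file content, and PHP_EOL for terminal output. The function names and the example host are hypothetical.

```php
<?php
// HTTP requires \r\n after every header line and a blank line to end the headers.
function buildHttpRequest(string $host, string $path): string
{
    return "GET {$path} HTTP/1.1\r\n"
        . "Host: {$host}\r\n"
        . "Connection: close\r\n"
        . "\r\n";
}

// Text destined for a file or a web page: plain \n.
function joinForFile(array $lines): string
{
    return implode("\n", $lines) . "\n";
}

// Text destined for the terminal: the platform's own newline.
function joinForTerminal(array $lines): string
{
    return implode(PHP_EOL, $lines) . PHP_EOL;
}
```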
The even longer explanation:
An application may choose to use whatever line endings it likes regardless of the default OS line ending style. If I want my text editor to print a newline every time it encounters a period that is no harder than using the \n to represent a newline because I'm interpreting the text as I display it anyway. IOW, I'm fiddling around with measuring the width of each character so it knows where to display the next so it is very simple to add a statement saying that if the current char is a period then perform a newline action (or if it is a \n then display a period).
Aside from the null terminator, no character code is sacred and when you write a text editor or viewer you are in charge of translating the bits in your file into glyphs (or carriage returns) on the screen. The only thing that distinguishes a control character such as the newline from other characters is that most font sets don't include them (meaning they don't have a visual representation available).
That being said, if you are working at a higher level of abstraction then you probably aren't making your own textbox controls. If this is the case then you're stuck with whatever line ending that control makes available to you. Even in this case it is a simple (and fairly quick) matter to automatically detect the line ending style of any string and adjust accordingly.
If you are programming in PHP, it is useful to split lines by \n and then trim() each line (provided you don't care about whitespace) to give you a "clean" line regardless.
// Walk over the lines of $data; trim() also removes a stray \r left by \r\n endings.
foreach (explode("\n", $data) as $line)
{
    $line = trim($line);
    ...
}
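If whitespace does matter and trim() is not an option, the automatic detection mentioned above could be sketched like this. The function names are hypothetical.

```php
<?php
// Split on any of the three line ending styles.
function splitLines(string $data): array
{
    // \r\n is listed first so a Windows line ending is not counted as two breaks.
    return preg_split('/\r\n|\r|\n/', $data);
}

// Report which line ending a string uses, defaulting to \n.
function detectLineEnding(string $data): string
{
    if (strpos($data, "\r\n") !== false) {
        return "\r\n";
    }
    if (strpos($data, "\r") !== false) {
        return "\r";
    }
    return "\n";
}
```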
For php, \n should work for you!
http://php.net/manual/en/language.types.string.php
We implemented an online service where it is possible to generate a PDF with a predefined structure. The user can choose a LaTeX template and then compile it with appropriate inputs.
The question we worry about is security: a malicious user must not be able to gain shell access by injecting special instructions into the LaTeX document.
We need some workaround for this or at least a list of special characters that we should strip from the input data.
Preferred language would be PHP, but any suggestions, constructions and links are very welcome.
PS. In a few words, we're looking for mysql_real_escape_string for LaTeX.
Here's some code to implement the Geoff Reedy answer. I place this code in the public domain.
<?php
$test = 'Test characters: # $ % & ~ _ ^ \ { }.';
header('Content-Type: text/plain');
print latexSpecialChars($test);
exit;
function latexSpecialChars($string)
{
    // Map each LaTeX special character to its escaped form.
    $map = array(
        '#'  => '\\#',
        '$'  => '\\$',
        '%'  => '\\%',
        '&'  => '\\&',
        '~'  => '\\~{}',
        '_'  => '\\_',
        '^'  => '\\^{}',
        '\\' => '\\textbackslash{}',
        '{'  => '\\{',
        '}'  => '\\}',
    );
    // strtr() replaces all keys in a single pass, so the backslashes and
    // braces it inserts are never escaped a second time.
    return strtr($string, $map);
}
The only possibility (AFAIK) to perform harmful operations using LaTeX is to enable the possibility to call external commands using \write18. This only works if you run LaTeX with the --shell-escape or --enable-write18 argument (depending on your distribution).
So as long as you do not run it with one of these arguments you should be safe without the need to filter out any parts.
Besides that, one is still able to write other files using the \newwrite, \openout and \write commands. Having the user create and (over)write files might be unwanted, so you could filter out occurrences of these commands. But keeping blacklists of certain commands is prone to fail, since someone with bad intentions can easily hide the actual command by obfuscating the input document.
Edit: Running the LaTeX command using a limited account (ie no writing to non latex/project related directories) in combination with disabling \write18 might be easier and more secure than keeping a blacklist of 'dangerous' commands.
According to http://www.tug.org/tutorials/latex2e/Special_Characters.html the special characters in latex are # $ % & ~ _ ^ \ { }. Most can be escaped with a simple backslash but ~ ^ and \ need special treatment.
For caret use \^{} (or \textasciicircum), for tilde use \~{} (or \textasciitilde) and for backslash use \textbackslash.
If you want the user input to appear as typewriter text, there is also the \verb command, which can be used like \verb+asdf$$&\~^+. The + can be any character, but that character can't appear in the text.
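A small sketch of how the delimiter could be chosen automatically in PHP. The function name latexVerb() and the list of candidate delimiters are assumptions. Note that \verb cannot span lines, so such input is rejected here.

```php
<?php
// Wrap $text in \verb using the first candidate delimiter that does not occur in it.
function latexVerb(string $text): string
{
    if (preg_match('/[\r\n]/', $text)) {
        throw new InvalidArgumentException('\verb cannot span multiple lines');
    }
    foreach (['+', '|', '!', '/', '=', '@'] as $delimiter) {
        if (strpos($text, $delimiter) === false) {
            return '\verb' . $delimiter . $text . $delimiter;
        }
    }
    throw new InvalidArgumentException('No unused delimiter available');
}
```

For the example above, which already contains a +, this sketch would fall through to | as the delimiter.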
In general, achieving security purely through escaping command sequences is hard to do without drastically reducing expressivity, since there is no principled way to distinguish safe control sequences (cs's) from unsafe ones: Tex is just not a clean enough programming language to allow this. I'd say abandon this approach in favour of eliminating the existence of security holes.
Veger's summary of the security holes in Latex agrees with mine: i.e., the issues are shell escapes and file creation/overwriting, though he has missed a shell escape vulnerability. Some additional points follow, then some recommendations:
It is not enough to avoid actively invoking --shell-escape, since it can be implicitly enabled in texmf.cnf. You should explicitly pass --no-shell-escape to override texmf.cnf;
\write18 is a primitive of Etex, not Knuth's Tex. So you can avoid Latexes that implement it (which, unfortunately, is most of them);
If you are using Dvips, there is another risk: \special commands can create .dvi files that ask dvips to execute shell commands. So you should, if you use dvips, pass the -R2 option to forbid the invocation of shell commands;
texmf.cnf allows you to specify where Tex can create files;
You might not be able to disable the creation of fonts if you want to give your clients much freedom in which fonts they may use. Take a look at the notes on security for Kpathsea; the default behaviour seems reasonable to me, but you could have a per-user font tree, to prevent one user stepping on another user's toes.
Options:
Sandbox your client's Latex invocations, and allow them freedom to misbehave in the sandbox;
Trust in kpathsea's defaults, and forbid shell escapes in latex and any other executables used to build the PDF output;
Drastically reduce expressivity, forbidding your clients the ability to create font files or any new client-specified files. Run latex as a process that can only write to certain already existing files;
You can create a format file in which the \write18 cs, and the file creation css, are not bound, and only macros that invoke them safely, such as for font/toc/bbl creation, exist. This means you have to decide what functionality your clients have: they would not be able to freely choose which packages they import, but must make use of the choices you have imposed on them. Depending on what kind of 'templates' you have in mind, this could be a good option, allowing use of packages that use shell escapes, but you will need to audit the Tex/Latex code that goes into your format file.
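As a starting point for the shell-escape advice above, the PHP side might build its command line along these lines. This is a sketch under assumptions: pdflatex is taken as the engine, the function name and paths are hypothetical, and the options are written with a single dash (TeX Live accepts one or two dashes). It does not replace sandboxing or a restricted account.

```php
<?php
// Build a pdflatex command line that explicitly disables shell escapes.
function buildLatexCommand(string $texFile, string $outputDir): string
{
    return implode(' ', [
        'pdflatex',
        '-no-shell-escape',          // override any shell_escape setting in texmf.cnf
        '-interaction=nonstopmode',  // never stop to prompt the (absent) user
        '-halt-on-error',            // stop at the first error instead of continuing
        '-output-directory=' . escapeshellarg($outputDir),
        escapeshellarg($texFile),
    ]);
}
```

The resulting string could then be passed to exec() or proc_open() by a process running under the limited account.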
Postscript
There's a TUGBoat article, Server side PDF generation based on LATEX templates, which takes a different approach to the question from the one I have taken, namely generating PDFs from form input using Latex.
You'd probably want to make sure that your \write18 is disabled.
See http://www.fceia.unr.edu.ar/lcc/cdrom/Instalaciones/LaTex/MiKTex/doc/ch04s08.html and http://www.texdev.net/2009/10/06/what-does-write18-mean/