CLI multibyte input

I have a problem receiving multibyte characters through the PHP CLI.
I have a small script that reads from STDIN:
<?php
while(true) {
# also not working, but is similar to fgets anyway, except you can specify the ending char, so no surprise
#$strInput = trim(stream_get_line(STDIN, 1024, PHP_EOL));
$strInput = trim(fgets(STDIN, 1024));
var_dump($strInput);
}
Basically this works, but I have a problem if I type any non-ASCII character
and then press backspace.
e.g. input = 'ß' // string(2) "ß"
e.g. input = 'ß' and backspace // string(1) "�"
e.g. input = 'ß' and backspace twice // string(0) ""
I don't know whether this is a PHP problem or a terminal problem. I tend to think it is a PHP problem.
It works on my terminal:
e.g. % ß // ß: Command not found
e.g. % ßß and backspace // ß: Command not found
Unfortunately, there is no mb_* function for reading from STDIN.
Here are my terminal settings:
% locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_COLLATE=C
LC_TIME="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_ALL=
% stty -a
speed 38400 baud; 58 rows; 212 columns;
lflags: icanon isig iexten echo echoe echok echoke -echonl echoctl
-echoprt -altwerase -noflsh -tostop -flusho -pendin -nokerninfo
-extproc
iflags: -istrip icrnl -inlcr -igncr ixon -ixoff ixany imaxbel -ignbrk
brkint -inpck -ignpar -parmrk
oflags: opost onlcr -ocrnl tab0 -onocr -onlret
cflags: cread cs8 -parenb -parodd hupcl -clocal -cstopb -crtscts -dsrflow
-dtrflow -mdmbuf
cchars: discard = ^O; dsusp = ^Y; eof = ^D; eol = <undef>;
eol2 = <undef>; erase = ^H; erase2 = ^H; intr = ^C; kill = ^U;
lnext = ^V; min = 1; quit = ^\; reprint = ^R; start = ^Q;
status = ^T; stop = ^S; susp = ^Z; time = 0; werase = ^W;
% cat /home/foobar/.login_conf
# $FreeBSD: releng/10.2/share/skel/dot.login_conf 77995 2001-06-10 17:08:53Z ache $
#
# see login.conf(5)
#
me:\
:charset=UTF-8:\
:lang=de_DE.UTF-8:\
:setenv=LC_COLLATE=C:
I am using FreeBSD 10.3 with the latest PHP 7.1.9.
I also tried different shells like bash, but the output remains the same.
Also tried starting xterm with -u8 option, no success.
Does anybody have an idea how to fix this? What am I missing?
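The string(1) and string(0) results reported above would be consistent with each backspace erasing a single byte rather than a whole character. That reading is an assumption; the question does not confirm it. A minimal sketch of the byte arithmetic, where isValidUtf8 is a hypothetical helper name:

```php
<?php
// Hypothetical helper: true when $s is well-formed UTF-8.
// preg_match() with the /u modifier returns false on malformed UTF-8 input.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$eszett = "\xC3\x9F";                     // 'ß' encoded as UTF-8: two bytes
$oneByteErased = substr($eszett, 0, -1);  // what remains if only one byte is removed

var_dump(strlen($eszett));                // int(2)
var_dump(isValidUtf8($eszett));           // bool(true)
var_dump(strlen($oneByteErased));         // int(1)
var_dump(isValidUtf8($oneByteErased));    // bool(false)
```

If this is what happens, the leftover byte is the lone lead byte of the sequence, which a UTF-8 terminal can only render as a replacement character.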

Related

Different Output Characters on VBScript and PHP in Windows Command Prompt [duplicate]

I'm trying to generate mail configurations and personalized signatures through a batch file that reads a list of users, a template, and creates a personalized output. That's done and works:
@ECHO OFF
SETLOCAL ENABLEEXTENSIONS
GOTO begin
:writesignature
cscript //NoLogo replacetext.vbs "[NAME]" %1 signature.html stdout | cscript //NoLogo replacetext.vbs "[JOB]" %3 stdin stdout | cscript //NoLogo replacetext.vbs "[EMAIL]" %2 stdin signature-%4.html
GOTO :end
:begin
FOR /F "tokens=1,2,3,4 delims=;" %%A IN ('TYPE people.lst') DO CALL :writesignature "%%A" "%%B" "%%C" %%D
:end
To do the text replacement, I created replacetext.vbs, which replaces one string with another and can be used in a pipe if stdin and stdout are given as the source and target files:
CONST ForReading = 1
CONST ForWritting = 2
CONST ForAppending = 8
CONST OpenAsASCII = false
CONST OpenAsUnicode = true
CONST OpenAsDefault = -2
Const OverwriteIfExist = true
Const FailIfExist = false
Const CreateIfNotExist = true
Const FailIfNotExist = false
SET objFSO = CreateObject("Scripting.FileSystemObject")
SET objFILEINPUT = Wscript.StdIn
SET objFILEOUTPUT = Wscript.StdOut
IF (Wscript.Arguments.Count < 2) OR (Wscript.Arguments.Count > 4) THEN
Wscript.Echo "Not enought arguments"
Wscript.Echo "replacetext ""<original>"" ""<replacement>"" "
Wscript.Quit(1 MOD 255)
END IF
IF Wscript.Arguments.Count > 2 THEN
IF Wscript.Arguments(2) = "stdin" THEN
' Wscript.Echo "Input: StdIn"
ELSE
' Wscript.Echo "Input: " + Wscript.Arguments(2)
SET objFILEINPUT = objFSO.OpenTextFile(Wscript.Arguments(2), ForReading, OpenAsASCII)
END IF
IF Wscript.Arguments.Count = 4 THEN
IF Wscript.Arguments(3) = "stdout" THEN
' Wscript.Echo "Output: StdOut"
ELSE
' Wscript.Echo "Output: " + Wscript.Arguments(3)
IF objFSO.FileExists(Wscript.Arguments(3)) THEN
SET objFILEOUTPUT = objFSO.OpenTextFile(Wscript.Arguments(3), ForWritting, CreateIfNotExist, OpenAsASCII)
ELSE
SET objFILEOUTPUT = objFSO.CreateTextFile(Wscript.Arguments(3), OverwriteIfExist, OpenAsASCII)
END IF
END IF
END IF
END IF
strText = objFILEINPUT.ReadAll()
strNewText = Replace(strText, Wscript.Arguments(0), Wscript.Arguments(1))
objFILEOUTPUT.Write(strNewText)
objFILEOUTPUT.Close
objFILEINPUT.Close
Wscript.Quit(0 MOD 255)
The problem appears when people.lst contains non-ASCII characters encoded as ANSI/Windows-1250 (for example, Comunicación). The batch file reads them and the console shows them, unconverted, as OEM characters (Comunicaci¾n). But when the output files are written, the characters are somehow converted transparently, so the output file opened in Windows shows Comunicaci¾n instead of Comunicación.
After much debugging, I've narrowed the problem down to the arguments ONLY (no automatic conversion happens on the template file).
How can I disable this transparent conversion, or convert the input back from ANSI to OEM so that the conversion works as intended?

The problem is that cmd.exe works with a different code page than cscript.exe/wscript.exe. I have a similar problem in Poland, where cmd.exe works with code page 852 (I believe this is for compatibility with older MS-DOS programs) and wscript.exe works in Windows' native code page 1250.
To solve the problem, put the following line at the beginning of the batch file:
mode con cp select=1250
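To see why a code page mismatch produces exactly this kind of substitution, the same byte can be decoded under two code pages. This is a sketch only: decodeByte is a hypothetical helper, code page 850 is a guess based on the ¾ shown in the question, and the charset names accepted by iconv depend on the platform's iconv implementation.

```php
<?php
// Hypothetical helper: interpret one raw byte in the given code page
// and return the resulting character as UTF-8.
function decodeByte(string $byte, string $codePage): string
{
    $utf8 = iconv($codePage, 'UTF-8', $byte);
    if ($utf8 === false) {
        throw new RuntimeException("iconv cannot convert from $codePage");
    }
    return $utf8;
}

$raw = "\xF3"; // the byte stored for 'ó' in a Windows-1250 file

echo decodeByte($raw, 'CP1250'), "\n"; // read as Windows-1250: ó
echo decodeByte($raw, 'CP850'), "\n";  // read as OEM code page 850: ¾
```

The byte never changes; only the code page used to interpret it does, which is why selecting the same code page on both sides removes the symptom.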

Encoding puzzles with sockets in different languages

I have the code below, written in PHP, which handles the server socket, specifically writing messages to certain sockets:
header('Content-Type: text/html; charset=utf-8');
const PAYLOAD_LENGTH_16 = 126;
const PAYLOAD_LENGTH_63 = 127;
const OPCODE_CONTINUATION = 0;
for ($i = 0; $i < $frameCount; $i++) {
// fetch fin, opcode and buffer length for frame
$fin = $i != $maxFrame ? 0 : self::FIN;
$opcode = $i != 0 ? self::OPCODE_CONTINUATION : $opcode;
$bufferLength = $i != $maxFrame ? $bufferSize : $lastFrameBufferLength;
// set payload length variables for frame
if ($bufferLength <= 125) {
$payloadLength = $bufferLength;
$payloadLengthExtended = '';
$payloadLengthExtendedLength = 0;
}
elseif($bufferLength <= 65535) {
$payloadLength = self::PAYLOAD_LENGTH_16;
$payloadLengthExtended = pack('n', $bufferLength);
$payloadLengthExtendedLength = 2;
} else {
$payloadLength = self::PAYLOAD_LENGTH_63;
$payloadLengthExtended = pack('xxxxN', $bufferLength); // pack 32 bit int, should really be 64 bit int
$payloadLengthExtendedLength = 8;
}
// set frame bytes
$buffer = pack('n', (($fin | $opcode) << 8) | $payloadLength).$payloadLengthExtended.substr($message, $i * $bufferSize, $bufferLength);
And below I have the code in Objective-C responsible for receiving these messages from the socket server:
NSInteger len = 0;
uint8_t buffer[4096];
while ([inputStream hasBytesAvailable]) {
len = [inputStream read:buffer maxLength:sizeof(buffer)];
if (len > 0) {
[self.data appendBytes:buffer length:len];
[self.log insertText:[NSString stringWithFormat:@"Log: Received a message from server:\n\n"]];
NSLog(@"Received a message from server...");
}
}
When all bytes are received, I run the following command to write the data to a file:
[self.data writeToFile:@"dataComes.txt" options:NSDataWritingAtomic error:nil]
The Problem
We send a large file in JSON format to the Objective-C client, which receives it and writes a file called dataComes.txt. The JSON in that file looks normal except for some strange characters such as:
~ or ~Â or â-Û
These strange characters always appear at the beginning of each block of messages that Objective-C receives (yes, the socket server and TCP divide large messages into blocks).
What is the cause of this problem and how could it be solved?
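One possible reading, offered as an assumption rather than a confirmed diagnosis: the PHP code wraps every chunk in a frame header, and a client that reads the raw TCP stream would see those header bytes in front of each block. The sketch below rebuilds the header the way the excerpt does for payloads of 126 to 65535 bytes. The values FIN = 128 and OPCODE_TEXT = 1 are not in the excerpt (they follow RFC 6455), and frameHeader is a hypothetical name.

```php
<?php
const FIN = 128;                 // assumed: top bit of the first header byte
const OPCODE_TEXT = 1;           // assumed: text frame opcode
const OPCODE_CONTINUATION = 0;
const PAYLOAD_LENGTH_16 = 126;

// Build the header the same way the excerpt does for 126..65535 byte payloads.
function frameHeader(int $fin, int $opcode, int $bufferLength): string
{
    return pack('n', (($fin | $opcode) << 8) | PAYLOAD_LENGTH_16)
         . pack('n', $bufferLength);
}

$first = frameHeader(0, OPCODE_TEXT, 4096);          // first fragment, more to follow
$last  = frameHeader(FIN, OPCODE_CONTINUATION, 300); // final fragment

echo bin2hex($first), "\n"; // 017e1000
echo bin2hex($last), "\n";  // 807e012c
echo $first[1], "\n";       // ~
```

Byte 126 is the ASCII tilde, which would match the ~ seen at the start of each block. If that is what is happening, the client would need to parse and strip the frame header instead of filtering characters.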

SOLUTION 1: Filtering
I can filter out the unwanted characters, but that also strips accented characters from legitimate words:
NSCharacterSet *notAllowedChars = [[NSCharacterSet characterSetWithCharactersInString:@"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ[]{}:,'"] invertedSet];
NSString *resultString = [[total componentsSeparatedByCharactersInSet:notAllowedChars] componentsJoinedByString:@" "];
SOLUTION 2: Stop using sockets
I have tried many ways to send data to my app. The only one that worked was to send the data in separate pieces (a loop sending one JSON object at a time), but to make it work I had to put my PHP code to sleep with sleep(1) between sends (which I believe is not good), because otherwise Objective-C treats the data as a single packet.
So either my code has problems, or the socket support in Objective-C is not well done and has inconsistencies (a bug). What remains is to make my connections through normal requests to a web server (which I do not think is a good idea, since I would have to do this every 3 seconds over a 5-minute interval).
SOLUTION 3: FILTERING + UNICODE
On the server side I can replace every special character with a specific escape combination, for example:
Hello é world to Hello /e001/ world
In my app I can then detect this combination and convert it back to the real character.

Does every single call to mysql_real_escape_string require another trip to the database?

http://php.net/manual/en/function.mysql-real-escape-string.php:
mysql_real_escape_string() calls MySQL's library function
mysql_real_escape_string, which prepends backslashes to the following
characters: \x00, \n, \r, \, ', " and \x1a.
OK, so basically if I ever do something like this:
mysql_query("insert T(C)select'".mysql_real_escape_string($value)."'")
am I making one trip to the database for mysql_real_escape_string and another for mysql_query, i.e. two trips in total?

The fact that it uses the mysql library does not mean it does a round trip with the server.
It runs code from the mysql client library, loaded in the same process as your php interpreter. You do need a connection though - that function needs to know some server settings to operate properly. But those settings are cached in the connection information on the PHP side.
If you want to verify this (and you're on Linux), write a simple script like:
<?php
$link = mysql_connect('localhost', 'user', 'pass');
echo "Connection done\n";
echo mysql_real_escape_string("this ' is a test");
?>
And run it through strace:
$ strace php t.php
.... # here comes the connection to mysql, socket fd == 3
connect(3, {sa_family=AF_FILE, path="/var/run/mysqld/mysqld.sock"}, 110) = 0
fcntl(3, F_SETFL, O_RDWR) = 0
setsockopt(3, SOL_SOCKET, SO_RCVTIMEO, "\2003\341\1\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
.... # talking with mysql here
poll([{fd=3, events=POLLIN}], 1, 60000) = 1 ([{fd=3, revents=POLLIN}])
read(3, "8\0\0\0\n5.1.58-log\0\3\0\0\0K-?4'fL+\0\377\367!"..., 16384) = 60
...
read(3, "\7\0\0\2\0\0\0\2\0\0\0", 16384) = 11
# first php echo
write(1, "Connection done\n", 16Connection done ) = 16
# second php echo
write(1, "this \\' is a test", 17this \' is a test) = 17
munmap(0x7f62e187a000, 528384) = 0
....
The important point is that the two writes caused by the echo statements have no other syscall between them, and no network communication is possible without a syscall (from userspace on Linux, anyway).
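That escaping is plain string processing on the client side can also be illustrated without any connection at all. The sketch below mimics the documented character list in pure PHP. The name localEscapeSketch is hypothetical, and because the function ignores the connection character set it is an illustration only, not a safe replacement for the real function.

```php
<?php
// Illustration only: mimics the documented character list in plain PHP.
// It ignores the connection character set, so it is NOT a safe
// replacement for mysql_real_escape_string().
function localEscapeSketch(string $value): string
{
    return strtr($value, [
        "\x00" => '\\0',
        "\n"   => '\\n',
        "\r"   => '\\r',
        '\\'   => '\\\\',
        "'"    => "\\'",
        '"'    => '\\"',
        "\x1a" => '\\Z',
    ]);
}

echo localEscapeSketch("this ' is a test"), "\n"; // this \' is a test
```

The output matches the second write seen in the strace log, and nothing in the function touches a socket.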

Clear PHP CLI output

I'm trying to get a "live" progress indicator working on my php CLI app. Rather than outputting as
1Done
2Done
3Done
I would rather it cleared the screen and just showed the latest result. system("command \C CLS") doesn't work. Nor does ob_flush(), flush() or anything else that I've found.
I'm running Windows 7 Ultimate 64-bit. I noticed that the command line shows output in real time, which was unexpected. Everyone warned me that it wouldn't, but it does. Is that a 64-bit perk?
Cheers for the help!
I want to avoid echoing 24 new lines if I can.

Try outputting a line of text and terminating it with "\r" instead of "\n".
The "\n" character is a line-feed which goes to the next line, but "\r" is just a return that sends the cursor back to position 0 on the same line.
So you can:
echo "1Done\r";
echo "2Done\r";
echo "3Done\r";
etc.
Make sure to output some spaces before the "\r" to clear the previous contents of the line.
[Edit] Optional: Interested in some history & background? Wikipedia has good articles on "\n" (line feed) and "\r" (carriage return)
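The advice about padding with spaces before the "\r" can be sketched as a small helper. The name progressLine and the default width of 40 columns are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: pad the text with spaces so a shorter line fully
// overwrites a longer previous one, then return to column 0 with "\r".
function progressLine(string $text, int $width = 40): string
{
    return str_pad($text, $width) . "\r";
}

foreach (['100 items left', '9 items left', 'Done'] as $status) {
    echo progressLine($status);
    usleep(200000); // short pause so the updates are visible
}
echo "\n"; // leave the final status on screen
```

Without the padding, "9 items left" written over "100 items left" would leave stray characters from the longer line at the end.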

I came across this while searching for a multi-line solution to this problem. This is what I eventually came up with. You can use ANSI escape codes: http://www.inwap.com/pdp10/ansicode.txt
<?php
function replaceOut($str)
{
$numNewLines = substr_count($str, "\n");
echo chr(27) . "[0G"; // Set cursor to first column
echo $str;
echo chr(27) . "[" . $numNewLines ."A"; // Set cursor up x lines
}
while (true) {
replaceOut("First Ln\nTime: " . time() . "\nThird Ln");
sleep(1);
}
?>

I recently wrote a function that will also keep track of the number of lines it last output, so you can feed it arbitrary string lengths, with newlines, and it will replace the last output with the current one.
With an array of strings:
$lines = array(
'This is a pretty short line',
'This line is slightly longer because it has more characters (i suck at lorem)',
'This line is really long, but I an not going to type, I am just going to hit the keyboard... LJK gkjg gyu g uyguyg G jk GJHG jh gljg ljgLJg lgJLG ljgjlgLK Gljgljgljg lgLKJgkglkg lHGL KgglhG jh',
"This line has newline characters\nAnd because of that\nWill span multiple lines without being too long",
"one\nmore\nwith\nnewlines",
'This line is really long, but I an not going to type, I am just going to hit the keyboard... LJK gkjg gyu g uyguyg G jk GJHG jh gljg ljgLJg lgJLG ljgjlgLK Gljgljgljg lgLKJgkglkg lHGL KgglhG jh',
"This line has newline characters\nAnd because of that\nWill span multiple lines without being too long",
'This is a pretty short line',
);
One can use the following function:
function replaceable_echo($message, $force_clear_lines = NULL) {
static $last_lines = 0;
if(!is_null($force_clear_lines)) {
$last_lines = $force_clear_lines;
}
$term_width = exec('tput cols', $toss, $status);
if($status) {
$term_width = 64; // Arbitrary fall-back term width.
}
$line_count = 0;
foreach(explode("\n", $message) as $line) {
$line_count += count(str_split($line, $term_width));
}
// Erasure MAGIC: Clear as many lines as the last output had.
for($i = 0; $i < $last_lines; $i++) {
// Return to the beginning of the line
echo "\r";
// Erase to the end of the line
echo "\033[K";
// Move cursor Up a line
echo "\033[1A";
// Return to the beginning of the line
echo "\r";
// Erase to the end of the line
echo "\033[K";
// Return to the beginning of the line
echo "\r";
// Can be consolidated into
// echo "\r\033[K\033[1A\r\033[K\r";
}
$last_lines = $line_count;
echo $message."\n";
}
In a loop:
foreach($lines as $line) {
replaceable_echo($line);
sleep(1);
}
And all lines replace each other.
The name of the function could use some work, as I just whipped it up, but the idea is sound. Pass an (int) as the second parameter and it will replace that many lines above instead. This is useful if you are printing after other output and don't want to replace the wrong number of lines (or any at all: pass 0).
It seemed like a good solution to me.
I make sure to echo the trailing newline so that the user can still use echo/print_r without clobbering the line (use the override to avoid deleting such output), and so that the command prompt comes back in the correct place.
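The line-counting step is the part most likely to go wrong, so here it is in isolation. This is a sketch with a hypothetical name, countDisplayLines. It uses arithmetic instead of str_split() so that an empty line always counts as one row (on newer PHP versions str_split() may return an empty array for an empty string), and it counts bytes, so multibyte text would be over-counted.

```php
<?php
// Hypothetical helper: how many terminal rows a message occupies when
// lines longer than $termWidth wrap. Counts bytes, not display columns.
function countDisplayLines(string $message, int $termWidth): int
{
    $rows = 0;
    foreach (explode("\n", $message) as $line) {
        // An empty line still takes one row.
        $rows += max(1, (int) ceil(strlen($line) / $termWidth));
    }
    return $rows;
}

echo countDisplayLines("one\nmore\nwith\nnewlines", 80), "\n"; // 4
echo countDisplayLines(str_repeat('x', 100), 64), "\n";        // 2
```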

I know the question isn't strictly about how to clear a SINGLE LINE in PHP, but this is the top Google result for "clear line cli php", so here is how to clear a single line:
function clearLine()
{
echo "\033[2K\r";
}

function clearTerminal () {
DIRECTORY_SEPARATOR === '\\' ? popen('cls', 'w') : exec('clear');
}
Tested on Windows 7 with PHP 7. The Linux branch should work, according to other users' reports.

Something like this:
for ($i = 0; $i <= 100; $i++) {
echo "Loading... {$i}%\r";
usleep(10000);
}

Use this command to clear the CLI screen:
echo chr(27).chr(91).'H'.chr(27).chr(91).'J'; // ESC[H ESC[J: cursor home, then erase to end of screen

Console functions are platform dependent and as such PHP has no built-in functions to deal with this. system and other similar functions won't work in this case because PHP captures the output of these programs and prints/returns them. What PHP prints goes to standard output and not directly to the console, so "printing" the output of cls won't work.

<?php
error_reporting(E_ERROR | E_WARNING | E_PARSE);
function bufferout($newline, $buffer=null){
$count = strlen(rtrim($buffer));
$buffer = $newline;
if(($whilespace = $count-strlen($buffer))>=1){
$buffer .= str_repeat(" ", $whilespace);
}
return $buffer."\r";
};
$start = "abcdefghijklmnopqrstuvwxyz0123456789";
$i = strlen($start);
while ($i >= 0){
$new = substr($start, 0, $i);
if($old){
echo $old = bufferout($new, $old);
}else{
echo $old = bufferout($new);
}
sleep(1);
$i--;
}
?>
A simple implementation of @dkamins' answer. It works well. It's a bit hackish, but it does the job. It won't work across multiple lines.

function (int $count = 1) {
foreach (range(1,$count) as $value){
echo "\r\x1b[K"; // return to column 0 and erase the current line
echo "\033[1A\033[K"; // move the cursor up one line and erase to the end of that line
}
}

Unfortunately, PHP 8.0.2 does not have a built-in function for this. However, if you just want to clear the console, try print("\033[2J\033[;H"); or use popen('cls', 'w');
It works in PHP 8.0.2 on Windows 10. It is the equivalent of calling system('cls') in C.

Tried some of solutions from answers:
<?php
...
$messages = [
'11111',
'2222',
'333',
'44',
'5',
];
$endlines = [
"\r",
"\033[2K\r",
"\r\033[K\033[1A\r\033[K\r",
chr(27).chr(91).'H'.chr(27).chr(91).'J',
];
foreach ($endlines as $i=>$end) {
foreach ($messages as $msg) {
output()->write("$i. ");
output()->write($msg);
sleep(1);
output()->write($end);
}
}
And \033[2K\r seems to work correctly.

Need help with "pack" for perl and php

I have the task of converting a crypt function someone wrote in Perl into PHP code. Everything works except this:
Perl:
$wert = Encode::encode( "utf8", $wert );
$len=length $wert;
$pad = ($len % 16)?"0".chr(16 - ($len % 16)):"10";
$fuell = pack( "H*", $pad x (16 - $len % 16));
PHP:
$wert = utf8_encode($wert);
$len = mb_strlen($wert);
$pad = ( $len%16 ) ? '0'.chr(16 - ($len%16)) : '10';
$fuell = pack("H*", str_repeat($pad, (16 - $len % 16)));
The PHP version works for some strings. But with something like '2010-01-01T00:00:00.000', the Perl version works without any error and the PHP version prints "PHP Warning: pack(): Type H: illegal hex digit".
I'd be very grateful if someone could spot the error in the PHP version.
Edit:
This is the complete function I have to convert into PHP. It was written by a programmer from a company that no longer works for us, so I can't really tell what the original intention was.
sub crypt
{
my $self = shift;
my ($wert,$pw)= @_;
$wert = Encode::encode( "utf8", $wert );
$pw = Encode::encode( "utf8", $pw );
$len=length $wert;
$pad = ($len % 16)?"0".chr(16 - ($len % 16)):"10";
$fuell = pack( "H*", $pad x (16 - $len % 16));
$wert=$wert.$fuell;
$lenpw=length $pw;
$fuell = ($lenpw % 16)? pack ("H*", "00" x (16 - $lenpw % 16)):"";
$pw=$pw.$fuell;
$cipher = new Crypt::Rijndael $pw, Crypt::Rijndael::MODE_CBC;
$cipher->set_iv($pw);
$crypted = encode_base64($cipher->encrypt($wert),"");
return $crypted;
}

It looks like the error is actually in both versions. The format code H looks for a hex digit, and as noted in the PHP error, it isn't finding (a legal) one. The culprit appears to be this expression:
chr(16 - ($len % 16))
The Perl version isn't complaining because Perl's version of pack will convert the character regardless of whether or not it is a hex digit (which may not be what you want). The documentation goes into more detail about what actually happens.
To prevent the error, try this instead:
sprintf('%x', 16 - ($len % 16))
Note: While this should fix the error you are getting, I don't know if it's an acceptable solution because I don't know the exact intent of the original author of the Perl code.
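For a concrete picture of the suggested fix, here is a minimal sketch of the padding step using sprintf. It assumes the intent is padding where every pad byte holds the pad length (PKCS#7 style), which the thread does not confirm. padToBlock is a hypothetical name, '%02x' stands in for the '0' prefix plus '%x', and strlen is used so that the length is counted in bytes.

```php
<?php
// Sketch: pad $wert to a multiple of 16 bytes, each padding byte holding
// the padding length. Assumes that is what the Perl code intended.
function padToBlock(string $wert): string
{
    $len = strlen($wert);              // byte length of the encoded string
    $padLen = 16 - ($len % 16);        // 1..16
    $pad = sprintf('%02x', $padLen);   // always two valid hex digits
    return $wert . pack('H*', str_repeat($pad, $padLen));
}

$padded = padToBlock('2010-01-01T00:00:00.000'); // 23 bytes in
echo strlen($padded), "\n";                      // 32
echo bin2hex(substr($padded, 23)), "\n";         // 090909090909090909
```

With the original expression, the 23-byte input yields '0' followed by chr(9), a tab character, which is the illegal hex digit PHP complains about.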

It seems that the Perl implementation of pack() is tolerant of invalid hex digits in the input string, and the PHP version is decidedly not.
Consider:
print pack("H*", "ZZ");
This prints 3 in Perl (for some reason), but results in the error you mentioned in PHP.
I'm not sure exactly what Perl is doing with these 'digits', but it's definitely not the same as PHP.
EDIT: It looks like Perl actually "rolls" the hex digit domain forward through the character set. That is:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ #-- Give this to Perl...
0123456789ABCDEF0123456789ABCDEF0123 #-- .. and it's treated as this hex digit
Thus, "ZZ" is the same as "33", which is why it prints 3. Note that this behavior is not well-defined according to the documentation. Thus the original implementation in Perl can be considered buggy, since it relies on behavior that isn't well-defined.
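The mapping in the table can be emulated in PHP to check the "ZZ" example. This sketch is modelled only on the table above, not on Perl's source, and both function names are hypothetical.

```php
<?php
// Sketch of the mapping in the table above: digits keep their value,
// letters continue from A=10 and wrap around modulo 16.
function rolledHexValue(string $char): int
{
    $o = ord(strtoupper($char));
    if ($o >= ord('0') && $o <= ord('9')) {
        return $o - ord('0');
    }
    if ($o >= ord('A') && $o <= ord('Z')) {
        return ($o - ord('A') + 10) % 16;
    }
    throw new InvalidArgumentException("Unsupported character: $char");
}

// Hypothetical helper: pack pairs of "rolled" digits into bytes.
function tolerantHexPack(string $digits): string
{
    if ($digits === '') {
        return '';
    }
    $out = '';
    foreach (str_split($digits, 2) as $pair) {
        $high = rolledHexValue($pair[0]);
        $low = isset($pair[1]) ? rolledHexValue($pair[1]) : 0;
        $out .= chr(($high << 4) | $low);
    }
    return $out;
}

echo tolerantHexPack('ZZ'), "\n"; // 3
```

Z maps to (25 + 10) mod 16 = 3, so "ZZ" becomes the byte 0x33, which is the character 3.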
