I wrote code that sends a query (a variable) from PHP to a Python script. The Python script does some processing and returns a string, but that string is garbled: it contains many question marks. I know this happens when the encoding is wrong, but all the files I'm using are saved as UTF-8.
My PHP file:
$search = $_POST["search"];
$search = shell_exec('get.py ' . $search);
print($search);
As a result I see this: ������������. The Python script:
import sys
import pymorphy2
morph = pymorphy2.MorphAnalyzer()
word = sys.argv[1]
word = morph.parse(word)[0]
i = 0
result = ""
while i < len(word):
    result = result + " " + word.make_agree_with_number(i).word
    i = i + 1
print (result)
Some interesting details: both files are UTF-8. In the Python script I declare the coding as UTF-8, and it doesn't help. In PHP I use the iconv() function to change the encoding and the mb_detect_encoding() function to detect the current encoding; it reports UTF-8. Please help me make my encoding valid!
My Python version is 3.5, and I use .htaccess.
You should not be having this problem, and you can narrow down its cause with a minimal example:
In call_python.php:
<?php
$result = shell_exec('python called_python.py');
print("Python reports: ".$result);
In called_python.py:
#!/usr/bin/env python
print("I am a banana!")
We can see the result:
$ php call_python.php
Python reports: I am a banana!
I'm using a bash shell on a Mac. If the above works for you, the problem lies in what your code does. If not, it has to do with the transport between the two scripts.
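The banana test only sends ASCII, so it cannot show whether non-ASCII text survives the trip. Below is a sketch of the same check with a non-ASCII payload, written in Python with subprocess standing in for PHP's shell_exec() (an assumption for illustration; the Cyrillic sample text is arbitrary). The child encodes its output explicitly as UTF-8 and writes raw bytes to sys.stdout.buffer (available in Python 3, including 3.5), which bypasses the locale-dependent encoding Python would otherwise choose for a pipe.

```python
import subprocess
import sys

# Arbitrary non-ASCII sample text for the round-trip check.
PAYLOAD = "Привет, банан!"

# Child script (stand-in for the called Python script). ascii() turns the
# payload into an escaped literal, so the child's source code is pure ASCII.
# The child encodes explicitly to UTF-8 and writes raw bytes, bypassing the
# locale-dependent encoding Python would otherwise use for a pipe.
CHILD = "import sys\nsys.stdout.buffer.write({}.encode('utf-8'))\n".format(ascii(PAYLOAD))


def run_child():
    """Run the child the way shell_exec() would and decode its stdout as UTF-8."""
    raw = subprocess.run([sys.executable, "-c", CHILD],
                         stdout=subprocess.PIPE, check=True).stdout
    return raw.decode("utf-8")


if __name__ == "__main__":
    print("round trip intact:", run_child() == PAYLOAD)
```

If this round trip works but your real script still produces question marks, the next suspects are the locale the web server gives the child process and the charset the page is served with.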
TL;DR: Consider using the Python framework Bottle instead of PHP. If you can't, there is a link to a Python 2-specific solution at the bottom of this post.
I made a simple Markov chain program in Python. It operates on two wordlists, greek.txt and japanese.txt. Note that japanese.txt contains the character Ō, but the problem persisted even when I replaced those characters with a regular O.
I want to make a really simple, small, and dirty PHP API.
When I call the PHP file, it should echo the output of the command python3 markov-chain.py $_GET['file'] 10 $_GET['min'] $_GET['max'].
I have this:
<?php
if (isset($_GET["file"]) && isset($_GET["min"]) && isset($_GET["max"])) {
    if ($_GET["file"] != null && $_GET["min"] != null && $_GET["max"] != null && is_numeric($_GET["min"]) && is_numeric($_GET["max"])) {
        # Yeah, it should be fine. Just check for $file in whitelist
        $filename = $_GET["file"];
        $min_len = $_GET["min"];
        $max_len = $_GET["max"];
        $count = 10;
        if ($filename === "japanese.txt" || $filename === "greek.txt") {
            $command = 'python3 markov-chain.py ' . $filename . ' ' . $count . ' ' . $min_len . ' ' . $max_len;
            echo($command);
            echo(system($command, $statuscode));
            echo('S:' . $statuscode);
            // error_get_last() returns an array (or null), so print it instead of concatenating it
            echo('E:' . print_r(error_get_last(), true));
        }
    }
}
?>
When I call this with the greek.txt wordlist, everything works as expected. However, when I call it with japanese.txt, the status code is 1 and system() produces no output. When I copy and paste the generated command into a terminal, it works without issues, even for the japanese.txt variant.
I tried making a test.sh file with hardcoded arguments:
#!/bin/sh
python3 markov-chain.py greek.txt 10 3 6 2>> errorlog.txt
When I call this file from PHP, it works. However, when I change test.sh to use the japanese.txt wordlist, I still get no output and a status code of 1. Invoking test.sh from a terminal works perfectly for both variants (Greek and Japanese). Nothing is written to the error log file (the file isn't even created!) when I call test.sh from PHP.
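system() and shell_exec() only pass through the child's stdout, so a Python traceback, which goes to stderr, never reaches the page. In PHP, appending 2>&1 to the command string is the usual way to merge stderr into stdout. The sketch below does the equivalent with Python's subprocess module; the failing child is a hypothetical stand-in for markov-chain.py that decodes the UTF-8 bytes of Ō as ASCII.

```python
import subprocess
import sys

# Hypothetical failing child: decoding the UTF-8 bytes of "Ō" (0xC5 0x8C)
# as ASCII raises UnicodeDecodeError, so Python prints a traceback to
# stderr and exits with status 1.
CHILD = "b'\\xc5\\x8c'.decode('ascii')"


def run_with_stderr(args):
    """Run a command, merging stderr into stdout the way `2>&1` does."""
    result = subprocess.run(args, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.returncode, result.stdout


code, output = run_with_stderr([sys.executable, "-c", CHILD])
print("status:", code)
print(output)
```

With stderr merged, the status code 1 comes with the exception message that explains it.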
Edit:
The Greek wordlist is ASCII and the Japanese wordlist is UTF-8, so PHP doesn't accept UTF-8 (or Python doesn't send it correctly).
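The post does not confirm the cause of the exit status 1, but one plausible explanation is this. In Python 3, open() without an encoding argument uses the locale's preferred encoding. Under a C/POSIX locale, which web-server processes often run with, that encoding is ASCII on releases before Python 3.7 (3.7 added C-locale coercion, PEP 538, and UTF-8 mode, PEP 540). Reading the UTF-8 Japanese wordlist would then raise UnicodeDecodeError, and an uncaught exception makes Python exit with status 1. Below is a minimal sketch that removes the locale dependency. The load_wordlist() name and the one-word-per-line format are hypothetical.

```python
import os
import tempfile


def load_wordlist(path):
    """Read one word per line, decoding explicitly as UTF-8 instead of
    relying on the process locale; blank lines are skipped."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    # Write a tiny sample wordlist containing a non-ASCII character.
    with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                     delete=False) as tmp:
        tmp.write("Ōsaka\nKyōto\n")
        sample = tmp.name
    try:
        print(len(load_wordlist(sample)), "words loaded")
    finally:
        os.remove(sample)
```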
Edit:
Not really a solution, but I ended up switching to the Python framework Bottle. It gives me exactly what I used PHP for: I write everything in one file and it... works. Great for simple APIs that I just need to hack together quickly.
Regarding the UTF-8 issue, I found this, but it didn't solve my problem. Hopefully it helps someone.
I'm using PHP 7.2.11 on my laptop, which runs Windows 10 Home Single Language (64-bit).
I've installed Apache/2.4.35 (Win32) and PHP 7.2.10 using the latest version of XAMPP.
I put the following code into a file named demo.php:
<?php
$string1 = "Hel\xE1lo"; //Tried hexadecimal equivalent code-point from ISO-8859-1
echo $string1;
?>
Running the above program in my web browser gave the following output:
Hel�lo
Then I made a small change to the program and rewrote it as follows:
<?php
$string1 = "Hel\xC3\xA1lo"; //Tried hexadecimal equivalent code-point from UTF-8, C form
echo $string1;
?>
Running the modified program in my web browser gave the following output (indeed the expected result):
Helálo
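The two results come down to the bytes involved. In ISO-8859-1, á (U+00E1) is the single byte 0xE1. In UTF-8, it is the two bytes 0xC3 0xA1. A browser that decodes the page as UTF-8 cannot interpret a lone 0xE1, so it shows the replacement character U+FFFD (�). Here is a quick check in Python, used only to illustrate the byte values:

```python
# á (U+00E1) in the two encodings tried in demo.php
latin1_bytes = "Helálo".encode("iso-8859-1")
utf8_bytes = "Helálo".encode("utf-8")

print(latin1_bytes)  # b'Hel\xe1lo'
print(utf8_bytes)    # b'Hel\xc3\xa1lo'

# Decoding both as UTF-8, as a browser does for a UTF-8 page:
utf8_text = utf8_bytes.decode("utf-8")
broken_text = latin1_bytes.decode("utf-8", errors="replace")
print(utf8_text)    # Helálo
print(broken_text)  # Hel�lo  (0xE1 becomes U+FFFD)
```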
This raised a question for me.
Is there a built-in function or some other mechanism in PHP that tells me which character encoding the current file uses?
P.S.: I know that in PHP a string is encoded in whatever encoding the script file itself uses. I want to know whether there is a built-in function, a mechanism, or any workaround that tells me the character encoding of the file under consideration.
This function must be in the same file whose encoding is to be determined.
//return 'UTF-8', 'iso-8859-1',.. or false
function getPageCoding(){
    $codes = array(
        'UTF-8' => "\xc3\xa4",
        'iso-8859-1' => "\xe4",
        'cp850' => "\x84",
    );
    return array_search('ä', $codes);
}
echo getPageCoding();
Demo: https://3v4l.org/UVvBM
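The trick works because the literal 'ä' inside the PHP file is stored as whatever bytes the file's encoding uses for ä, and array_search() compares those bytes against each candidate. Here is the same idea in Python, for illustration only. The table maps each candidate encoding to its bytes for ä, and page_coding() (a hypothetical name) looks up a given byte sequence. Like getPageCoding(), it can only tell apart the encodings listed in its table.

```python
# Byte sequence of "ä" in each candidate encoding (same table as getPageCoding()).
CODES = {
    "UTF-8": b"\xc3\xa4",
    "iso-8859-1": b"\xe4",
    "cp850": b"\x84",
}


def page_coding(marker):
    """Return the encoding whose bytes for 'ä' equal `marker`, or None
    (PHP's array_search() returns false in that case)."""
    for name, encoded in CODES.items():
        if encoded == marker:
            return name
    return None


# Simulate a source file saved in each encoding: the literal 'ä' ends up as
# that encoding's bytes, and the lookup recovers the encoding name.
for enc in ("utf-8", "iso-8859-1", "cp850"):
    print(enc, "->", page_coding("ä".encode(enc)))
```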
I need to decode a base64 token for an authentication string. I found working examples in Python, Perl, and PHP and wrote the equivalent code in Node, but I ran into an issue: Node's base64 decoder doesn't seem to behave the same way as those of the other three languages.
Running this in Python
token = 'BaSe64sTRiNghERe'
decoded_token = token.decode('base64')
print decoded_token
returns this string
???F#`?D^
Running this in Perl
use MIME::Base64;  # provides decode_base64()
my $token = 'BaSe64sTRiNghERe';
my $decoded_token = decode_base64($token);
print $decoded_token;
returns this string
???F#`?D^
Running this in PHP
$token = 'BaSe64sTRiNghERe';
$decoded_token = base64_decode($token, true);
echo $decoded_token;
returns this string
???F#`?D^
And finally, running this in a Node script
var token = 'BaSe64sTRiNghERe',
decoded_token = Buffer.from(token, 'base64').toString();
console.log(decoded_token);
returns this string
????F#`?D^
The question is: why is there an extra question mark in the decoded string? And how can I get the same result in Node as in Perl, Python, and PHP?
UPDATE
Running this on the command line
echo BaSe64sTRiNghERe | base64 --decode
gives me the same output as the Perl, Python, and PHP scripts,
but running the same command from Node
var exec = require('child_process').exec;
exec('echo BaSe64sTRiNghERe | base64 --decode', function callback(error, stdout, stderr){
console.log(stdout);
});
I still get the wrong output.
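The output differs because the decoded bytes are not printable text, and Node handles them differently from the other languages. Buffer.toString() decodes the bytes as UTF-8 and replaces each invalid sequence with the replacement character U+FFFD. The other scripts write the raw bytes and leave the rendering to the terminal. You are also losing information: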
>>> token = 'BaSe64sTRiNghERe'
>>> decoded_token = token.decode('base64')
>>> print decoded_token
???F#`?D^
>>> decoded_token[0] == decoded_token[1]
False
If you modify your Python snippet like this:
import binascii
token = 'BaSe64sTRiNghERe'
decoded_token = binascii.hexlify(token.decode('base64'))
print(decoded_token)
Then modify your Node.js snippet like this:
var token = 'BaSe64sTRiNghERe',
decoded_token = Buffer.from(token, 'base64').toString('hex');
console.log(decoded_token);
You will avoid the differences in how they handle unprintable characters and see that the base64 decodes have the same byte values.
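To see where the extra question mark comes from, decode the token to bytes and inspect them. The sketch below uses Python 3 and only the standard library. It prints the bytes as hex, then decodes them as UTF-8 with errors='replace', which roughly mirrors Node's default Buffer.toString(): invalid UTF-8 sequences become U+FFFD. Python 2, Perl, and PHP print the raw bytes and let the terminal decide how to render them. The bytes are identical in all four languages, but the number of visible question marks differs.

```python
import base64

token = "BaSe64sTRiNghERe"

# The decoded token is raw bytes, not text.
raw = base64.b64decode(token)
print(raw.hex())  # 05a49eeb8b1346236084445e

# Decoding those bytes as UTF-8 with replacement mimics Node's default
# Buffer.toString(): invalid sequences become U+FFFD.
as_text = raw.decode("utf-8", errors="replace")
print(ascii(as_text))
```

If the token feeds an authentication check, compare or forward the bytes (or their hex form) rather than a printed string.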
I need to pass UTF-8 filenames to the PHP exec command. The problem is that exec does not seem to understand UTF-8. I use something like this:
echo exec('locale charmap');
which returns ANSI_X3.4-1968.
Judging by this SO question, the solution looks like this:
echo exec('LANG=de_DE.utf8; locale charmap');
But I still get the same output: ANSI_X3.4-1968
On the other hand, if I execute this PHP command from the bash command line:
php -r "echo exec('LANG=de_DE.UTF8 locale charmap');"
The output is UTF-8.
So the questions are:
Why does the result differ when the PHP command runs from bash and when it runs as an Apache module (web page)?
How can I set UTF-8 for exec when it runs inside a website as an Apache module?
To answer my own question, I found the following solution:
set the locale environment variable from PHP:
$locale='de_DE.UTF-8';
setlocale(LC_ALL,$locale);
putenv('LC_ALL='.$locale);
echo exec('locale charmap');
With this, locale charmap returns UTF-8, so I'm able to pass special characters and umlauts to Linux shell commands.
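The earlier LANG=de_DE.utf8; locale charmap attempt probably failed because of the semicolon. In a POSIX shell, NAME=value; cmd assigns a shell variable that is not exported to cmd unless NAME was already in the environment. NAME=value cmd, with no semicolon as in the command that worked from bash, places the variable in cmd's environment. The sketch below shows the difference using subprocess and printenv. It assumes a POSIX /bin/sh and printenv are available; DEMO_VAR is a hypothetical variable name.

```python
import os
import subprocess


def child_sees(command):
    """Run `command` through /bin/sh with DEMO_VAR absent from the
    environment and return what `printenv DEMO_VAR` printed."""
    env = {k: v for k, v in os.environ.items() if k != "DEMO_VAR"}
    result = subprocess.run(command, shell=True, env=env,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout.strip()


# Semicolon: a plain shell variable, not exported to printenv.
print(repr(child_sees("DEMO_VAR=de_DE.UTF-8; printenv DEMO_VAR")))  # ''
# Prefix assignment: set in printenv's environment only.
print(repr(child_sees("DEMO_VAR=de_DE.UTF-8 printenv DEMO_VAR")))  # 'de_DE.UTF-8'
```

putenv() in PHP sidesteps the question entirely, because it changes the environment that every child process started by exec() inherits.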
This solves it for me (source: this comment here):
<?php
putenv('LANG=en_US.UTF-8');
$command = escapeshellcmd('python3 myscript.py');
$output = shell_exec($command);
echo $output;
?>
I had a similar problem. My program was returning German letters such as üäöß. Here is my code:
$programResult = shell_exec('my script');
The variable $programResult contains German umlauts, but they were badly encoded. To encode them properly, you can call the utf8_encode() function, which converts ISO-8859-1 (Latin-1) text to UTF-8.
$programResult = shell_exec('my script');
$programResult = utf8_encode($programResult);
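Because utf8_encode() assumes its input is ISO-8859-1, it only repairs the output if the program really emits Latin-1. Applied to output that is already UTF-8, it double-encodes the text. utf8_encode() and utf8_decode() are also deprecated as of PHP 8.2. Here is the same conversion in Python, for illustration; latin1_to_utf8() is a hypothetical name:

```python
def latin1_to_utf8(data):
    """Equivalent of PHP's utf8_encode(): reinterpret bytes as ISO-8859-1
    and re-encode them as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


latin1_output = "üäöß".encode("iso-8859-1")  # what a Latin-1 program emits
utf8_output = "üäöß".encode("utf-8")         # what a UTF-8 program emits

print(latin1_to_utf8(latin1_output).decode("utf-8"))  # üäöß (repaired)
print(latin1_to_utf8(utf8_output).decode("utf-8"))     # double-encoded mojibake
```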
I have a Python file I'm calling with PHP's exec function. Python then outputs a string (apparently Unicode, based on using isinstance), which is echoed by PHP. The problem I'm running into is that if my string contains any special characters (like the degree symbol), nothing is output. I'm sure I need to do something about the encoding, but I'm not really sure what to do, or why.
EDIT: To get an idea of how I am calling exec, please see the following code snippet:
$tables = shell_exec('/s/python-2.6.2/bin/python2.6 getWikitables.py '.$title);
Python properly outputs the string when I call getWikitables.py by itself.
EDIT: It definitely seems to be something either on the Python end, or in transmitting the results. When I run strlen on the returned values in PHP, I get 0. Can exec only accept a certain type of encoding?
Try setting the LANG environment variable immediately before executing the Python script per http://php.net/shell-exec#85095:
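// "LANG=... command" (no semicolon) puts LANG into the environment of the python2.6 process itself
shell_exec(sprintf(
'LANG=en_US.utf-8 /s/python-2.6.2/bin/python2.6 getWikitables.py %s',
escapeshellarg($title)
));
(I used sprintf() to (hopefully) make the lengthy command string a little easier to follow.)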
You might also/instead need to do this before calling shell_exec(), per http://php.net/shell-exec#78279:
$locale = 'en_US.utf-8';
setlocale(LC_ALL, $locale);
putenv('LC_ALL='.$locale);
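escapeshellarg() is what makes $title safe to pass through the shell. For comparison only, Python's standard-library counterpart is shlex.quote(). It wraps the value in single quotes and does not depend on the locale, so non-ASCII characters pass through unchanged. A small check follows; the sample title is made up.

```python
import shlex

title = "Température (°C) & l'été"  # hypothetical article title

quoted = shlex.quote(title)
command = "/s/python-2.6.2/bin/python2.6 getWikitables.py " + quoted
print(command)

# Splitting the command the way a POSIX shell would gives back the original
# title as a single argument, including the apostrophe and non-ASCII letters.
assert shlex.split(command)[-1] == title
```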
I had a similar issue and solved it as follows. I don't understand why it is necessary, since I thought everything was already processed as UTF-8. Calling my Python script from the command line worked, but calling it with exec (shell_exec) via PHP and Apache did not.
According to a PHP forum entry, this call is needed when you want to use escapeshellarg():
setlocale(LC_CTYPE, "en_US.UTF-8");
It needs to be called before escapeshellarg() is executed. It was also necessary to set a Python environment variable before the exec command (I found an unrelated hint here):
putenv("PYTHONIOENCODING=utf-8");
My Python script evaluated the arguments like this:
sys.argv[1].decode("utf-8")
(Hint: That was required because I use a library to convert some Arabic texts.)
So I imagine the original question could be solved this way:
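The .decode("utf-8") call is Python 2-specific. In Python 3, sys.argv items are already str, and str has no .decode() method. PYTHONIOENCODING still matters in Python 3, because it overrides the encoding of sys.stdout, which otherwise follows the locale. The sketch below uses subprocess as a stand-in for shell_exec() and passes the argument as a list, so no shell quoting is involved. The sample words are arbitrary.

```python
import os
import subprocess
import sys

# Child script: in Python 3, sys.argv[1] is already a str, so it is printed
# directly (no .decode() step).
CHILD = "import sys; print(sys.argv[1])"


def echo_through_child(text):
    """Pass `text` as an argument (list form, no shell quoting) and read the
    child's stdout back, forcing the child's stdout encoding to UTF-8."""
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    out = subprocess.run([sys.executable, "-c", CHILD, text],
                         stdout=subprocess.PIPE, env=env, check=True).stdout
    return out.decode("utf-8").rstrip("\r\n")


if __name__ == "__main__":
    sample = "مرحبا"  # sample Arabic word
    print(echo_through_child(sample) == sample)
```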
setlocale(LC_CTYPE, "en_US.UTF-8");
putenv("PYTHONIOENCODING=utf-8");
$tables = shell_exec('/s/python-2.6.2/bin/python2.6 getWikitables.py ' .
escapeshellarg($title));
I can't say anything about the return value, though. In my case I could output it directly to the browser without any problems.
Spent many, many hours to find that out... One of the situations when I hate my job ;-)
This worked for me
setlocale(LC_CTYPE, "en_US.UTF-8");
putenv("PYTHONIOENCODING=utf-8");
$tables = shell_exec('/s/python-2.6.2/bin/python2.6 getWikitables.py ' .
escapeshellarg($title));
In PHP you can use functions like utf8_encode() or utf8_decode() to solve your problem.