File reading from PHP using python script

File reading from PHP using python script - php

Okay, this is driving me crazy. I have a small file. Here is the dropbox link https://www.dropbox.com/s/74nde57f07jj0zj/transcript.txt?dl=0.
If I try to read the content of the file using python f.read(), I can easily read it. But, if I try to run the same python program using php shell_exec(), the file read fails. This is the error I get.
Traceback (most recent call last):
File "/var/www/python_code.py", line 2, in <module>
transcript = f.read()
File "/opt/anaconda/lib/python3.4/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 107: ordinal not in range(128)
I have checked all the permission issues and there is no problem with that.
Can anyone kindly shed some light?
Here is my python code.
f = open('./transcript/transcript.txt', 'r')
transcript = f.read()
print(transcript)
Here is my PHP code.
$output = shell_exec("/opt/anaconda/bin/python /var/www/python_code.py");
Thank you!
EDIT: I think the problem is in the file content. If I replace the content with simple 'I eat rice', then I can read the content from php. But the current content cannot be read. Still don't know why.

The problem appears is that your file contains non-ASCII characters, but you're trying to read it as ASCII text.
Either it is text, but is in some encoding or other that you haven't told us (probably UTF-8, Latin-1, or cp1252, but there are countless other possibilities), or it's not text at all, but rather arbitrary binary data.
When you open a text file without specifying an encoding, Python has to guess. When you're running from inside the terminal or whatever IDE you use, presumably, it's guessing the same encoding that you used in creating the file, and you're getting lucky. But when you're running from PHP, Python doesn't have as much information, so it's just guessing ASCII, which means it fails to read the file because the file has bytes that aren't valid as ASCII.
If you want to understand how Python guesses, see the docs for open, but briefly: it calls locale.getpreferredencoding(), which, at least on non-Windows platforms, reads it from the locale settings in the environment. On a typical linux system that's not new enough to be based on systemd but not too old, the user's shell will be set up for a UTF-8 locale, but services will be set up for C locale. If all of that makes sense to you, you may see a way to work around your problem. If it all sounds like gobbledegook, just ignore it.
If the file is meant to be text, then the right solution is to just pass the encoding to the open call. For example, if the file is UTF-8, do this:
f = open('./transcript/transcript.txt', 'r', encoding='utf-8')
Then Python doesn't have to guess.
If, on the other hand, the file is arbitrary binary data, then don't open it in text mode:
f = open('./transcript/transcript.txt', 'rb')
In this case, of course, you'll get bytes instead of str every time you read from it, and print is just going to print something ugly like b'aq\x9bz' that makes no sense; you'll have to figure out what you actually want to do with the bytes instead of printing them as a bytes.

Related

PHP obfuscator?

I have been using PHP Desktop which works great however if i want to share a project i did not want users to see everything in the code.
I tried the suggested code protectors but nothing seems to work for the current version. I found a simple PHP obfuscator code but it gives an error. it also generates some output but fails to echo a result.
The error:
Warning: php_strip_whitespace(): failed to open stream: No error in C:\xampp\htdocs\PHP Obfuscator\Obfus.php on line 11
The code:
<?php
//$infile = file_get_contents("Input.php");
$infile = '<?php echo "Hello World 123"; ?>';
$outfile = "Output.php";
echo "Processing $infile to $outfile\n";
$data="ob_end_clean();?>";
$data.=php_strip_whitespace($infile); // Remove whitespace
$data.=gzcompress($data,9); // Compress data
$data=base64_encode($data); // Encode in base64
// Generate output text
$out='<?ob_start();$a=\''.$data.'\';eval(gzuncompress(base64_decode($a)));$v=ob_get_contents();ob_end_clean();?>';
// Write output text
//file_put_contents($outfile,$out);
echo $out;
?>
Does anyone know how to fix this code to make reading the PHP harder for regular users that would download the exe?
It would be to prevent non coders only as it would be packed in a exe, i know it's not a secure method to hide code sources.
Also does anyone have blenc etc working with the current version? I had no luck even after following the tutorial.

The argument for php_strip_whitespace() has to be a file name, not a raw string. Write the data to a temporary file, then clean it, then delete the temporary file when you're done.
In any case, you're going about this all wrong. Security through obfuscation isn't really security at all. Any competent programmer will recognize the base64 encoding, and it's trivial to decompress the compressed data. Then, a decent IDE could restore the missing whitespace with a couple of keystrokes. Besides, your code, with its eval(gzuncompress(base64_decode($a))), literally tells the user what you did to obfuscate the code in the first place.
If you don't want users to access the source, don't distribute the source, period. Use an API or a compiler, not an obfuscator.

UTF-8, PHP, Win7 - Is there a solution now to save UTF-8-filenames on Win 7 using php?

Update: Just to not make you reading through all: PHP starting with
7.1.0alpha2 supports UTF-8 filenames on Windows. (Thanks to Anatol-Belski!)
Following some link chains on stackoverflow I found part of the answer:
https://stackoverflow.com/a/10138133/3716796 by Umberto Salsi
(and on the same question: https://stackoverflow.com/a/2950046/3716796 by Artefacto)
In short: 'PHP communicate[s] with the underlying file system as a "non-Unicode aware program"', and because of that all filenames given to PHP by Windows and vice versa are automatically translated/reencoded by Windows. This causes the errors. And you seemingly can't stop the automatic reencoding.
(And https://stackoverflow.com/a/2888039/3716796 by Artefacto: "PHP does not use the wide WIN32 API calls, so you're limited by the codepage.")
And at https://bugs.php.net/bug.php?id=47096 there is the bug report for PHP.
Though on there nicolas suggests, that a COM-object might work! $fs = new COM('Scripting.FileSystemObject', null,
CP_UTF8);
Maybe I will try that sometimes.
So there is the part of my questionleft : Is there PHP6 out, or was it withdrawn, or is there anything new on PHP about that topic?
// full Question
The most questions about this topic are 1 to 5 years old.
Could php now save a file using
file_put_contents($dir . '/' . $_POST['fileName'], $_POST['content']);
when the $_POST['fileName'] is UTF-8 encoded, for example "Крым.xml" ?
Currently it is saved as
ÐšÑ€Ñ‹Ð¼.xml
I checked the fileName variable, so I can be sure it's UTF-8:
echo mb_detect_encoding($_POST['fileName']);
Is there now anything new in PHP that could accomplish it?
At some places I read PHP 6 would be able to do it, but PHP 6 if i I remember right, has been withdrawn. ?
In Windows Explorer I can change the name of a file to "Крым.xml". As far as I have understood the old questions&answers, it should be possible to use file_put_contents if the fileName-var is simply encoded to the encoding used by windows 7 and it's NTFS disc.
There is even 3 old question with answers that claim to have succeeded: PHP File Handling with UTF-8 Special Characters
Convert UTF-16LE to UTF-8 in php
and PHP: How to create unicode filenames
Overall and most approved answers say it is not possible.
I checked all suggested answers already myself, and none works.
How to definitly and with absolute accuracy find out, in which encoding my Win 7 and Explorer saves the filename on my NTFS disc and with German language setting?
As said: I can create a file "Крым.xml" in the Explorer.
My conclusion:
1. Either file_put_contents doesn'T work correctly when handing over the fileName (which I tried with conversions to UTF-16, UTF-16LE, ISO-8859-1 and Windows-1252) to Windows,
2. or file_put_contents just doesn't implement a way to call Windows' own file function in the appropriate way (so this second possibility would mean it's not a bug but just not implemented.) (For example notepad++ has no problems creating, writing and renaming a file called Крым.xml.)
Just one example of the error messages I got, in this case when I used
mb_convert_encoding($theFilename , 'Windows-1252' , 'UTF-8')
"Warning: file_put_contents(dirToSaveIn/????.xml): failed to open stream: No error in C:\aa xampp\htdocs\myinterface.lo\myinterface\phpWriteLocalSearchResponseXML.php on line 26 "
With other conversion I got other error messages, ranging from 'invalid characters' to no string recognized at all.
Greetings
John

PHP starting with 7.1.0alpha2 supports UTF-8 filenames on Windows.
Thanks.

Files with same content differ in sizes

Here is the protocol:
1) I generate a text file online with PHP containing alphanumeric characters. Then I download it and note its size (from Properties menu).
2) I open the text file with Notepad++ and cut all the content in a new text file, then I save the new file (with the same name).
3) To my astonishment, even thought both files have the exact same text content, their size isn't the same!
--TEST 1--
Downloaded file: 1529 Ko
New copy file: 1594 Ko
--TEST 2--
Downloaded file: 52 Ko
New copy file: 54 Ko
So what? Why am I posting this here? Because the file in question is available to my users for download on my website, and they can use it to replace a file in a game's save. However, the game reacts to the new file by rejecting it, whilst the copied one (with the above protocol) works fine.
The only difference I see between both files is their size (slight difference as shown above) - but the content and the name is the same. Any idea why there is that size difference?

This will most likely be newlines that are converted between unix (1 byte) and windows (2 bytes).
As mentioned in the comments, it could also be encoding, but NotePad++ is pretty good at encoding. It's also unlikely to account for the difference.
You need to convert the "\r\n" to "\n" to get the smaller filesize. Here's a page I just found with a few options: http://darklaunch.com/2009/05/06/php-normalize-newlines-line-endings-crlf-cr-lf-unix-windows-mac
Another thingto watch for is a trailing "newline" which is not very obvious. Again, strip it out before doing your comparisson.

Is your client and server are different platforms ? Say linux and windows ? In that case there is a difference in the way new line characters are stored. This can cause size difference.
Another reason could be character encoding used but it is little less likely.

Why might my PHP log file not entirely be text?

I'm trying to debug a plugin-bloated Wordpress installation; so I've added a very simple homebrew logger that records all the callbacks, which are basically listed in a single, ultimately 250+ row multidimensional array in Wordpress (I can't use print_r() because I need to catch them right before they are called).
My logger line is $logger->log("\t" . $callback . "\n");
The logger produces a dandy text file in normal situations, but at two points during this particular task it is adding something which causes my log file to no longer be encoded properly. Gedit (I'm on Ubuntu) won't open the file, claiming to not understand the encoding. In vim, the culprit corrupt callback (which I could not find in the debugger, looking at the array) is about in the middle and printed as ^#lambda_546 and at the end of file there's this cute guy ^M. The ^M and ^# are blue in my vim, which has no color theme set for .txt files. I don't know what it means.
I tried adding an is_string($callback) condition, but I get the same results.
Any ideas?

^# is a NUL character (\0) and ^M is a CR (\r). No idea why they're being generated though. You'd have to muck through the source and database to find out. geany should be able to open the file easily enough though.

Seems these cute guys are a result of your callback formatting for windows.

Mystery over. One of the callbacks was an anonymous function. Investigating the PHP create_function documentation, I saw that a commenter had noted that the created function has a name like so: chr(0) . lambda_n. Thanks PHP.
As for the \r. Well, that is more embarrassing. My logger reused some older code that I previously written which did end lines in \r\n.

Lamp / Cakephp: Streaming an image : Binary 0x00 replaced by 0x20

I'm trying to create a script that pulls an image out of the database and displays it to the user, called by <img src="viewImage/someImageName">
But the problem I'm having is when the image is displayed all of the Nulls (0x00) are replaced by 0x20 and I have no idea why. The data in the database shows it being nulls but somewhere along the way it gets changed to 0x20s.
Does anyone have any idea? is there something I'm missing?
Here is the code I'm using:
$data = $this->Image->read(NULL, $userId);
header("Content-Type: image/jpeg");
echo($data['image']);
die;
I don't think it has anything to do with the code because as you can see there is no place for error. I can dump the binary contents out and it has not yet been tampered.
Something with the stack or cakephp any thoughts?
Update:
I've noticed that a space is making to the beginning of stream, I'm trying to track it down, could this be the problem?

Yeah, something along the way is freaking out (because OMG nulls, what if something thinks they're string terminators) and replacing them with spaces. I suspect CakePHP but am not quite certain enough to say j'accuse. Try:
header('Transfer-Encoding-Type: base64');
and see if that convinces whatever's doing it to leave your data alone.

I had a stray space in a file somewhere, lots of fun to track down :)
I guess this switches the mode of something in the stack and corrupts the files

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.