PHP7 UTF-8 filenames on Windows server, new phenomenon caused by ZipArchive

Update:
While preparing a bug report for the great people who make PHP 7 possible, I went over my research once more and tried to boil it down to a few simple lines of code. In doing so I found that PHP itself is not the cause of the problem. I will share my results here when I'm done. I mention it now so that you don't waste your time :)
Synopsis: PHP7 now seems able to write UTF-8 filenames but is unable to access them?
Preamble: I read about 10-15 articles here touching the subject but they did not help me solve the problem and they all are older than the PHP7 release. It seems to me that this is probably a new issue and I wonder if it might be a bug. I spent a lot of time experimenting with en-/decoding of the strings and trying to figure out a way to make it work - to no avail.
Good day everybody and greetings from Germany (insert shy not-my-native-language-remark here), I hope you can help me out with this new phenomenon I encountered. It seems to be "new" in the sense that it came with PHP 7.
I think most people working with PHP on a Windows system are very familiar with the problem of filenames and the transparent wrapper of PHP that manages access to files that have non-ASCII filenames (or windows-1252 or whatever is the system code page).
I'm not quite sure how to approach the subject and as you can see I'm not very experienced in composing questions so please don't rip my head off instantly. And yes I will strive to keep it short. Here we go:
First symptom: after updating to PHP7 I sometimes encountered problems with accessing files generated by my software. Sometimes it worked as usual, sometimes not. I found out the difference was that PHP7 now seems able to write UTF-8 filenames but is unable to access files with those names.
After generating said files on two separate "identical" systems (differing only in the PHP version) this is how the files are named on the hard drive:
PHP 5.5: Lokaltest_KG_漢字_汉字_Krümhold-DEZ1604-140081-complete.zip
PHP 7: Lokaltest_KG_漢字_汉字_Krümhold-DEZ1604-140081-complete.zip
Splendid, PHP 7 is capable of writing unicode-filenames on the HDD, and UTF-16 is used on windows afaik. Now the downside is that when I try to access those files for example with is_file() PHP 5.5 works but PHP 7 does not.
Consider this code snippet (note: I "hacked" into this function because it was the simplest way, it was not written for this purpose). This function gets called after a zip-file gets generated taking on the name of the customer and other values to determine a proper name. Those come out of the database. Database and internal encoding of PHP are both UTF-8. clearstatcache is per se not necessary but I included it to make things clearer. Important: Everything that happens is done with PHP7, no other entity is responsible for creating the zip-file. To be precise it is done with class ZipArchive. Actually it does not even matter that it is a zip-archive, the point is that the filename and the content of the file are created by PHP7 - successfully.
public static function downloadFileAsStream( $file )
{
    clearstatcache();
    print $file . "<br/>";
    var_dump(is_file($file));
    die();
}
Output is:
D:/htdocs/otm/.data/_tmp/Lokaltest_KG_漢字_汉字_Krümhold-DEZ1604-140081-complete.zip
bool(false)
So PHP7 is able to generate the files (they indeed DO exist on the hard drive and are legit and accessible and all) but is incapable of accessing them. is_file is not the only function that fails; file_exists() does too, for example.
A little experiment with encoding conversion to give you a taste of the things I tried:
public static function downloadFileAsStream( $file )
{
    clearstatcache();
    print $file . "<br/>";
    print mb_detect_encoding($file, 'ASCII,UTF-16,windows-1252,UTF-8', false) . "<br/>";
    print mb_detect_encoding($file, 'ASCII,UTF-16,windows-1252,UTF-8', true) . "<br/>";
    if (($detectedEncoding = mb_detect_encoding($file, 'ASCII,UTF-16,windows-1252,UTF-8', true)) != 'windows-1252')
    {
        $file = mb_convert_encoding($file, 'UTF-16', $detectedEncoding);
    }
    print $file . "<br/>";
    var_dump(is_file($file));
    die();
}
Output is:
D:/htdocs/otm/.data/_tmp/Lokaltest_KG_漢字_汉字_Krümhold-DEZ1604-140081-complete.zip
UTF-8
UTF-8
D:/htdocs/otm/.data/_tmp/Lokaltest_KG_o"[W_lI[W_Kr�mhold-DEZ1604-140081-complete.zip
NULL
So converting from UTF-8 (database/internal encoding) to UTF-16 (windows file system) does not seem to work either.
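A likely reason for the NULL result, sketched here under the assumption that the mbstring extension is loaded: a UTF-16 string carries a zero byte for every ASCII character, and PHP's filesystem functions treat the path as a plain byte string and reject paths that contain NUL bytes (NULL plus a warning in PHP 7, stricter handling in later versions). The file name below is only an illustration.

```php
<?php
// Illustrative name only; any string with ASCII characters behaves the same.
$utf8  = "Krümhold.zip";
$utf16 = mb_convert_encoding($utf8, 'UTF-16', 'UTF-8');

// Each ASCII character is now stored as two bytes, one of which is zero.
$hasNul = strpos($utf16, "\0") !== false;

echo bin2hex(substr($utf16, 0, 4)), "\n"; // hex dump of the first two code units
var_dump($hasNul);
```

So re-encoding the path to UTF-16 by hand cannot work with these functions; the conversion to wide characters would have to happen inside PHP itself.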
I am at the end of my rope here and sadly the issue is very important to us since we cannot update our systems with this problem looming in the background. I hope somebody can shed a little light on this. Sorry for the long post, I'm not sure how well I could get my point across.
Addition:
$file = utf8_decode($file);
var_dump(is_file($file));
die();
Delivers false for the filename with the Japanese characters. When I change the input used to create the filename so that the filename is now Lokaltest_KG_Krümhold-DEZ1604-140081-complete.zip, the above code delivers true. So utf8_decode helps, but only with a small part of Unicode, such as German umlauts?
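That matches what utf8_decode does: it converts UTF-8 to ISO-8859-1, which can only represent the code points U+0000 to U+00FF. A minimal sketch of a check for that range follows; fitsLatin1 is a hypothetical helper name, and the script file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: true if every character of a UTF-8 string lies in
// U+0000..U+00FF, the only range ISO-8859-1 (and so utf8_decode) can represent.
function fitsLatin1(string $utf8): bool
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8;
    // invalid UTF-8 makes preg_match() return false, which we report as false.
    return preg_match('/^[\x{00}-\x{FF}]*$/u', $utf8) === 1;
}

var_dump(fitsLatin1("Krümhold"));  // umlauts are inside Latin-1
var_dump(fitsLatin1("漢字_汉字")); // CJK characters are not
```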

Answering my own question here: The actual bad boy was the component ZipArchive which created files with incorrectly encoded filenames. I have written a hopefully helpful bug report: https://bugs.php.net/bug.php?id=72200
Consider this short script:
print "php default_charset: ".ini_get('default_charset')."\n"; // just 4 info (UTF-8)
$filename = "bugtest_müller-lüdenscheid.zip"; // just an example
$filename = utf8_encode($filename); // simulating my database delivering utf8-string
$zip = new ZipArchive();
if( $zip->open($filename, ZipArchive::CREATE | ZipArchive::OVERWRITE) === true )
{
    $zip->addFile('bugtest.php', 'bugtest.php'); // copy of script file itself
    $zip->close();
}
var_dump( is_file($filename) ); // delivers ?
Output with PHP 5.5.35:
php default_charset: UTF-8
bool(true)
Output with PHP 7.0.6:
php default_charset: UTF-8
bool(false)
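Until the bug is fixed, one possible workaround is to let ZipArchive write to an ASCII-only temporary name and leave the final naming to rename(), so that the name on disk is produced by the same code path that is_file() and file_exists() use later. This is an untested assumption about PHP 7.0 on Windows rather than a confirmed fix, and createZipViaTempName is a hypothetical helper name.

```php
<?php
// Hypothetical helper, not part of ZipArchive: build the archive under an
// ASCII-only temporary file name, then let rename() apply the final name.
function createZipViaTempName(string $target, array $files): bool
{
    $tmp = tempnam(dirname($target), 'zip'); // temporary file next to the target
    if ($tmp === false) {
        return false;
    }
    $zip = new ZipArchive();
    if ($zip->open($tmp, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        @unlink($tmp);
        return false;
    }
    // $files maps the name inside the archive to the path on disk.
    foreach ($files as $localName => $path) {
        $zip->addFile($path, $localName);
    }
    if (!$zip->close()) {
        @unlink($tmp);
        return false;
    }
    return rename($tmp, $target);
}
```

With the script from above, the call would be createZipViaTempName($filename, ['bugtest.php' => 'bugtest.php']). Note that the name on disk would then look like the PHP 5.5 one again, which is the price for consistent access.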

Related

PHP obfuscator?

I have been using PHP Desktop, which works great. However, if I share a project I do not want users to see everything in the code.
I tried the suggested code protectors, but nothing seems to work with the current version. I found a simple PHP obfuscator script, but it gives an error. It also generates some output but fails to echo a result.
The error:
Warning: php_strip_whitespace(): failed to open stream: No error in C:\xampp\htdocs\PHP Obfuscator\Obfus.php on line 11
The code:
<?php
//$infile = file_get_contents("Input.php");
$infile = '<?php echo "Hello World 123"; ?>';
$outfile = "Output.php";
echo "Processing $infile to $outfile\n";
$data="ob_end_clean();?>";
$data.=php_strip_whitespace($infile); // Remove whitespace
$data.=gzcompress($data,9); // Compress data
$data=base64_encode($data); // Encode in base64
// Generate output text
$out='<?ob_start();$a=\''.$data.'\';eval(gzuncompress(base64_decode($a)));$v=ob_get_contents();ob_end_clean();?>';
// Write output text
//file_put_contents($outfile,$out);
echo $out;
?>
Does anyone know how to fix this code so that the PHP is harder to read for regular users who download the exe?
It only needs to deter non-coders, since it will be packed in an exe; I know it's not a secure way to hide source code.
Also, does anyone have blenc etc. working with the current version? I had no luck even after following the tutorial.
The argument for php_strip_whitespace() has to be a file name, not a raw string. Write the data to a temporary file, then clean it, then delete the temporary file when you're done.
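A minimal sketch of that temporary-file approach; stripWhitespaceFromString is a hypothetical helper name, and the system temp directory is assumed to be writable.

```php
<?php
// Hypothetical helper: php_strip_whitespace() only accepts a file name,
// so the source string is written to a temporary file first.
function stripWhitespaceFromString(string $source): string
{
    $tmp = tempnam(sys_get_temp_dir(), 'obf');
    if ($tmp === false) {
        throw new RuntimeException('Could not create a temporary file.');
    }
    try {
        file_put_contents($tmp, $source);
        return php_strip_whitespace($tmp);
    } finally {
        unlink($tmp); // always delete the temporary file, even on failure
    }
}

$stripped = stripWhitespaceFromString("<?php\n// greeting\necho   \"Hello World 123\";\n?>");
echo $stripped, "\n";
```

In the question's script, the result of this call would take the place of php_strip_whitespace($infile).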
In any case, you're going about this all wrong. Security through obfuscation isn't really security at all. Any competent programmer will recognize the base64 encoding, and it's trivial to decompress the compressed data. Then, a decent IDE could restore the missing whitespace with a couple of keystrokes. Besides, your code, with its eval(gzuncompress(base64_decode($a))), literally tells the user what you did to obfuscate the code in the first place.
If you don't want users to access the source, don't distribute the source, period. Use an API or a compiler, not an obfuscator.

Use PHP to write a file to Windows that contains Japanese characters in the filename

I want to save a file to Windows using Japanese characters in the filename.
The PHP file is saved with UTF-8 encoding
<?php
$oldfile = "test.txt";
$newfile = "日本語.txt";
copy($oldfile,$newfile);
?>
The file copies, but appears in Windows as
日本語.txt
How do I make it save as
日本語.txt
?
I have ended up using the php-wfio extension from https://github.com/kenjiuno/php-wfio
After putting php_wfio.dll into php\ext folder and enabling the extension, I prefixed the filenames with wfio:// (both need to be prefixed or you get a Cannot rename a file across wrapper types error)
My test code ends up looking like
<?php
$oldfile = "wfio://test.txt";
$newfile = "wfio://日本語.txt";
copy($oldfile,$newfile);
?>
and the file gets saved in Windows as 日本語.txt which is what I was looking for
Starting with PHP 7.1, I would link you to this answer: https://stackoverflow.com/a/38466772/3358424 . Unfortunately, most of the recommendations listed in the answer that strives to be the only correct one are not valid. Suggestions like "just urlencode the filename" or "FS expects iso-8859-1" are terribly wrong assumptions that misinform people. They can work by luck, but only for US or mostly Western codepages; otherwise they are just wrong. PHP 7.1 + default_charset=UTF-8 is what you want. With earlier PHP versions, wfio or wrappers around ext/com_dotnet might indeed be helpful.
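A minimal sketch of that recommendation, assuming the script file is saved as UTF-8; expectsUtf8Filenames is a hypothetical helper name, and the temp directory merely stands in for the real target folder.

```php
<?php
// Hypothetical helper: reports whether this PHP setup is expected to pass
// UTF-8 file names through to the file system unchanged.
function expectsUtf8Filenames(): bool
{
    $isWindows = strncasecmp(PHP_OS, 'WIN', 3) === 0;
    if (!$isWindows) {
        return true; // file names are plain byte strings on most other systems
    }
    return PHP_VERSION_ID >= 70100
        && strcasecmp((string) ini_get('default_charset'), 'UTF-8') === 0;
}

$dir     = sys_get_temp_dir();
$oldfile = $dir . '/test.txt';
$newfile = $dir . '/日本語.txt';
file_put_contents($oldfile, 'demo');

if (expectsUtf8Filenames() && copy($oldfile, $newfile)) {
    // The directory listing should contain the name exactly as written above.
    var_dump(in_array('日本語.txt', scandir($dir), true));
}
```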
Thanks.

Is it possible to change the behavior of PHP's print_r function [duplicate]

This question already has answers here:
making print_r use PHP_EOL
(5 answers)
Closed 6 years ago.
I've been coding in PHP for a long time (15+ years now), and I usually do so on a Windows OS, though most of the time it's for execution on Linux servers. Over the years I've run up against an annoyance that, while not important, has proved to be a bit irritating, and I've gotten to the point where I want to see if I can address it somehow. Here's the problem:
When coding, I often find it useful to output the contents of an array to a text file so that I can view its contents. For example:
$fileArray = file('path/to/file');
$faString = print_r($fileArray, true);
$save = file_put_contents('fileArray.txt', $faString);
Now when I open the file fileArray.txt in Notepad, the contents of the file are all displayed on a single line, rather than the nice, pretty structure seen if the file were opened in Wordpad. This is because, regardless of OS, PHP's print_r function uses \n for newlines, rather than \r\n. I can certainly perform such replacement myself by simply adding just one line of code to make the necessary replacements, and therein lies the problem. That one, single line of extra code translates back through my years into literally hundreds of extra steps that should not be necessary. I'm a lazy coder, and this has become unacceptable.
Currently, on my dev machine, I've got a different sort of work-around in place (shown below), but this has its own set of problems, so I'd like to find a way to "coerce" PHP into putting in the "proper" newline characters without all that extra code. I doubt that this is likely to be possible, but I'll never find out if I never ask, so...
Anyway, my current work-around goes like this. I have, in my PHP include path, a file (print_w.php) which includes the following code:
<?php
function print_w($in, $saveToString = false) {
    $out = print_r($in, true);
    $out = str_replace("\n", "\r\n", $out);
    if ($saveToString) {
        return $out;
    }
    echo $out;
}
?>
I also have auto_prepend_file set to this same file in php.ini, so that it automatically includes it every time PHP executes a script on my dev machine. I then use the function print_w instead of print_r while testing my scripts. This works well, so long as when I upload a script to a remote server I make sure that all references to the function print_w are removed or commented out. If I miss one, I (of course) get a fatal error, which can prove more frustrating than the original problem, but I make it a point to carefully proofread my code prior to uploading, so it's not often an issue.
So after all that rambling, my question is, Is there a way to change the behavior of print_r (or similar PHP functions) to use Windows newlines, rather than Linux newlines on a Windows machine?
Thanks for your time.
Ok, after further research, I've found a better work-around that suits my needs and eliminates the need to call a custom function instead of print_r. This new work-around goes like this:
I still have to have an included file (I've kept the same name so as not to have to mess with php.ini), and php.ini still has the auto_prepend_file setting in place, but the code in print_w.php has changed a bit:
<?php
rename_function('print_r', 'print_rw');
function print_r($in, $saveToString = false) {
    $out = print_rw($in, true);
    $out = str_replace("\n", "\r\n", $out);
    if ($saveToString) {
        return $out;
    }
    echo $out;
}
?>
This effectively alters the behavior of the print_r function on my local machine, without my having to call a custom function and then make sure that every reference to that function is neutralized before uploading. By using rename_function I was able to effectively rewrite how print_r behaves, making it possible to address my problem. Note that rename_function is not part of core PHP: it is provided by the APD PECL extension, which has to be installed and enabled for this to work.
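An alternative that needs neither auto_prepend_file nor an extension is a small wrapper shipped with the project itself, in the spirit of the linked duplicate about PHP_EOL. The sketch below uses the hypothetical name print_r_eol; because the file travels with the code, a forgotten call does not cause a fatal error on the remote server.

```php
<?php
// Hypothetical helper name; $eol defaults to the platform's line ending.
function print_r_eol($value, bool $return = false, string $eol = PHP_EOL)
{
    $out = print_r($value, true);
    // Collapse any CRLF to LF first so an existing "\r\n" never becomes "\r\r\n".
    $out = str_replace(["\r\n", "\n"], ["\n", $eol], $out);
    if ($return) {
        return $out;
    }
    echo $out;
    return true;
}

// Force Windows line endings regardless of the platform running the script.
$text = print_r_eol(['a', 'b'], true, "\r\n");
```

Passing the line ending explicitly covers the case where the script runs on Linux but the output will be opened in Notepad.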

UTF-8, PHP, Win7 - Is there a solution now to save UTF-8-filenames on Win 7 using php?

Update, so that you don't have to read through everything: PHP supports UTF-8 filenames on Windows starting with 7.1.0alpha2. (Thanks to Anatol-Belski!)
Following some link chains on stackoverflow I found part of the answer:
https://stackoverflow.com/a/10138133/3716796 by Umberto Salsi
(and on the same question: https://stackoverflow.com/a/2950046/3716796 by Artefacto)
In short: 'PHP communicate[s] with the underlying file system as a "non-Unicode aware program"', and because of that all filenames given to PHP by Windows and vice versa are automatically translated/reencoded by Windows. This causes the errors. And you seemingly can't stop the automatic reencoding.
(And https://stackoverflow.com/a/2888039/3716796 by Artefacto: "PHP does not use the wide WIN32 API calls, so you're limited by the codepage.")
And at https://bugs.php.net/bug.php?id=47096 there is the bug report for PHP.
On that report, though, nicolas suggests that a COM object might work: $fs = new COM('Scripting.FileSystemObject', null, CP_UTF8);
Maybe I will try that sometimes.
So this part of my question is left: Is PHP 6 out, or was it withdrawn, or is there anything new in PHP on this topic?
// full Question
Most questions about this topic are 1 to 5 years old.
Could php now save a file using
file_put_contents($dir . '/' . $_POST['fileName'], $_POST['content']);
when the $_POST['fileName'] is UTF-8 encoded, for example "Крым.xml" ?
Currently it is saved as
Крым.xml
I checked the fileName variable, so I can be sure it's UTF-8:
echo mb_detect_encoding($_POST['fileName']);
Is there now anything new in PHP that could accomplish it?
In some places I read that PHP 6 would be able to do it, but PHP 6, if I remember right, has been withdrawn?
In Windows Explorer I can change the name of a file to "Крым.xml". As far as I have understood the old questions and answers, it should be possible to use file_put_contents if the fileName variable is simply converted to the encoding used by Windows 7 and its NTFS disk.
There are even 3 old questions with answers that claim to have succeeded:
PHP File Handling with UTF-8 Special Characters
Convert UTF-16LE to UTF-8 in php
PHP: How to create unicode filenames
Overall and most approved answers say it is not possible.
I checked all suggested answers already myself, and none works.
How can I find out, definitely and with absolute accuracy, in which encoding my Win 7 and Explorer save the filename on my NTFS disk with German language settings?
As said: I can create a file "Крым.xml" in the Explorer.
My conclusion:
1. Either file_put_contents doesn't work correctly when handing over the fileName (which I tried with conversions to UTF-16, UTF-16LE, ISO-8859-1 and Windows-1252) to Windows,
2. or file_put_contents just doesn't implement a way to call Windows' own file function in the appropriate way (so this second possibility would mean it's not a bug but just not implemented.) (For example notepad++ has no problems creating, writing and renaming a file called Крым.xml.)
Just one example of the error messages I got, in this case when I used
mb_convert_encoding($theFilename , 'Windows-1252' , 'UTF-8')
"Warning: file_put_contents(dirToSaveIn/????.xml): failed to open stream: No error in C:\aa xampp\htdocs\myinterface.lo\myinterface\phpWriteLocalSearchResponseXML.php on line 26 "
With other conversion I got other error messages, ranging from 'invalid characters' to no string recognized at all.
Greetings
John
PHP starting with 7.1.0alpha2 supports UTF-8 filenames on Windows.
Thanks.

After update to 5.4, fopen can't read file

I have a website on a host that recently switched from PHP 5.2 to 5.4, and required us to choose a new php.ini file: 5.4 plain, 5.4 solo (just one php.ini file used throughout the site), and 5.4 fast.
I do not know which one I was using prior to making the switch, but when I did, (I chose 5.4 solo), I noticed that a part of my website that depends on mbstring (multibyte characters) no longer works.
Specifically, it opens a text file that is full of characters, which is then used in an encryption script that stores garbage in the MySQL database. To retrieve it, the data is again run through the script, decrypted, and displayed on the screen.
This worked just fine until the 5.4 change. Now it appears that it's unable to retrieve (open?) the text file. I have tested this with a non-multibyte character version and that works fine, so I don't think the issue is with the code, but rather with the way PHP is treating multibyte chars...and I suspect, just a hunch, that this is fixable by tweaking the PHP.ini file somehow. Zend.multibyte seems to be PHP's new thing.
My problem is that I have no idea what to tweak. I tried several different Zend.multibyte/mbstring combos and that didn't work.
I know that everything works up until a string is sent for encryption. It comes back as a null value, instead of a garbled string. I feel like something in the string is being rejected by PHP and thus it's failing...offering nothing instead of the string it should.
Does anyone have a thought as to what might be happening and why my script no-longer works with 5.4? I have checked and the mbstring module IS loaded, with default values in the php.ini.
Any suggestions would be great...I'm totally stumped. Even some additional reports or ways to test or narrow down the problem would be fantastic.
Thank you!
Here is some code, where I think the problem is:
$this->s1 = "";
$s1array = array("a1.txt", "a2.txt", "a3.txt");
foreach ($s1array as $i => $value) {
    $myFile = "../a/dir/somewhere/$s1array[$i]";
    $fh = fopen($myFile, 'r');
    $theData = fgets($fh);
    fclose($fh);
    $this->s1 .= html_entity_decode($theData, ENT_NOQUOTES, 'UTF-8');
}
The files ../a/dir/somewhere/a1.txt and ../a/dir/somewhere/a2.txt (etc) are semicolon-delimited strings of HTML-encoded letters, for example: & #x0fb0f;& #x02c97;& #x00436;& #x10833;& #x00514; (I added the spaces so it would show code not the HTML values!).
But I guess now, for some reason, this above code isn't returning any results. If I assign the result to a variable and echo that variable, there's nothing. But if I assign $this->s1 = "abcde"; or a longer string and skip the "foreach" part, it will work. So something in this process, this code, no longer works in 5.4. Can anyone tell what's going on here? Thank you!
Why do you use fopen and so on for text files when you could use file_put_contents and file_get_contents? They are mostly wrappers around fopen, fread and so on. I have NEVER had any problems with UTF-8 using those two functions.
Also make sure everything (PHP itself, the database if you are using one, and the PHP files) is encoded in or configured for UTF-8. There is nothing funnier than *.php files in, for example, latin2 and all the rest in UTF-8.
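A rough sketch of the loop from the question rewritten with file_get_contents; readEntityFiles is a hypothetical helper name. Two things differ from the original: the whole file is read rather than only the first line that fgets returns, and the return value is checked, so a file that cannot be opened raises an error instead of silently contributing nothing.

```php
<?php
// Hypothetical helper; $dir and $names stand in for the question's paths.
function readEntityFiles(string $dir, array $names): string
{
    $result = '';
    foreach ($names as $name) {
        $path = $dir . '/' . $name;
        $data = @file_get_contents($path);
        if ($data === false) {
            // Surface the failure instead of silently appending nothing.
            throw new RuntimeException("Cannot read $path");
        }
        $result .= html_entity_decode($data, ENT_NOQUOTES, 'UTF-8');
    }
    return $result;
}
```

In the question's class this would be called as $this->s1 = readEntityFiles('../a/dir/somewhere', array('a1.txt', 'a2.txt', 'a3.txt'));. If the exception fires, the problem is the path or permissions after the host change, not multibyte handling.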
