glob() can't find file names with multibyte characters on Windows? - php

I'm writing a file manager and need to scan directories and deal with renaming files that may have multibyte characters. I'm working on it locally on Windows/Apache PHP 5.3.8, with the following file names in a directory:
filename.jpg
имяфайла.jpg
file件name.jpg
פילענאַמע.jpg
文件名.jpg
Testing on a live UNIX server worked fine. Testing locally on Windows using glob('./path/*') returns only the first one, filename.jpg.
Using scandir(), the correct number of files is returned at least, but I get names like ?????????.jpg (note: those are regular question marks, not the � character).
I'll end up needing to write a "search" feature to search recursively through the entire tree for filenames matching a pattern or with a certain file extension, and I assumed glob() would be the right tool for that, rather than scan all the files and do the pattern matching and array building in the application code. I'm open to alternate suggestions if need be.
Assuming this was a common problem, I immediately searched Google and Stack Overflow and found nothing even related. Is this a Windows issue? PHP shortcoming? What's the solution: is there anything I can do?
Addendum: Not sure how related this is, but file_exists() is also returning FALSE for these files, passing in the full absolute path (using Notepad++, the php file itself is UTF-8 encoding no BOM). I'm certain the path is correct, as neighboring files without multibyte characters return TRUE.
EDIT: glob() can find a file named filename-äöü.jpg. Previously in my .htaccess file, I had AddDefaultCharset utf-8, which I didn't consider before. filename-äöü.jpg was printing as filename-���.jpg. The only effect removing that htaccess line seemed to have was now that file name prints normally.
I've deleted the .htaccess file completely, and this is my actual test script in its entirety (I changed a couple of file names from the original post):
print_r(scandir('./uploads/'));
print_r(glob('./uploads/*'));
Output locally on Windows:
Array
(
[0] => .
[1] => ..
[2] => ??? ?????.jpg
[3] => ???.jpg
[4] => ?????????.jpg
[5] => filename-äöü.jpg
[6] => filename.jpg
[7] => test?test.jpg
)
Array
(
[0] => ./uploads/filename-äöü.jpg
[1] => ./uploads/filename.jpg
)
Output on remote UNIX server:
Array
(
[0] => .
[1] => ..
[2] => filename-äöü.jpg
[3] => filename.jpg
[4] => test이test.jpg
[5] => имя файла.jpg
[6] => פילענאַמע.jpg
[7] => 文件名.jpg
)
Array
(
[0] => ./uploads/filename-äöü.jpg
[1] => ./uploads/filename.jpg
[2] => ./uploads/test이test.jpg
[3] => ./uploads/имя файла.jpg
[4] => ./uploads/פילענאַמע.jpg
[5] => ./uploads/文件名.jpg
)
Since this is a different server, the configuration could differ regardless of platform, so I'm not sure what to think, and I can't fully pin it on Windows yet (it could be my PHP installation, ini settings, or Apache config). Any ideas?

It looks like the glob() function depends on how your copy of PHP was built and whether it was compiled with a Unicode-aware Win32 API (I don't believe the standard build is).
Cf. http://www.rooftopsolutions.nl/blog/filesystem-encoding-and-php
Excerpt from comments on the article:
Philippe Verdy 2010-09-26 8:53 am
The output from your PHP installation on Windows is easy to explain :
you installed the wrong version of PHP, and used a version not
compiled to use the Unicode version of the Win32 API. For this reason,
the filesystem calls used by PHP will use the legacy "ANSI" API and so
the C/C++ libraries linked with this version of PHP will first try to
convert your UTF-8-encoded PHP string into the local "ANSI" codepage
selected in the running environment (see the CHCP command before
starting PHP from a command line window)
Your version of Windows is MOST PROBABLY NOT responsible of this weird
thing. Actually, this is YOUR version of PHP which is not compiled
correctly, and that uses the legacy ANSI version of the Win32 API (for
compatibility with the legacy 16-bit versions of Windows 95/98 whose
filesystem support in the kernel actually had no direct support for
Unicode, but used an internal conversion layer to convert Unicode to
the local ANSI codepage before using the actual ANSI version of the
API).
Recompile PHP using the compiler option to use the UNICODE version of
the Win32 API (which should be the default today, and anyway always
the default for PHP installed on a server that will NEVER be Windows
95 or Windows 98...)
Then Windows will be able to store UTF-16 encoded filenames (including
on FAT32 volumes, even if, on these volumes, it will also generate an
aliased short name in 8.3 format using the filesystem's default
codepage, something that can be avoided in NTFS volumes).
All what you describe are problems of PHP (incorrect porting to
Windows, or incorrect system version identification at runtime) :
reread the README files coming with PHP sources explaining the
compilation flags. I really think that the makefile on Windows should
be able to configure and autodetect if it really needs to use ONLY the
ANSI version of the API. If you are compiling it for a server, make
sure that the Configure script will effectively detect the full
support of the UNICODE version of the Win32 API and will use it when
compiling PHP and when selecting the runtime libraries to link.
I use PHP on Windows, correctly compiled, and I absolutely DON'T know
the problems you cite in your article.
Let's forget now forever these non-UNICODE versions of the Win32
API (which are using inconsistently the local ANSI codepage for the
Windows graphical UI, and the OEM codepage for the filesystem APIs,
the DOS/BIOS-compatible APIs, the Console APIs) : these non-Unicode
versions of the APIs are even MUCH slower and more costly than the
Unicode versions of the APIs, because they are actually translating
the codepage to Unicode before using the core Unicode APIs (the
situation on Windows NT-based kernels is exactly the reverse from the
situation on versions of Windows based on a virtual DOS extender, such
as Windows 95/98/ME).
When you don't use the native version of the API, your API call will
pass through a thunking layer that will transcode the strings between
Unicode and one of the legacy ANSI or CHCP-selected OEM codepages, or
the OEM codepage hinted on the filesystem: this requires additional
temporary memory allocation within the non-native version of the Win32
API. This takes additional time to convert things before doing the
actual work by calling the native API.
In summary: the PHP binary you install on Windows MUST be different
depending on if you compiled it for Windows 95/98/SE (or the old
Win16s emulation layer for Windows 3.x, which had a very minimum
support of UTF-8, only to support the Unicode subsets of Unicode used
by the ANSI and OEM codepages selected when starting Windows from a DOS
extender) or if it was compiled for any other version of Windows based
on the NT kernel.
The best proof that this is a problem of PHP and not Windows, is that
your weird results will NOT occur in other languages like C#,
Javascript, VB, Perl, Ruby... PHP has a very bad history in tracking
versions (and too many historical source code quirks and wrong
assumptions that should be disabled today, and an inconsistent library
that has inherited all those quirks initially made in old versions of
PHP for old versions of Windows that are even no longer officially
supported, by Microsoft or even by PHP itself !).
In other words : RTM ! Or download and install a binary version of
PHP for Windows precompiled with the correct settings : I really think
that PHP should distribute Windows binaries already compiled by
default for the Unicode version of the Win32 API, and using the
Unicode version of the C/C++ libraries : internally the PHP code will
convert its UTF-8 strings to UTF-16 before calling the Win32 API, and
back from UTF-16 to UTF-8 when retrieving Win32 results, instead of
converting PHP's internal UTF-8 strings back/to the local OEM codepage
(for the filesystem calls) or the local ANSI codepage (for all other
Win32 APIs, including the registry or process).

Try setting the locale at the start of your script:
setlocale(LC_ALL,'C.UTF-8');

PHP on Windows does not use the Unicode API yet, so you have to use the runtime encoding (whatever it is) to be able to deal with non-ASCII characters.
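
To illustrate why filename-äöü.jpg was found while the Cyrillic, Hebrew and Chinese names came back as question marks, here is a minimal sketch. It assumes the mbstring extension is loaded and that the machine's ANSI code page is Windows-1252; the question does not state the code page, so that part is a guess. The helper name representable_in() is hypothetical.

```php
<?php
// Hypothetical helper: does $utf8Name survive a round trip through $codePage?
// Characters missing from the code page are replaced during conversion,
// so the round trip no longer reproduces the original string.
function representable_in(string $utf8Name, string $codePage): bool
{
    $legacy = mb_convert_encoding($utf8Name, $codePage, 'UTF-8');
    return mb_convert_encoding($legacy, 'UTF-8', $codePage) === $utf8Name;
}

// Assumed code page: Windows-1252 (Western European).
var_dump(representable_in('filename-äöü.jpg', 'Windows-1252'));
var_dump(representable_in('имяфайла.jpg', 'Windows-1252'));
// The same Cyrillic name fits a Cyrillic code page.
var_dump(representable_in('имяфайла.jpg', 'Windows-1251'));
```

A name that is not representable in the active code page cannot be passed intact through the ANSI file functions, which would match the behaviour described in the question.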

Starting with PHP 7.1 long and UTF-8 paths on Windows are supported directly in the core.
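
For the recursive search the question mentions, a minimal sketch using the SPL directory iterators and fnmatch() instead of glob() might look like the following. The function name find_files() is hypothetical, and on Windows this assumes PHP 7.1 or later so that non-ASCII names arrive as UTF-8.

```php
<?php
// Hypothetical helper: recursively collect files whose base name matches a
// shell-style pattern such as "*.jpg", without relying on glob().
function find_files(string $root, string $pattern): array
{
    $matches = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        // Only regular files are tested; directories are just traversed.
        if ($file->isFile() && fnmatch($pattern, $file->getFilename())) {
            $matches[] = $file->getPathname();
        }
    }
    sort($matches); // iteration order is not guaranteed, so sort for stability
    return $matches;
}
```

Calling find_files('./uploads', '*.jpg') would return every .jpg under ./uploads, including those in subdirectories.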

Related

Where does PHP's NumberFormatter take the locale formats from?

Where does PHP's NumberFormatter take the locale formats from? More interested in Linux environment, if that makes any difference.
Is it compiled in, or some system resource is used? How can I view the formats for each supported locale? (locale -c -k LC_MONETARY doesn't seem to list/have the info on the pattern.) Are they modifiable per server?
If there is a mistake in a format, where can I report it or propose a fix? (E.g., the lv_LV locale has a mistake regarding the thousand separators.)
Why is the output different for HHVM – https://3v4l.org/ms1ZN ?
PHP uses ICU library (see function unum_formatDoubleCurrency in ext/intl/formatter/formatter_format.c).
ICU library, in turn, uses Common Locale Data Repository (CLDR) (see http://userguide.icu-project.org/icudata).
The format in the example (currency format for lv_LV locale), can be seen in CLDR's Survey Tool – http://st.unicode.org/cldr-apps/v#/lv/Number_Formatting_Patterns/
If there was a bug, it could be reported at http://unicode.org/cldr/trac/newticket or edited in the Survey Tool by an account acquired in this contact form: http://www.unicode.org/reporting.html
But, in the current case, there was no bug.
The format of PHP does not match CLDR data probably because of the libicu version (and its CLDR version) that is installed on the particular computer/server, or a specific data file being used (icudatl.dat, see http://userguide.icu-project.org/icudata). At the moment (2018-09), the latest libicu/data version is 62 (see http://site.icu-project.org/home) and the latest CLDR version is 34 (see http://cldr.unicode.org/).
If icu-devtools is installed, running icuinfo would display what libicu and CLDR versions are being used. In my case: <param name="version">55.1</param>[..]<param name="cldr.version">27.0.1</param>
There are two alternatives given for the currency format in lv_LV, HHVM apparently uses the other, for some reason.
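
The same version information can also be read from within PHP, together with the pattern the formatter actually uses. This is a minimal sketch that assumes the intl extension is loaded; the helper name icu_currency_info() is hypothetical, and the values it prints depend on the libicu build installed on the machine.

```php
<?php
// Hypothetical helper: report which ICU build this PHP uses and what it
// yields for a locale's currency format. Requires the intl extension.
function icu_currency_info(string $locale): array
{
    $fmt = new NumberFormatter($locale, NumberFormatter::CURRENCY);
    return [
        'icu_version'      => INTL_ICU_VERSION,       // libicu version PHP was built against
        'icu_data_version' => INTL_ICU_DATA_VERSION,  // version of the ICU data in use
        'pattern'          => $fmt->getPattern(),     // currency pattern for the locale
        'grouping'         => $fmt->getSymbol(NumberFormatter::GROUPING_SEPARATOR_SYMBOL),
        'sample'           => $fmt->formatCurrency(1234567.89, 'EUR'),
    ];
}

print_r(icu_currency_info('lv_LV'));
```

Comparing this output between two servers should show whether a formatting difference comes from different ICU data.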
This is not easy to answer if it is not covered in the official docs. However, let's have a look at PHP's NumberFormatter implementation: https://github.com/php/php-src/tree/8939c4d96b8382abe84f35e69f4f6ebd6f0f749d/ext/intl/formatter
If you are good at C you may find the correct place; I did not find it right away (if one of us does, let's replace this part of the answer).
However, as far as I understand the code, the formats are retrieved from the intl package (the internationalization package, http://php.net/manual/de/book.intl.php). NumberFormatter itself is part of it.
In case you find a real bug you can propose a fix at the official PHP Bug reporting site regarding the intl package (https://bugs.php.net/).

Is there php7 printer extension/dll?

Is there any printer extension for PHP 7? Or can someone provide a working solution for printing from PHP? Or should I use sockets for that? I tried the DLL from 5.6 but it doesn't work.
There's a lot of work to be done to turn its source code into a usable extension, beyond the common tasks. Some replacements can be made; other issues involve deciding what to do with variables that are no longer used and syntax that is no longer valid. It's a very outdated, yet useful, extension. I've also tried to do it myself with no success (it builds but it doesn't work).
The easiest way I found, and still use for POS (receipt) printers, is just a
system("(echo ".$TextToBePrinted.") >\\\\MachineNameOrPreferablyFixedIp\\PrinterNetworkName");
This is repeated every time a new line is needed. It is possible to send null and newline characters in a single command containing the whole custom text, but it's very tedious. For the same kind of printers there is an esc-pos PHP library available.
Note: Even if it is a local printer, it has to be shared, and it is better addressed as "\\127.0.0.1\PrinterNetworkName". PrinterNetworkName must avoid the same invalid characters as file names, so "Generic / Text Only" has to be accessed as "\\127.0.0.1\Generic Text Only".
Similar alternatives for common-use printers include using dosprn and fwrite or, again, system("echo ..."); to real or emulated COMn/LPTn ports or, if you don't care, creating and writing a file and calling system("print ..."); on it, provided the file name has a correctly associated extension.
And for the hardcore, you can build a binary (.exe) that listens on a custom port, or call a binary directly with system(...); and print. It's not so hard in C++ using Boost. The really hardcore option would be building a DLL extension (by the way, the C++ Windows API offers printing functionality similar to what the PHP extension provides for managing paper and font size, style and so on).
PHP and DLLs whose first two version numbers differ won't work together. So stop trying to make it happen.

Why can't PHP find long filenames?

Inside a folder I have a file named
`111-aaaaaa aa aaaa-,._aaaaaaa; aaaaaaaa, aa aaaaaaaaaa, aaaaaaaaa aaaaaaaa. 03.01.10. 38.38 aaaaa.txt`
When I browse that directory with PHP (or try to read that file):
var_dump(glob('MyFolder/*'));exit;
It can't find that file. What's the problem?
(If I shorten the filename, it becomes findable. I am on Windows.)
Windows in particular has a very short file name limit in its original Win32 API. This general problem is discussed here at SO.
At most about 260 characters can be used in an absolute path on Win32. On other platforms there are other limits, but at least 512 characters is to be expected and more is not unheard of.
(For instance, in GNU HURD, there effectively is no limit to file lengths, even though the underlying file system may impose a limit.)
However, Windows actually can have longer filenames (obviously, as you have them on your computer). This works by using a newer Windows API. Unfortunately, standard PHP does not use this API, as far as I know.
There is a modified version of PHP which makes use of this newer Windows API over at Github.
Another benefit from using that newer API is that it also supports Unicode characters in the file names.
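
To check whether a given file is likely to hit the roughly 260-character limit described above, a rough sketch follows. It assumes the mbstring extension, a UTF-8 input string, and PHP 7 or later for intdiv(); the helper name exceeds_legacy_path_limit() is hypothetical, and the assumption that the limit includes one terminating character is mine, not the answer's.

```php
<?php
// Hypothetical helper: estimate whether an absolute path exceeds the legacy
// Win32 limit of about 260 characters.
function exceeds_legacy_path_limit(string $utf8Path, int $limit = 260): bool
{
    // Windows counts UTF-16 code units, so measure the path in that encoding.
    $units = intdiv(strlen(mb_convert_encoding($utf8Path, 'UTF-16LE', 'UTF-8')), 2);
    // Assumption: the limit includes a terminating NUL, leaving $limit - 1 usable units.
    return $units > $limit - 1;
}
```

Pass the full absolute path (drive letter, folders and file name), since the limit applies to the whole path rather than to the file name alone.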
Try scandir().
It returns the list of files as an array.
Starting with PHP 7.1 long and UTF-8 paths on Windows are supported directly in the core.
Cheers.

PHP COM (OLE) Object connecting to MS Excel

My setup: IIS 7.5, PHP 5.4, Windows 7
I've been trying to create a COM object through PHP but I continue to get access denied. I've also followed a handful of tutorials on how to "grant access" to the IUSR account to create the object, but to no avail. I read the installation portion relevant to COM interfacing that says:
As of PHP 5.3.15 / 5.4.5, this extension requires php_com_dotnet.dll
to be enabled inside of php.ini in order to use these functions.
Previous versions of PHP enabled these extensions by default.
You are responsible for installing support for the various COM objects
that you intend to use (such as MS Word); we don't and can't bundle
all of those with PHP.
I've enabled the php_com_dotnet.dll file within the ini file but I still can't seem to create the COM object for Excel. Then if you read the second paragraph it says that you have to install support for the various COM objects you intend to use but doesn't specify how to go about doing that.
Question: How do I install support for the MS Excel COM object?
Any help would be appreciated. I've researched this issue but haven't found very much documentation out there.
Don't do this. You're in for a world of pain trying to launch Excel in a web application, especially from PHP.
Microsoft Office apps such as Word and Excel are not designed for server side use. When you try to instantiate an Excel "COM object" you're spinning up a full instance of Excel as a separate process. That is hugely expensive and will never scale. Not only that, to add to your woes, if for whatever reason your script can't release and shut down Excel you'll end up with tens or possibly hundreds of orphaned Excel processes hanging around in memory.
Try something like: https://github.com/PHPOffice/PHPExcel if you need to read and write Excel compatible spreadsheets.

Pass Unicode string to PHP shell_exec on Windows

I have a Windows program that is capable of handling UTF16 on input. On the PHP script I have encoding declared as UTF8. How do I call shell_exec or similar functions to launch the program and pass the parameter unchanged? Or is it not possible at all?
PHP implements shell_exec using the C standard library function popen. This is a byte-oriented function.
The Windows C runtime interprets byte data to stdlib functions as representing text encoded in the current code page, by default a locale-specific code page that will never be a UTF, so unfortunately you can't reliably get Unicode down that path.
You could try running the app with the code page changed to 65001, the Windows code page that should be UTF-8. However there are a number of stdlib bugs that make working in code page 65001 unreliable, so chances are it won't work. And if you're running in a web server, fiddling with process-global locale settings is a dicey prospect.
This is a problem with all tools that use C stdlib features, which is almost all scripting languages. The only reliable way to interact with Unicode in args or envvars, when you detect you're running under Windows, is to use the native Win32 API functions instead. On PHP it looks like you might be able to do this using w32api_invoke_function to call CreateProcessW. (Haven't done it myself, but the same strategy works with Python using ctypes.)
Alternatively, pass data through the stdin/stdout streams. Then you can read them as bytes and do any Unicode conversions yourself manually.
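
The stdin/stdout approach could be sketched with proc_open() as follows. This assumes the target program reads its input from stdin, which the question does not confirm; the helper name run_with_stdin() is hypothetical, and the conversion line assumes the mbstring extension.

```php
<?php
// Hypothetical helper: run $command, feed $input to its stdin as raw bytes,
// and return whatever it writes to stdout. Suited to small payloads only,
// because all input is written before any output is read.
function run_with_stdin(string $command, string $input): string
{
    $spec = [
        0 => ['pipe', 'r'], // child's stdin: we write, the child reads
        1 => ['pipe', 'w'], // child's stdout: the child writes, we read
    ];
    $process = proc_open($command, $spec, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException("Could not start: $command");
    }
    fwrite($pipes[0], $input);
    fclose($pipes[0]); // signal end of input to the child
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($process);
    return $output;
}

// The program expects UTF-16, so convert the UTF-8 string before sending it.
$utf16 = mb_convert_encoding('文件名', 'UTF-16LE', 'UTF-8');
```

The bytes in $utf16 would then be passed as the second argument to run_with_stdin(), so they never go through the command line and its code page conversion.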
