Do file paths need to be in English? - PHP

I'm trying to verify that a directory exists with PHP:
is_dir('C:\Users\Администратор\Desktop\Среда чтения')
But the result is always false. Do I have to give a directory an English name for PHP to work with it correctly?

Try saving your script as UTF-8.
Also check the slashes.

On Windows
NTFS stores filenames as UTF-16 (historically UCS-2), but unfortunately PHP is not that smart. I'm not entirely sure whether is_dir() ends up calling the ANSI or the wide-character (Unicode) Windows API, but ANSI would be the likely choice. In that case you're at the mercy of the "Language for non-Unicode programs" OS setting: filenames containing characters outside that code page will be inaccessible to you.
On Linux
It's not so straightforward. The filesystem itself doesn't have a fixed text encoding, which makes things awkward. A Cyrillic filename can be stored as UTF-8, Windows-1251, KOI8-R (or whatever else), and it's up to the software that creates or reads the files to know which encoding was used. The filesystem just stores a sequence of bytes as the "filename". PHP doesn't care about text encodings either, so you need to know the encoding of the filename beforehand in order to pass the correct byte string to is_dir().
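To see concretely why the encoding matters on a byte-oriented filesystem, here is a minimal sketch that prints the raw bytes of the same Cyrillic word in two encodings. It assumes the script is saved as UTF-8 and that the mbstring extension is loaded; the two byte strings are two different "filenames" as far as the filesystem is concerned.

```php
<?php
// Assumes: this file is saved as UTF-8 and ext-mbstring is available.
$name = 'Среда';

// Bytes PHP actually holds for the literal (UTF-8: 2 bytes per Cyrillic letter).
$utf8Bytes = bin2hex($name);

// The same text re-encoded as Windows-1251 (1 byte per Cyrillic letter).
$cp1251Bytes = bin2hex(mb_convert_encoding($name, 'Windows-1251', 'UTF-8'));

echo "UTF-8:        $utf8Bytes\n";   // d0a1d180d0b5d0b4d0b0
echo "Windows-1251: $cp1251Bytes\n"; // d1f0e5e4e0

// On Linux, is_dir()/file_exists() compare these bytes verbatim, so a directory
// created under one encoding is not found when queried with the other.
```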
In summary
I highly recommend steering clear of non-English characters in filenames when using PHP. It's damn hard to get right.

You can just check with file_exists():
if (file_exists('C:\Users\Администратор\Desktop\Среда чтения'))
{
    // Do your thing...
}

With a specific example, you can look at the problem the other way around:
What dirs exist
$dirs = scandir('C:\Users');
print_r($dirs);
Since you know there is a folder named "Администратор", see how PHP displays it. From the result PHP receives, you can hopefully determine the correct encoding for that specific folder. If the encoding is consistent (which, according to Vilx-, it is), it should be possible to handle any folders/files with Cyrillic characters.
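A slightly more informative variant of the scandir() check is to dump each entry's raw bytes, since print_r() output depends on the console's own encoding. This is a sketch; `dump_dir_entries` is a hypothetical helper name, and the hex prefixes in the comments assume the folder is named "Администратор".

```php
<?php
/**
 * Hypothetical helper: list a directory and show the raw bytes of each entry,
 * so you can tell which encoding PHP receives from the filesystem.
 *
 * @return list<array{name: string, hex: string}>
 */
function dump_dir_entries(string $dir): array
{
    $entries = scandir($dir);
    if ($entries === false) {
        throw new RuntimeException("Cannot read directory: $dir");
    }

    $rows = [];
    foreach ($entries as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue; // skip the pseudo-entries
        }
        $rows[] = ['name' => $entry, 'hex' => bin2hex($entry)];
    }
    return $rows;
}

// On the machine from the question:
// print_r(dump_dir_entries('C:/Users'));
// An entry starting "d090d0b4d0bc" is "Адм..." in UTF-8;
// one starting "c0e4ec" is "Адм..." in Windows-1251.
```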

Don't use Administrator rights!
Use UTF-8.
Use Linux, at least in a VM. It will save you a lot of time.
You should NEVER rely on non-ASCII paths!
Use file_exists() function to test if file/directory exists: http://php.net/manual/en/function.file-exists.php

This problem may be related to the backslashes. Inside a single-quoted PHP string only \\ and \' are escape sequences, so the path in the question is actually read literally; inside a double-quoted string, however, sequences such as \t, \n or \r are turned into control characters (e.g. "C:\temp" contains a tab), and the path no longer matches.
To be safe, escape your backslashes:
is_dir('C:\\Users\\Администратор\\Desktop\\Среда чтения')
Using forward slashes also works with PHP on Windows:
is_dir('C:/Users/Администратор/Desktop/Среда чтения')
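A quick way to check PHP's backslash rules for yourself (standard string-literal behaviour; the paths are placeholders):

```php
<?php
// In single quotes only \\ and \' are escape sequences, so these are identical:
$single  = 'C:\Users\Desktop';
$escaped = 'C:\\Users\\Desktop';
var_dump($single === $escaped); // bool(true)

// In double quotes some backslash sequences ARE interpreted: "\t" is a tab.
$double = "C:\temp";
var_dump(strlen($double)); // int(6): C, :, TAB, e, m, p

// Forward slashes sidestep the question entirely and work on Windows too.
$portable = 'C:/Users/Desktop';
```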

Related

Cannot read local file with accented characters using PHP file_get_contents

For example, I have a file named проба.xml and I am unable to open it from a PHP script.
If I save the PHP script as UTF-8, then all the text in the script is UTF-8, so when I pass this to file_get_contents:
$fname = "проба.xml";
file_get_contents($fname);
I get an error that the file does not exist. The reason is that in Windows (XP) file names with non-Latin characters are stored as Unicode (UTF-16). OK, so I tried this:
$fname = "проба.xml";
$res = mb_convert_encoding($fname,'UTF-8','UTF-16');
file_get_contents($res);
But the error persists, since file_get_contents cannot accept Unicode strings...
Any suggestions?
UPDATE (July 13 '17)
Although the docs do not seem to mention it, PHP 7.1 and above finally support Unicode filenames on Windows out of the box. PHP's filesystem APIs accept and return filenames according to default_charset, which is UTF-8 by default.
Refer to bug fix here: https://github.com/php/php-src/commit/3d3f11ede4cc7c83d64cc5edaae7c29ce9c6986f
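If the same code has to run on both old and new PHP versions, one option is to convert paths only where it is still needed. This is a sketch under stated assumptions: `to_fs_path` is a hypothetical helper, the caller works in UTF-8, the caller knows the legacy ANSI code page (e.g. 'Windows-1251'), and ext-mbstring is loaded. The 7.1 cut-off follows the PHP 7.1 release notes on UTF-8 path support for Windows.

```php
<?php
/**
 * Hypothetical helper: return the byte string to hand to filesystem functions.
 * Assumes the caller works in UTF-8 and knows the legacy ANSI code page that
 * pre-7.1 PHP on Windows used for paths.
 */
function to_fs_path(string $utf8Path, string $legacyCharset): string
{
    $isWindows = stripos(PHP_OS, 'WIN') === 0; // PHP_OS is "WINNT" on Windows

    // PHP 7.1+ on Windows, and PHP elsewhere, take the UTF-8 bytes as-is
    // (on Linux this assumes the names on disk are UTF-8 too).
    if (!$isWindows || PHP_VERSION_ID >= 70100) {
        return $utf8Path;
    }

    // Older PHP on Windows: convert to the ANSI code page. Characters the
    // code page cannot represent become '?', and the path will not resolve.
    return mb_convert_encoding($utf8Path, $legacyCharset, 'UTF-8');
}

// Usage:
// var_dump(is_dir(to_fs_path('C:/Users/Администратор/Desktop', 'Windows-1251')));
```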
UPDATE (Jan 29 '15)
If you have access to the PHP extensions directory, you can try installing php-wfio.dll from https://github.com/kenjiuno/php-wfio and refer to files via the wfio:// protocol.
file_get_contents("wfio://你好.xml");
Original Answer
PHP on Windows uses the Legacy "ANSI APIs" exclusively for local file access, which means PHP uses the System Locale instead of Unicode.
To access files whose filenames contain Unicode, you must convert the filename to the specified encoding for the current System Locale. If the filename contains characters that are not representable in the specified encoding, you're out of luck (Update: See section above for a solution). scandir will return gibberish for these files and passing the string back in fopen and equivalents will fail.
To find the right encoding to use, you can get the system locale by calling <?= setlocale(LC_CTYPE, 0) ?> and looking up the Code Page Identifier (the number after the .) in the MSDN article https://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx.
For example, if the function returns Chinese (Traditional)_HKG.950, the 950 code page is in use and the filename should be converted to the Big5 encoding. In that case, if your file is saved in UTF-8 (preferably without a BOM), your code will have to be as follows:
$fname = iconv('UTF-8','big-5',"你好.xml");
file_get_contents($fname);
or as follows if you directly save the file as Big-5:
$fname = "你好.xml";
file_get_contents($fname);
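The "look up the number after the dot" step can be automated. Below is a sketch with a hypothetical helper, `charset_from_locale`, that turns the string returned by setlocale(LC_CTYPE, 0) into a charset name. It assumes the "CPnnn" naming convention; whether a given name (e.g. CP950) is accepted depends on your mbstring or iconv build.

```php
<?php
/**
 * Hypothetical helper: derive a charset name from setlocale(LC_CTYPE, 0).
 *   "Chinese (Traditional)_HKG.950" -> "CP950"
 *   "en_US.UTF-8"                   -> "UTF-8"
 * Returns null when the locale carries no code set (e.g. "C").
 */
function charset_from_locale(string $locale): ?string
{
    $dot = strrpos($locale, '.');
    if ($dot === false) {
        return null;
    }
    // Drop an optional "@modifier" suffix such as "sr_RS.UTF-8@latin".
    $codeset = explode('@', (string) substr($locale, $dot + 1), 2)[0];

    if (ctype_digit($codeset)) {
        return 'CP' . $codeset; // Windows code page number
    }
    if (strcasecmp($codeset, 'utf8') === 0 || strcasecmp($codeset, 'UTF-8') === 0) {
        return 'UTF-8';
    }
    return $codeset === '' ? null : $codeset;
}

// Usage (assumes this script is saved as UTF-8 and ext-mbstring is loaded):
// $charset = charset_from_locale(setlocale(LC_CTYPE, 0)) ?? 'UTF-8';
// $fname = mb_convert_encoding('你好.xml', $charset, 'UTF-8');
// echo file_get_contents($fname);
```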
You could try:
getting the string for the filename from a directory listing using opendir and readdir
passing that string to file_get_contents to see if that works, or
try getting the content of the file using fopen, fread and fclose
Hope this helps!
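The opendir/readdir suggestion can be sketched as follows: names returned by readdir() are already in whatever byte encoding PHP uses for paths, so passing them straight back avoids any conversion. This only helps when the name is representable in that encoding at all (otherwise readdir returns a mangled name, as described above). `read_matching_files` and its `$wanted` filter are hypothetical.

```php
<?php
/**
 * Hypothetical helper: read files whose raw names pass $wanted, reusing the
 * exact bytes readdir() returns instead of re-encoding anything.
 *
 * @return list<array{name: string, contents: string}>
 */
function read_matching_files(string $dir, callable $wanted): array
{
    $handle = opendir($dir);
    if ($handle === false) {
        throw new RuntimeException("Cannot open directory: $dir");
    }

    $files = [];
    while (($entry = readdir($handle)) !== false) {
        $path = $dir . DIRECTORY_SEPARATOR . $entry;
        if (is_file($path) && $wanted($entry)) {
            // The name goes back to the filesystem unchanged.
            $files[] = ['name' => $entry, 'contents' => (string) file_get_contents($path)];
        }
    }
    closedir($handle);

    return $files;
}

// Usage: read every .xml file in the current directory.
// $xml = read_matching_files('.', fn (string $n): bool => substr($n, -4) === '.xml');
```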
These are conclusions so far:
PHP 5 cannot open a filename with Unicode characters unless the source filename is Unicode.
PHP 5 (at least on Windows XP) is not able to process PHP source in Unicode.
So the conclusion is that this is not doable in PHP 5.

utf8_encode does not encode special characters ě/š/č/ř/ž/ý/á, etc

I have the following problem, which seems to have no solution, and I am absolutely fed up.
I have an Android application where users can upload files to my server and then access them. When a user opens their account, my server uses scandir() to list their files and folders and json_encode() to send that data to the app, which displays it. And here is the problem:
If a user uploads a file with special characters (e.g. Válcování stupňovitých vzorků za tepla.pptx) and the file name is not UTF-8 encoded, I can't pass it through json_encode() because I get a UTF-8 error. So I tried calling utf8_encode() on each file name and it worked, BUT if a file or folder name contains special characters like č/š/ě/ř/ž/á/ý/í/é, etc., utf8_encode() produces a mess in my application: instead of a folder named č, I get Ä.
I tried nearly everything from htmlspecialchars() to iconv(), but I can't find a method that returns the files and folders on my server with their proper names.
Right, it doesn't. The docs say:
utf8_encode — Encodes an ISO-8859-1 string to UTF-8
I'm not sure what encoding your file names are in, but it's definitely not ISO-8859-1.
You need to use mb_convert_encoding() to convert between arbitrary encodings, e.g.:
$utfStr = mb_convert_encoding($fileName, 'UTF-8', 'ISO-8859-2'); // ISO-8859-2 (Latin-2) covers Czech; use the actual source encoding
If you don't know the client's encoding, you may need mb_detect_encoding(), which is not always accurate.
To avoid this mess, I would recommend doing it the other way round: send UTF-8-encoded file names from your Android app rather than converting them server-side.
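A server-side sketch of the conversion before json_encode(): pass names that are already valid UTF-8 through unchanged and convert the rest from an assumed legacy encoding. It assumes ext-mbstring and uses ISO-8859-2 as the fallback only because it covers Czech; `filename_to_utf8` is a hypothetical helper. The validity check is a heuristic, since a legacy-encoded name can occasionally also be valid UTF-8.

```php
<?php
// Assumes ext-mbstring. ISO-8859-2 (Latin-2) covers Czech; swap in the encoding
// your server actually uses for filenames.
function filename_to_utf8(string $name, string $fallbackCharset = 'ISO-8859-2'): string
{
    // Names that are already valid UTF-8 are passed through unchanged.
    if (mb_check_encoding($name, 'UTF-8')) {
        return $name;
    }
    return mb_convert_encoding($name, 'UTF-8', $fallbackCharset);
}

// utf8_encode() always assumes ISO-8859-1, where byte 0xE8 is "è", not "č":
$asLatin1 = mb_convert_encoding("\xE8", 'UTF-8', 'ISO-8859-1'); // "è"
$asLatin2 = filename_to_utf8("\xE8");                            // "č"

// Building the JSON response (the directory path is hypothetical):
// $names = array_values(array_diff(scandir('/path/to/user/files'), ['.', '..']));
// echo json_encode(array_map('filename_to_utf8', $names));
```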

PHP file handling

I can't use mkdir to create folders with UTF-8 characters:
<?php
$dir_name = "Depósito";
mkdir($dir_name);
?>
When I browse this folder in Windows Explorer, the folder name looks like this:
DepÃ³sito
What should I do?
I'm using PHP 5.
Just urlencode the string desired as a filename. All characters returned from urlencode are valid in filenames (NTFS/HFS/UNIX), then you can just urldecode the filenames back to UTF-8 (or whatever encoding they were in).
Caveats (all apply to the solutions below as well):
After URL-encoding, the filename must be less than 255 characters (probably bytes).
Unicode has multiple representations for many characters (a precomposed character vs. a base letter plus combining characters). If you don't normalize your UTF-8, you may have trouble searching with glob or reopening an individual file.
You can't rely on scandir or similar functions for alpha-sorting. You must urldecode the filenames then use a sorting algorithm aware of UTF-8 (and collations).
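The URL-encoding approach amounts to a pair of tiny wrappers. A sketch (the helper names are hypothetical): only decode names you encoded yourself, because urldecode() also turns a literal '+' into a space, and note that "." and ".." pass through urlencode() unchanged.

```php
<?php
// Store names URL-encoded on disk; decode them again for display.
function encode_filename(string $name): string
{
    return urlencode($name); // leaves A-Z a-z 0-9 . _ -; space becomes '+'
}

function decode_filename(string $stored): string
{
    return urldecode($stored);
}

$stored  = encode_filename('Depósito');  // "Dep%C3%B3sito"
$display = decode_filename($stored);     // "Depósito"

// Against the filesystem (the directory name is only an example):
// mkdir(__DIR__ . '/' . encode_filename('Depósito'));
// foreach (scandir(__DIR__) as $entry) { echo decode_filename($entry), "\n"; }
```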
Worse Solutions
The following are less attractive solutions, more complicated and with more caveats.
On Windows (before PHP 7.1), the PHP filesystem wrapper expects and returns ISO-8859-1 strings for file/directory names (strictly speaking, the system's ANSI code page; see the caveats below). This gives you two choices:
Use UTF-8 freely in your filenames, but understand that non-ASCII characters will appear incorrect outside PHP. A non-ASCII UTF-8 character is stored as several ISO-8859-1 characters, e.g. ó will appear as Ã³ in Windows Explorer.
Limit your file/directory names to characters representable in ISO-8859-1. In practice, you'll pass your UTF-8 strings through utf8_decode before using them in filesystem functions, and pass the entries scandir gives you through utf8_encode to get the original filenames in UTF-8.
Caveats galore!
If any byte passed to a filesystem function matches an invalid Windows filesystem character in ISO-8859-1, you're out of luck.
Windows may use an encoding other than ISO-8859-1 in non-English locales. It will usually be one of the Windows-125x code pages (e.g. Windows-1251 for Cyrillic) or a multibyte code page such as 932 for Japanese, which means you'll need to use mb_convert_encoding (or iconv) instead of utf8_decode.
This nightmare is why you should probably just transliterate to create filenames.
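The two ISO-8859-1 choices listed above can be reproduced with mbstring. This sketch uses mb_convert_encoding() instead of utf8_encode()/utf8_decode() (which are deprecated as of PHP 8.2) and assumes the script is saved as UTF-8.

```php
<?php
// Assumes ext-mbstring and that this file is saved as UTF-8.
$utf8 = 'Depósito';                       // "ó" is two bytes here: C3 B3

// Choice 1: hand the UTF-8 bytes to a Latin-1 filesystem layer unchanged.
// Each byte becomes its own character, which is what Explorer then shows.
$seenByExplorer = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1'); // "DepÃ³sito"

// Choice 2: convert to ISO-8859-1 first (what utf8_decode() does), and back
// again for names returned by scandir() (what utf8_encode() does).
$forFilesystem = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');  // "Dep\xF3sito"
$backToUtf8    = mb_convert_encoding($forFilesystem, 'UTF-8', 'ISO-8859-1');

var_dump(strlen($utf8), strlen($forFilesystem)); // int(9), int(8)
```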
Under Unix and Linux (and possibly under OS X too), the current file system encoding is given by the LC_CTYPE locale parameter (see function setlocale()). For example, it may evaluate to something like en_US.UTF-8 that means the encoding is UTF-8. Then file names and their paths can be created with fopen() or retrieved by dir() with this encoding.
Under Windows, PHP operates as a "non-Unicode aware program", so file names are converted back and forth between the UTF-16 used by the file system (Windows 2000 and later) and the selected "code page". The control panel "Regional and Language Options", tab "Formats", sets the code page retrieved by the LC_CTYPE option, while "Administrative -> Language for non-Unicode programs" sets the translation code page for file names. In Western countries the LC_CTYPE parameter evaluates to something like language_country.1252, where 1252 is the code page, also known as "Windows-1252 encoding", which is similar (but not exactly equal) to ISO-8859-1. In Japan the 932 code page is usually set instead, and so on for other countries. Under PHP you may create files whose names can be expressed in the current code page. Conversely, file names and paths retrieved from the file system are converted from UTF-16 to bytes using the "best-fit" mapping of the current code page.
This mapping is approximate, so some characters might be mangled in unpredictable ways. For example, Caffé Brillì.txt would be returned by dir() as the PHP string Caff\xE9 Brill\xEC.txt, as expected, if the current code page is 1252, while on a Japanese system it would return the approximation Caffe Brilli.txt, because accented vowels are missing from the 932 code page and are replaced with their "best-fit" non-accented vowels. Characters that cannot be translated at all are returned as ? (question mark). In general, under Windows there is no safe way to detect such artifacts.
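While PHP cannot detect best-fit artifacts after the fact, you can at least check beforehand whether a UTF-8 name survives a round trip through a given code page. A sketch, assuming ext-mbstring (which substitutes '?' for unmappable characters; it does not reproduce Windows' best-fit mapping); `is_representable` is a hypothetical helper.

```php
<?php
/**
 * Hypothetical helper: does $utf8Name survive a round trip through $charset?
 * If not, pre-7.1 PHP on Windows would see a "best-fit" or "?" version of it.
 */
function is_representable(string $utf8Name, string $charset): bool
{
    $encoded = mb_convert_encoding($utf8Name, $charset, 'UTF-8');
    return mb_convert_encoding($encoded, 'UTF-8', $charset) === $utf8Name;
}

var_dump(is_representable('Caffé Brillì.txt', 'Windows-1252')); // bool(true)
var_dump(is_representable('Среда чтения', 'Windows-1252'));     // bool(false)
var_dump(is_representable('Среда чтения', 'Windows-1251'));     // bool(true)
```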
More details are available in my reply to the PHP bug no. 47096.
PHP 7.1 supports UTF-8 filenames on Windows regardless of the system code page.
The problem is that Windows uses UTF-16 for filesystem strings, whereas Linux and others use different character sets, often UTF-8. You provided a UTF-8 string, but Windows interprets it in another 8-bit encoding, perhaps Latin-1, so the non-ASCII character, which takes 2 bytes in UTF-8, is treated as 2 characters.
A common solution is to keep your source code 100% ASCII and keep the strings somewhere else.
Using the com_dotnet PHP extension, you can access Windows' Scripting.FileSystemObject and then do everything you want with UTF-8 file and folder names.
I packaged this as a PHP stream wrapper, so it's very easy to use:
https://github.com/nicolas-grekas/Patchwork-UTF8/blob/lab-windows-fs/class/Patchwork/Utf8/WinFsStreamWrapper.php
First verify that the com_dotnet extension is enabled in your php.ini
then enable the wrapper with:
stream_wrapper_register('win', 'Patchwork\Utf8\WinFsStreamWrapper');
Finally, use the functions you're used to (mkdir, fopen, rename, etc.), but prefix your path with win://
For example:
<?php
$dir_name = "Depósito";
mkdir('win://' . $dir_name );
?>
You could use this extension to solve your issue: https://github.com/kenjiuno/php-wfio
$file = fopen("wfio://多国語.txt", "rb"); // in UTF-8
....
fclose($file);
Try the CodeIgniter Text helper.
Read about its convert_accented_characters() function; it can be customised.
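If pulling in a framework is not an option, transliteration can be approximated with strtr(). This is a sketch in the spirit of convert_accented_characters(), not CodeIgniter's code; the map is a small illustrative subset (Czech letters plus ó) and the function name is hypothetical. Any byte still outside a conservative ASCII set is replaced with '_', so an unmapped multibyte character turns into several underscores.

```php
<?php
/**
 * Hypothetical helper: transliterate a UTF-8 name to a safe ASCII filename.
 * The map is an illustrative subset; extend it for your languages.
 */
function transliterate_filename(string $name): string
{
    static $map = [
        'á' => 'a', 'č' => 'c', 'ď' => 'd', 'é' => 'e', 'ě' => 'e', 'í' => 'i',
        'ň' => 'n', 'ó' => 'o', 'ř' => 'r', 'š' => 's', 'ť' => 't', 'ú' => 'u',
        'ů' => 'u', 'ý' => 'y', 'ž' => 'z',
        'Á' => 'A', 'Č' => 'C', 'É' => 'E', 'Ř' => 'R', 'Š' => 'S', 'Ž' => 'Z',
    ];
    $ascii = strtr($name, $map);

    // Byte-wise (no /u flag): every leftover non-ASCII byte becomes '_'.
    return (string) preg_replace('/[^A-Za-z0-9._-]/', '_', $ascii);
}

echo transliterate_filename('Depósito'), "\n";                 // Deposito
echo transliterate_filename('Válcování za tepla.pptx'), "\n";  // Valcovani_za_tepla.pptx
```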
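My set of helpers for using the filesystem with UTF-8 names on Windows or Linux from PHP, compatible with .htaccess file-exists checks:
function define_cur_os(){
    if (defined('CUR_OS')) {
        return; // already set up
    }
    //$cur_os=strtolower(php_uname());
    $cur_os = strtolower(PHP_OS);
    if (substr($cur_os, 0, 3) === 'win') {
        $cur_os = 'windows';
    }
    define('CUR_OS', $cur_os);
}
// CUR_OS must be defined before any helper below is called.
define_cur_os();

// URL-decode the path and, on Windows with PHP < 7.1, convert it from UTF-8
// to ISO-8859-1 (an assumption for Western locales; other systems use other
// code pages). PHP 7.1+ on Windows accepts UTF-8 paths directly.
function filesystem_encode($file_name=''){
    $file_name = urldecode($file_name);
    if (CUR_OS == 'windows' && PHP_VERSION_ID < 70100) {
        $file_name = iconv("UTF-8", "ISO-8859-1//TRANSLIT", $file_name);
    }
    return $file_name;
}

// Create the directory (recursively) if needed; returns the encoded path.
function custom_mkdir($dir_path='', $chmod=0755){
    $dir_path = filesystem_encode($dir_path);
    if (!is_dir($dir_path)) {
        if (!mkdir($dir_path, $chmod, true)) {
            //handle mkdir error
        }
    }
    return $dir_path;
}

function custom_fopen($dir_path='', $file_name='', $mode='w'){
    if ($dir_path != '' && $file_name != '') {
        $dir_path = custom_mkdir($dir_path);
        $file_name = filesystem_encode($file_name);
        // Join with exactly one separator; '/' also works on Windows.
        return fopen(rtrim($dir_path, '/\\') . '/' . $file_name, $mode);
    }
    return false;
}

function custom_file_exists($file_path=''){
    $file_path = filesystem_encode($file_path);
    return file_exists($file_path);
}

function custom_file_get_contents($file_path=''){
    $file_path = filesystem_encode($file_path);
    return file_get_contents($file_path);
}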
Additional resources
special characters in "file_exists" problem (php)
PHP file_exists with accent returns false
http://www.developpez.net/forums/d825883/php/php-sgbd/php-mysql/mkdir-accents/
http://en.wikipedia.org/wiki/Uname#Table_of_standard_uname_output
It doesn't take much code, and it works well for me:
<?php
$dir_name = mb_convert_encoding("Depósito", "ISO-8859-1", "UTF-8");
mkdir($dir_name);
?>

file name with special characters like "é" NOT FOUND

I have a folder on my website just for random files. I used PHP's opendir to list all the files so I can style the page a bit. But the files that I uploaded with special characters in their names don't work: when I click on them, it says the files are not found, yet when I check the directory, the files are there. It seems like the links are wrong. Any idea how I can get correct links to these file names with special characters?
This is tricky. It depends on what encoding your filesystem uses for filenames and how (if at all) your web server or PHP functions convert the encoding.
First of all, make sure your links never contain unencoded non-ASCII characters. URLs should be in UTF-8, i.e. é should be encoded as %C3%A9. If that doesn't work, try %E9 (é in ISO-8859-1).
You might find the iconv() function useful for converting encodings. Using rawurlencode() is essential.
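Building the link from the name returned by opendir()/readdir() might look like this. A sketch: `file_href` is a hypothetical helper, and it assumes the URL directory maps one-to-one onto the folder being listed. Each segment is percent-encoded with rawurlencode(), which encodes whatever bytes the filesystem returned, so UTF-8 names yield %C3%A9 and ISO-8859-1 names yield %E9.

```php
<?php
/**
 * Hypothetical helper: build an href for a file found by opendir()/readdir().
 * $urlDir is a non-empty, '/'-separated URL path such as 'files'.
 */
function file_href(string $urlDir, string $fileName): string
{
    $segments = array_map('rawurlencode', explode('/', trim($urlDir, '/')));
    $segments[] = rawurlencode($fileName);
    // Escape for use inside an HTML attribute.
    return htmlspecialchars(implode('/', $segments), ENT_QUOTES);
}

echo file_href('files', 'café.txt'), "\n";     // files/caf%C3%A9.txt (UTF-8 name)
echo file_href('files', "caf\xE9.txt"), "\n";  // files/caf%E9.txt (ISO-8859-1 name)

// In the listing loop:
// echo '<a href="' . file_href('files', $entry) . '">'
//     . htmlspecialchars($entry, ENT_QUOTES | ENT_SUBSTITUTE) . '</a>';
```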
Do you see the files if you run this?
foreach (new DirectoryIterator('/path/to/folder') as $fileInfo) {
if($fileInfo->isDot() || $fileInfo->isDir()) continue;
echo $fileInfo->getFilename() . "<br>\n";
}
Edit: I just realised I misread the question. It's likely some kind of encoding issue, as porneL says.
