I am trying to check with PHP whether a file exists. The file in question has an accented character (é) in its name; it is called: 13067-AP-03 A - Situation projetée.pdf.
The code I use to check whether the file exists is:
$filename = 'C:/13067-AP-03 A - Situation projetée.pdf';
if (file_exists($filename))
{
echo "The file exists";
} else
{
echo "The file does not exist";
}
The problem is that whenever I check whether the file exists, I get the message that it doesn't. If I remove the é from the name, I get the message that the file does exist.
It looks like PHP somehow doesn't recognize the file if its name contains an accented character. I tried the following:
urlencode($filename);
addslashes($filename);
utf8_encode($filename);
None of which worked. I also tried:
setlocale(LC_ALL, "en_US.utf8");
It may be worth noting that when I read the filename directly with PHP, I get the following:
13067-AP-03 A - Situation projet�e.pdf
I have to do the following to have the filename displayed correctly:
$filename = iconv( "CP437", 'UTF-8', $filename);
I was wondering if someone had the same problem before and could help me out with this one. All help is greatly appreciated.
For those who are interested, the script runs on a Windows machine.
Strangely, this worked: I copied all the source code from Sublime Text 3 to Notepad, then saved it from Notepad, overwriting the PHP file.
Now when I check to see if the file exists it shows the following filename that exists:
13067-AP-03 A - Situation projet�e.pdf
The only problem I am facing now is that I want to download the file using file_get_contents, but file_get_contents doesn't interpret the � as an é.
I think it's a problem with PHP under Windows. I downloaded a Windows binary of PHP to my Japanese-language Windows machine and successfully reproduced your problem.
According to https://bugs.php.net/bug.php?id=47096
So, if you have a generic name of a file (along with its path) as a Unicode string $u (for example UTF-8 encoded) and you want to try to save it with that name under Windows, you must first check the current locale calling setlocale(LC_CTYPE, 0) to retrieve the current code page, then you must convert $u to an array of bytes according to the code page; if one or more code points have no counterpart in the current code page, the file cannot be saved with that name from PHP. Dot.
My code page is CP932; you can see yours by running chcp in cmd.
So the code would be expected to be:
$filename='C:\Users\Frederick\Desktop\13067-AP-03 A - Situation projetée.pdf';
$filename=mb_convert_encoding($filename, 'CP932', 'UTF-8');
var_dump($filename);
var_dump(file_exists($filename));
But this won't work! Why? Because CP932 doesn't contain the character é!
According to https://msdn.microsoft.com/en-us/library/windows/desktop/dd317748%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
NTFS stores file names in Unicode. In contrast, the older FAT12, FAT16, and FAT32 file systems use the OEM character set.
Windows itself uses UTF-16LE, which Microsoft calls Unicode, to store file names. But PHP doesn't support UTF-16LE encoded file names.
In conclusion, it's a pity, but I cannot find a way to solve the problem on Windows other than avoiding those characters when naming the files. I also do not think the PHP team will solve the problem in the future.
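The check described in the bug report quote above can be sketched roughly as follows. This is only an illustration, not a fix: the helper name toCodePage is hypothetical, and the target code page is passed in by hand (for example, the value reported by chcp) rather than detected. Because iconv implementations differ in how they treat unrepresentable characters, the sketch converts back and compares instead of relying on the return value alone.

```php
<?php
// Hypothetical helper: convert a UTF-8 file name to a legacy code page,
// or return null if some character cannot be represented there.
function toCodePage(string $utf8Name, string $codePage): ?string
{
    // No //TRANSLIT or //IGNORE: we want to notice losses, not hide them.
    $converted = @iconv('UTF-8', $codePage, $utf8Name);
    if ($converted === false) {
        return null;
    }
    // Round trip: if converting back does not give the original name,
    // the code page could not hold every character.
    $back = @iconv($codePage, 'UTF-8', $converted);
    return $back === $utf8Name ? $converted : null;
}

$name = "Situation projet\u{E9}e.pdf"; // \u{E9} is é
$bytes = toCodePage($name, 'CP1252');
echo $bytes === null ? "not representable\n" : "representable\n";
```

If the function returns null for the active code page, the name cannot be passed to file_exists() in that form on the affected PHP versions, which is the situation described above for CP932.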
Make sure that your text editor is saving the file as "UTF-8 without BOM".
The BOM is the Byte Order Mark, a short byte sequence placed at the start of a file (three bytes, EF BB BF, in UTF-8) that lets software reading the file identify its encoding and, for UTF-16 and UTF-32, whether it is little-endian or big-endian. The PHP interpreter does not strip these bytes; it treats them as output that precedes the opening <?php tag, so you must save the file without the byte order mark.
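To check whether a file or string actually starts with a UTF-8 BOM, something like the following could be used. It is a minimal sketch; the function names hasUtf8Bom and stripUtf8Bom are made up for illustration.

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: does the string start with a UTF-8 BOM?
function hasUtf8Bom(string $data): bool
{
    return strncmp($data, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: return the string without a leading UTF-8 BOM.
function stripUtf8Bom(string $data): string
{
    return hasUtf8Bom($data) ? (string) substr($data, 3) : $data;
}
```

For example, hasUtf8Bom(file_get_contents(__FILE__)) reports whether the script itself was saved with a BOM.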
Try this at the start of your PHP file:
<?php
header('Content-Type: text/html; charset=utf-8');
?>
So I'm trying to upload an exported Outlook email into my WordPress website through a plugin. Here's the start of the file in question:
http://prntscr.com/jxil42
When I echo the file this is the result I get.
http://prntscr.com/jximu4
Firstly I don't get where the "?" icons come from, then, when I move the uploaded file and open it, this is the result:
http://prntscr.com/jxinh9
I've no idea what's causing this, any help..?
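The line below should help you upload a file with a Unicode name (cp936 is the Simplified Chinese Windows code page; substitute the code page of your own server):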
$filename = iconv("utf-8", "cp936", $filename);
Reference
I think it may be caused by the encoding, or by the encoding version or type.
I am creating a site where the authenticated user can write messages for the index site.
On the message creation page I have a textbox where the user can enter the title of the message, and a textbox where they can write the message.
The message will be exported to a .txt file, and I create the name of the .txt file from the title, like this:
Title: This is a message (The filename will be: thisisamessage.txt)
The original title will be stored in a database record along with the .txt filename as the path.
For converting the title text I am using a function that looks like this:
function filenameconverter($title){
$filename=str_replace(" ","",$title);
$filename=str_replace("ű","u",$filename);
$filename=str_replace("á","a",$filename);
$filename=str_replace("ú","u",$filename);
$filename=str_replace("ö","o",$filename);
$filename=str_replace("ő","o",$filename);
$filename=str_replace("ó","o",$filename);
$filename=str_replace("é","e",$filename);
$filename=str_replace("ü","u",$filename);
$filename=str_replace("í","i",$filename);
$filename=str_replace("Ű","U",$filename);
$filename=str_replace("Á","A",$filename);
$filename=str_replace("Ú","U",$filename);
$filename=str_replace("Ö","O",$filename);
$filename=str_replace("Ő","O",$filename);
$filename=str_replace("Ó","O",$filename);
$filename=str_replace("É","E",$filename);
$filename=str_replace("Ü","U",$filename);
$filename=str_replace("Í","I",$filename);
return $filename;
}
It works fine most of the time, but sometimes it does not do its job.
For example: "Pamutkéztörlő adagoló és higiéniai kéztörlő adagoló".
The .txt file should be named:
pamutkeztorloadagoloeshigieniaikeztorloadagolo.txt, and most of the time it is.
But sometimes when I enter this title, the result is:
pamutkă©ztă¶rlĺ‘adagolăłă©shigiă©niaikă©ztă¶rlĺ‘adagolăł.txt
I'm Hungarian, so the title text will also be Hungarian; that's why I have to change the characters.
I'm using XAMPP with Apache and phpMyAdmin.
I would rather use a generated unique ID for each file as its filename and save the real name in a separate column.
This way you prevent someone from overwriting files simply by uploading them several times. But if title-based names are what you want, you will find several approaches to cleaning filenames here on SO; one very good one that I have used is http://cubiq.org/the-perfect-php-clean-url-generator
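The unique-ID suggestion could look roughly like this. It is a sketch under assumptions: the function name makeStoredName and the array keys are hypothetical, and the returned values would still need to be written to the database by your own code.

```php
<?php
// Hypothetical helper: build a random, ASCII-only stored name and keep the
// user-supplied title only as metadata for a separate database column.
function makeStoredName(string $originalTitle): array
{
    // 16 random bytes become 32 hexadecimal characters.
    $id = bin2hex(random_bytes(16));
    return [
        'stored_name'    => $id . '.txt',
        'original_title' => $originalTitle,
    ];
}

$record = makeStoredName('Pamutkéztörlő adagoló');
echo $record['stored_name'], "\n";
```

Since the stored name never contains user input, neither accented characters nor repeated uploads of the same title affect the file on disk.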
intl
I don't think it is advisable to use str_replace manually for this purpose. You can use the bundled intl extension available as of PHP 5.3.0. Make sure the extension is turned on in your XAMPP settings.
Then, use the transliterator_transliterate() function to transform the string. You can also convert it to lowercase in the same call. Credit goes to simonsimcity.
<?php
$input = 'Pamutkéztörlő adagoló és higiéniai kéztörlő adagoló';
$output = transliterator_transliterate('Any-Latin; Latin-ASCII; lower()', $input);
print(str_replace(' ', '', $output)); //pamutkeztorloadagoloeshigieniaikeztorloadagolo
?>
P.S. Unfortunately, the php manual on this function doesn't elaborate the available transliterator strings, but you can take a look at Artefacto's answer here.
iconv
Using iconv with //TRANSLIT still leaves stand-in characters (apostrophes and quotes) for the diacritics, which is probably not what you expect.
print(iconv("UTF-8","ASCII//TRANSLIT",$input)); //Pamutk'ezt"orl"o adagol'o 'es higi'eniai k'ezt"orl"o adagol'o
mb_convert_encoding
Converting the encoding from Hungarian ISO to ASCII or UTF-8 also gives problems similar to the ones you mentioned.
print(mb_convert_encoding($input, "ASCII", "ISO-8859-16")); //Pamutk??zt??rl?? adagol?? ??s higi??niai k??zt??rl?? adagol??
print(mb_convert_encoding($input, "UTF-8", "ISO-8859-16")); //PamutkéztörlŠadagoló és higiéniai kéztörlŠadagoló
P.S. Similar question could also be found here and here.
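If the intl extension cannot be enabled, a fallback along the following lines may be worth trying. It is only a sketch: the function name titleToFilename is hypothetical, the map covers just the Hungarian letters listed in the question, and it assumes both the PHP source file and the incoming title are UTF-8. It rejects input that is not valid UTF-8 up front, since a mismatch between the encoding of the title and the encoding of the replacement table is one plausible reason for replacements silently not matching.

```php
<?php
// Hypothetical intl-free fallback for turning a title into a file name.
function titleToFilename(string $title): string
{
    // With the /u modifier, preg_match() fails on invalid UTF-8 input.
    if (preg_match('//u', $title) !== 1) {
        throw new InvalidArgumentException('Title is not valid UTF-8');
    }
    $map = [
        'á' => 'a', 'é' => 'e', 'í' => 'i',
        'ó' => 'o', 'ö' => 'o', 'ő' => 'o',
        'ú' => 'u', 'ü' => 'u', 'ű' => 'u',
        'Á' => 'A', 'É' => 'E', 'Í' => 'I',
        'Ó' => 'O', 'Ö' => 'O', 'Ő' => 'O',
        'Ú' => 'U', 'Ü' => 'U', 'Ű' => 'U',
    ];
    // strtr() with an array replaces all keys in a single pass.
    $ascii = strtr($title, $map);
    // Drop spaces and anything else outside A-Z, a-z and 0-9.
    $ascii = preg_replace('/[^A-Za-z0-9]+/', '', $ascii);
    return strtolower($ascii) . '.txt';
}

echo titleToFilename('Pamutkéztörlő adagoló és higiéniai kéztörlő adagoló'), "\n";
```

Unlike the chain of str_replace calls, characters that are not in the map are removed rather than passed through, so the resulting name is always plain ASCII.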
I want to save a file to Windows using Japanese characters in the filename.
The PHP file is saved with UTF-8 encoding
<?php
$oldfile = "test.txt";
$newfile = "日本語.txt";
copy($oldfile,$newfile);
?>
The file copies, but its name appears garbled in Windows:
日本語.txt
How do I make it save as
日本語.txt
?
I ended up using the php-wfio extension from https://github.com/kenjiuno/php-wfio
After putting php_wfio.dll into php\ext folder and enabling the extension, I prefixed the filenames with wfio:// (both need to be prefixed or you get a Cannot rename a file across wrapper types error)
My test code ends up looking like
<?php
$oldfile = "wfio://test.txt";
$newfile = "wfio://日本語.txt";
copy($oldfile,$newfile);
?>
and the file gets saved in Windows as 日本語.txt, which is what I was looking for.
Starting with PHP 7.1, I would point you to this answer: https://stackoverflow.com/a/38466772/3358424. Unfortunately, most of the recommendations listed in the answer that strives to be the only correct one are not valid. Suggestions like "just urlencode the filename" or "FS expects iso-8859-1" are terribly wrong assumptions that misinform people. They may work by luck, but they are only valid for US or mostly Western code pages and are otherwise just wrong. PHP 7.1 + default_charset=UTF-8 is what you want. With earlier PHP versions, wfio or wrappers around ext/com_dotnet might indeed be helpful.
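The "PHP 7.1 + default_charset=UTF-8" condition can be checked at runtime before relying on UTF-8 file names. This is a rough sketch; the function name utf8PathsExpected is hypothetical, and it only inspects the PHP version and the ini setting named above, not the file system itself.

```php
<?php
// Hypothetical check mirroring the condition "PHP 7.1 + default_charset=UTF-8".
function utf8PathsExpected(): bool
{
    $charset = (string) ini_get('default_charset');
    return PHP_VERSION_ID >= 70100 && strcasecmp($charset, 'UTF-8') === 0;
}

if (!utf8PathsExpected()) {
    // On older versions, fall back to something like the wfio:// wrapper.
    echo "UTF-8 file names may not work natively on this setup\n";
}
```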
Thanks.
First of all, my code is working...but the resultant file is causing problems on my server. Only files with strange characters are causing errors on the server, such as file does not exist or error connecting to file when trying to open the file through FTP. All files without strange characters are working fine on the server, and can be opened and edited.
Here's my workflow:
Get text from a TextView on user's screen, run it through this code to remove unwanted characters:
replaceAll("[^a-z ,()A-Z0-9]+", "-");
Create a text file using this text as the file name;
Upload this text file to server with this PHP script:
<?php
$file_path = "uploads/";
$file_path = $file_path . basename( $_FILES['uploaded_file']['name']);
if(move_uploaded_file($_FILES['uploaded_file']['tmp_name'], $file_path)) {
echo "success";
} else{
echo "fail";
}
?>
The filenames contain these strange characters, I assume because of non-English characters on the user's screen.
I need to be careful because the path to upload the file to my server is based on this file name and I don't know how to test it with non English characters. Any help is much appreciated. I need to remove/replace any non English characters without messing up the file path.
Technically you can solve this by converting the string to UTF-8 on the server using mb_convert_encoding, but your code is really not safe: you are using a user-supplied value as a file path, and attackers can send /../../../ and so forth.
The solution I use for both problems is to convert the passed file name to a hex string on the server, using bin2hex. That way you have a very safe file name, with no encoding issues.
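A minimal sketch of the bin2hex idea follows. The helper names toHexName and fromHexName are hypothetical. Note that the hex form is twice as long as the original name in bytes, so very long names may run into file system length limits.

```php
<?php
// Hypothetical helper: encode the client-supplied name as hexadecimal.
// The result contains only 0-9 and a-f, so it carries no path separators,
// no dots and no non-ASCII bytes.
function toHexName(string $clientName): string
{
    return bin2hex($clientName);
}

// Hypothetical helper: recover the original name, or null if the stored
// name is not valid hexadecimal.
function fromHexName(string $hexName): ?string
{
    $decoded = @hex2bin($hexName);
    return $decoded === false ? null : $decoded;
}

$stored = toHexName('../x');
echo $stored, "\n"; // 2e2e2f78
```

The stored name can then be appended to the upload directory, while fromHexName() gives back the name to show to the user.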
Use these lines; they should help you.
String styledText = "Your File Name";
textView.setText(Html.fromHtml(styledText));
I have a homemade app that allows multiple file uploads. I pass the files to PHP with AJAX, create a new directory with PHP, move the uploaded files there, and save the directory location to the database. Then, to show the files, I list the directory location saved in the DB.
The problem is that files come from all around the world, so very often their names contain non-ASCII characters, for example ü. When I echo the filename in PHP, names appear correctly even when they are written in Arabic, yet they are saved on the server with garbled names, for example Ã¼ in place of ü. When I list the files from the directory I see the name Ã¼.txt instead of ü.txt, but when I click on it the server returns an "object not found" error (since on the server it is saved as Ã¼.txt and it reads the link as ü.txt).
I tried some of the suggested solutions as for example using iconv, but the filenames are still being saved the same way.
I could swear the problem wasn't present when the web app was hosted on Linux, but I am no longer so sure about it. Right now I temporarily run it on XAMPP (on Windows), and it seems that filenames are saved using windows-1252 encoding (the default Windows encoding on the server). Is this problem related to the default Windows encoding?
To be honest, I do not know how to approach this problem and I would appreciate any help. Should I keep trying to save the files in a different character encoding, or would it be better to approach it differently and change how the already saved and encoded files are listed?
EDIT. According to the (finally) closed bug report it was fixed in php 7.1.
In the end I solved it with the following approach:
When uploading the files I urlencode the names with rawurlencode()
When fetching the files from server they are obviously URL encoded so I use urldecode($filename) to print correct names
Links in a href are automatically translated, so for example "%20" becomes a " " and URL ends up being incorrect since it links to incorrect filename. I decided to encode them back and print them ending up with something like this: print $dirReceived.rawurlencode($file); ($dirReceived is the directory where received files are stored, defined earlier in the code)
I also added download attribute with urldecode($filename) to save the file with UTF-8 name when needed.
Thanks to this I have files saved on the server with url encoded names. Can open them in browser (very important as most of them are *.pdf) and can download them with correct name which lets me upload and download even files with names written in Arabic, Cyrillic, etc.
So far I tested it and looks good. I am thinking of implementing it in production code. Any concerns/thoughts on it?
EDIT.
Since there are no objections I select my answer as the one that solved my problem. After doing some testing everything looks good on client and server side. When saving the files on server they are URL encoded, when downloading them they are decoded and saved with correct names.
At the beginning I was using the code:
for($i=0;$i<count($_FILES['file']['name']);$i++)
{
move_uploaded_file($_FILES['file']['tmp_name'][$i],
"../filepath/" . $_FILES['file']['name'][$i]);
}
This method caused the problem when saving the file: every UTF-8 special character ended up as its bytes read in cp1252 (ü saved as Ã¼, etc.), so I added one line and replaced that code with the following:
for($i=0;$i<count($_FILES['file']['name']);$i++)
{
$fname= rawurlencode($_FILES['file']['name'][$i]);
move_uploaded_file($_FILES['file']['tmp_name'][$i],
"../filepath/" . $fname);
}
This allows me to save any filename on server using URL encoding (% and two hexadecimals) which is compatible with both cp1252 and UTF-8.
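The encode and decode steps can be tried in isolation, as in the sketch below. The wrapper names encodeStoredName and decodeStoredName are hypothetical. One deliberate difference from the approach above: it decodes with rawurldecode(), the exact inverse of rawurlencode(), whereas urldecode() additionally turns a literal "+" into a space, which matters for files that were stored before the encoding was introduced.

```php
<?php
// Hypothetical wrapper: name used on disk (ASCII only: letters, digits,
// "-", "_", ".", "~" and %XX escapes).
function encodeStoredName(string $uploadedName): string
{
    return rawurlencode($uploadedName);
}

// Hypothetical wrapper: name shown to the user.
function decodeStoredName(string $storedName): string
{
    return rawurldecode($storedName);
}

$stored = encodeStoredName("\u{FC} 1+1.txt"); // \u{FC} is ü
echo $stored, "\n"; // %C3%BC%201%2B1.txt
```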
To list the saved files, I take the directory paths saved in the DB and list the files in them. I was using the following code:
if (is_dir($dir)){
if ($dh = opendir($dir)){
while (($file = readdir($dh)) !== false){
if(is_file($dir . $file)){
echo "<li><a href='".$dir.$file."' download='".$file ."'>".$file."</a></li><br />";
}
}
closedir($dh);
}
}
Since URL encoded filenames were decoded automatically I changed it to:
if (is_dir($dir)){
if ($dh = opendir($dir)){
while (($file = readdir($dh)) !== false){
if(is_file($dir . $file)){
echo "<li><a href='";
print $dir.rawurlencode($file);
echo "' download='" . urldecode($file) ."'>".urldecode($file)."</a></li><br />";
}
}
closedir($dh);
}
}
I don't know if this is the best way to solve it, but it works perfectly. I am also aware that it is generally good practice not to generate HTML tags with PHP, but at the moment I have some critical bugs that need addressing, so those come first; I'll work on the appearance of the code afterwards.
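One thing that might be worth adding to the listing code is HTML escaping, since decoded file names come from users and end up inside attributes and link text. The following is a sketch only; the function name fileLink is hypothetical, and the use of htmlspecialchars() is an addition that is not part of the code above.

```php
<?php
// Hypothetical helper: build one list item for a file whose on-disk name
// is URL encoded, as in the upload code above.
function fileLink(string $dir, string $storedName): string
{
    // Encode again for the URL so the browser requests the on-disk name.
    $href = htmlspecialchars($dir . rawurlencode($storedName), ENT_QUOTES, 'UTF-8');
    // Decode once for display and for the suggested download name.
    $label = htmlspecialchars(rawurldecode($storedName), ENT_QUOTES, 'UTF-8');
    return '<li><a href="' . $href . '" download="' . $label . '">' . $label . '</a></li>';
}

echo fileLink('files/', '%C3%BC.txt'), "\n";
```

With ENT_QUOTES, a file name containing quotes or angle brackets cannot break out of the href or download attributes.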
EDIT2
Also the great thing is I do not have to change names of the already uploaded files which in my case is a big advantage.
Are you using $_FILES['upfile']['name'] to name the file? That could create your problem.
How about using GNU Recode (the recode extension, which is no longer bundled with PHP as of 7.4)?
$fileName = recode_string('latin1',$_FILES['upfile']['name']);
Syntax:
recode_string(string $request, string $string)
Valid Character sets: http://www.faqs.org/rfcs/rfc1345.html
Somehow you must validate the characters in the uploaded file name.
You could also try sprintf. The formatted string characters can be unpredictable, but will probably work.
$fileName = pathinfo($_FILES['upfile']['name'], PATHINFO_FILENAME);
$fileName = sprintf('./uploads/%s',$fileName);
When you save the file name to the database, use (where $link is your open mysqli connection):
$fileName = mysqli_real_escape_string($link, $fileName);