How can i upload a file with utf-8 name in php? - php

we have a file that have persian name, like:
ایران.jpg
our problem is that php unable to copy or rename this file by orginal name,
meaning if file name does not have fully english character, result is like this:
ط§ط´ط±ع©طھ ظ…ظ„غŒ ظ¾ط®ط´ ظپط±ط¢ظˆط±ط¯ظ‡ ظ‡ط§غŒ ظ†ظپطھغŒ-04ط¢ط¨ط§ظ†.jpg
some articles recommendation for use of iconv function, like:
$fn = iconv("CP-1252", "UTF-8", $file['name']);
we use of that method, but the solution not work.

You need to specify the correct character set to iconv from which to convert the string. Something like this:
$fn = iconv("<persian-character-set>", "UTF-8", $file['name']);
You may want to add additional options to the output character set like TRANSLINT and/or IGNORE:
$fn = iconv("<persian-character-set>", "UTF-8//TRANSLIT//IGNORE", $file['name']);
See http://php.net/manual/en/function.iconv.php for details on these options.

You Should Choose Right Code Page. This Code Works In Windows For Arabic/Persian Names:
$newname = iconv("UTF-8", "CP1256//IGNORE","گچپژ");
echo rename("1.txt", $newname);

In fact, we have a conversion from UTF-8 to UTF-8, Not ridiculous?
Value of $_FILE['name'] is UTF-8 and we try to set that character-set to UTF-8!!!
Our problem is that we have entered into with utf-8, but saved by unknown encoding!
I think Php is a serious bug, if you think opposite, active persian language in your windows os and try to make a folder by persian character by PHP (like ایران), and use of all method that you think work!
If it was, you did a great job, but if ...

Related

PHP read a line from a csv file return wrong in charset

I got a csv file, if I set the charset to ISO-8859-2(eastern europe) in Libre Calc, than it renders the characters correctly, but since the server's locale set to EN-UK.
I can not read the characters correctly, for example:
it returns : T�t insted of Tót.
I tried many things like:
echo (mb_detect_encoding("T�t","ISO-8859-2","UTF-8"));
I know probably the char does not exist in UTF-8 but I tried.
Also tried to setup the correct charset in the header:
header('Content-Type: text/html; charset=iso-8859-2');
echo "T�th";
but its returns : TÄĹźËth insted of Tóth.
Please help me solve this, thanks in advance
I advise against setting the header to charset=iso-8859-2'. It is usual to work with UTF-8. If the data is available with a different encoding, it should be converted to UTF-8 and then processed as CSV. The following example code could be kept as simple as the newline characters in UTF-8 and iso-8859-2 are the same.
$fileName = "yourpath/Iso8859_2.csv";
$fp = fopen($fileName,"r");
while($row = fgets($fp)){
$strUtf8 = mb_convert_encoding($row,'UTF-8','ISO-8859-2');
$arr = str_getcsv($strUtf8);
var_dump($arr);
}
fclose($fp);
The exact encoding of the CSV file must be known. mb_detect_encoding is not suitable for determining the encoding of a file.

Function with special characters

I am creating a site where the authenticated user can write messages for the index site.
On the message create site I have a textbox where the user can give the title of the message, and a textbox where he can write the message.
The message will be exported to a .txt file and from the title I'm creating the title of the .txt file and like this:
Title: This is a message (The filename will be: thisisamessage.txt)
The original given text as filename will be stored in a database rekord among with the .txt filename as path.
For converting the title text I am using a function that looks like this:
function filenameconverter($title){
$filename=str_replace(" ","",$title);
$filename=str_replace("ű","u",$filename);
$filename=str_replace("á","a",$filename);
$filename=str_replace("ú","u",$filename);
$filename=str_replace("ö","o",$filename);
$filename=str_replace("ő","o",$filename);
$filename=str_replace("ó","o",$filename);
$filename=str_replace("é","e",$filename);
$filename=str_replace("ü","u",$filename);
$filename=str_replace("í","i",$filename);
$filename=str_replace("Ű","U",$filename);
$filename=str_replace("Á","A",$filename);
$filename=str_replace("Ú","U",$filename);
$filename=str_replace("Ö","O",$filename);
$filename=str_replace("Ő","O",$filename);
$filename=str_replace("Ó","O",$filename);
$filename=str_replace("É","E",$filename);
$filename=str_replace("Ü","U",$filename);
$filename=str_replace("Í","I",$filename);
return $filename;
}
However it works fine at the most of the time, but sometimes it is not doing its work.
For example: "Pamutkéztörlő adagoló és higiéniai kéztörlő adagoló".
It should stand as a .txt as:
pamutkeztorloadagoloeshigieniaikeztorloadagolo.txt, and most of the times it is.
But sometimes when im giving this it will be:
pamutkă©ztă¶rlĺ‘adagolăłă©shigiă©niaikă©ztă¶rlĺ‘adagolăł.txt
I'm hungarian so the title text will be also hungarian, thats why i have to change the characters.
I'm using XAMPP with apache and phpmyadmin.
I would rather use a generated unique ID for each file as its filename and save the real name in a separate column.
This way you can avoid that someone overwrites files by simply uploading them several times. But if that is what you want you will find several approaches on cleaning filenames here on SO and one very good that I used is http://cubiq.org/the-perfect-php-clean-url-generator
intl
I don't think it is advisable to use str_replace manually for this purpose. You can use the bundled intl extension available as of PHP 5.3.0. Make sure the extension is turned on in your XAMPP settings.
Then, use the transliterator_transliterate() function to transform the string. You can also convert them to lowercase along. Credit goes to simonsimcity.
<?php
$input = 'Pamutkéztörlő adagoló és higiéniai kéztörlő adagoló';
$output = transliterator_transliterate('Any-Latin; Latin-ASCII; lower()', $input);
print(str_replace(' ', '', $output)); //pamutkeztorloadagoloeshigieniaikeztorloadagolo
?>
P.S. Unfortunately, the php manual on this function doesn't elaborate the available transliterator strings, but you can take a look at Artefacto's answer here.
iconv
Using iconv still returns some of the diacritics that are probably not expected.
print(iconv("UTF-8","ASCII//TRANSLIT",$input)); //Pamutk'ezt"orl"o adagol'o 'es higi'eniai k'ezt"orl"o adagol'o
mb_convert_encoding
While, using encoding conversion from Hungarian ISO to ASCII or UTF-8 also gives similar problems you have mentioned.
print(mb_convert_encoding($input, "ASCII", "ISO-8859-16")); //Pamutk??zt??rl?? adagol?? ??s higi??niai k??zt??rl?? adagol??
print(mb_convert_encoding($input, "UTF-8", "ISO-8859-16")); //PamutkéztörlŠadagoló és higiéniai kéztörlŠadagoló
P.S. Similar question could also be found here and here.

mkdir UTF-8 file name

I'm having some problem with mkdir
I'm using xampp on windows, when I try to create a directory, it returns not like should be, in example
mkdir(JPATH_SITE.'/images/projects/'.$region_folder.'/'.$project_folder, 0777, true);
Should return something like
/images/projects/Ленинградская_область/Ленинградская_область_1
But create a directory like:
/images/projects/Ленинградская_область/Ленинградская_область_1
It's something about encoding? or has to do with the OS?
Windows filenames are not encoded in utf8, but in windows-1252 or windows-1251 or smthing like that.
try this:
$dirname = JPATH_SITE.'/images/projects/'.$region_folder.'/'.$project_folder;
//replace "UTF-8" with the respective input charset, if it is not utf8
$dirname = iconv("UTF-8","Windows-1252",$dirname);
mkdir($dirname, 0777, true);
//if this doesnt work, try another charset like this:
$dirname = iconv("UTF-8","Windows-1251",$dirname);
//you can also use iconv on your russian variables only
//remember that you might need to change UTF-8 to another input charset
$region_folder = iconv("UTF-8","Windows-1251",$region_folder);
$project_folder = iconv("UTF-8","Windows-1251",$project_folder);
read more about iconv here: PHP iconv()
also useful to detect your charset encoding: mb_detect_encoding()

PHP doesn't recognize filename with apostrophe in it

Currently I am trying to check with PHP if a file exists. The current file I am trying to check if it exists has an apostrophe in it, the file is called:13067-AP-03 A - Situation projetée.pdf.
The code I use to check if the file exist is:
$filename = 'C:/13067-AP-03 A - Situation projetée.pdf';
if (file_exists($filename))
{
echo "The file exists";
} else
{
echo "The file does not exist";
}
The problem that I am facing right now is that whenever I try to check if the file exists I get the message it doesn't exist. If I continue to remove the é I get the message that the file does exist.
It looks that PHP somehow doesn't recognize the file if it has a apostrophe in it. I tried the following:
urlencode($filename);
addslashes($filename);
utf8_encode($filename);
None of which worked. I also tried:
setlocale(LC_ALL, "en_US.utf8");
Maybe worth noticing is that when I get the filename straight from PHP I get the following:
13067-AP-03 A - Situation projet�e.pdf
I have to do the following to have the filename displayed correctly:
$filename = iconv( "CP437", 'UTF-8', $filename);
I was wondering if someone had the same problem before and could help me out with this one. All help is greatly appreciated.
For those who are interested, the script runs on a windows machine.
Strangely this worked: I copied all the source code from Sublime Text 3 to notepad. I proceeded to save the source code in notepad by overwriting the PHP file.
Now when I check to see if the file exists it shows the following filename that exists:
13067-AP-03 A - Situation projet�e.pdf
The only problem that I am facing right now is that I want to download the file using file_get_contents. But file_get_contents doesnt interpet the � as an apostrophe.
I think it's a problem of the PHP under Windows. I downloaded a Windows binary copy to my Windows who's in Japanese and successfully reproduced your problem.
According to https://bugs.php.net/bug.php?id=47096
So, if you have a generic name of a file (along with its path) as a Unicode string $u (for example UTF-8 encoded) and you want to try to save it with that name under Windows, you must first check the current locale calling setlocale(LC_CTYPE, 0) to retrieve the current code page, then you must convert $u to an array of bytes according to the code page; if one or more code points have no counterpart in the current code page, the file cannot be saved with that name from PHP. Dot.
My code page is CP932, which you can see yours by running chcp in cmd.
So the code is expected to be:
$filename='C:\Users\Frederick\Desktop\13067-AP-03 A - Situation projetée.pdf';
$filename=mb_convert_encoding($filename, 'CP932', 'UTF-8');
var_dump($filename);
var_dump(file_exists($filename));
But this won't work! Why? Because CP932 doesn't contain the character of é!
According to https://msdn.microsoft.com/en-us/library/windows/desktop/dd317748%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
NTFS stores file names in Unicode. In contrast, the older FAT12, FAT16, and FAT32 file systems use the OEM character set.
Windows itself uses UTF-16LE, which is called Unicode by Microsoft, to save its file names. But PHP doesn't support a UTF-16LE encoded file name.
In conclusion, it's a pity that I cannot find a way to solve the problem rather than escaping all those characters when naming the files if you work on Windows. And I also do not think that the team of PHP will solve the problem in the future.
Make sure that your text editor is saving the file as "UTF-8 without BOM"
BOM is the Byte Order Mark, two bytes placed at the start of the file which allow software reading the file to determine if it has been saved as little-endian or big-endian, however the PHP interpreter cannot interpret these characters and so you must save the file without the byte order mark.
Try this on start of your php file:
<?php
header('Content-Type: text/html; charset=utf-8');
?>

Accents in uploaded file being replaced with '?'

I am building a data import tool for the admin section of a website I am working on. The data is in both French and English, and contains many accented characters. Whenever I attempt to upload a file, parse the data, and store it in my MySQL database, the accents are replaced with '?'.
I have text files containing data (charset is iso-8859-1) which I upload to my server using CodeIgniter's file upload library. I then read the file in PHP.
My code is similar to this:
$this->upload->do_upload()
$data = array('upload_data' => $this->upload->data());
$fileHandle = fopen($data['upload_data']['full_path'], "r");
while (($line = fgets($fileHandle)) !== false) {
echo $line;
}
This produces lines with accents replaced with '?'. Everything else is correct.
If I download my uploaded file from my server over FTP, the charset is still iso-8850-1, but a diff reveals that the file has changed. However, if I open the file in TextEdit, it displays properly.
I attempted to use PHP's stream_encoding method to explicitly set my file stream to iso-8859-1, but my build of PHP does not have the method.
After running out of ideas, I tried wrapping my strings in both utf8_encode and utf8_decode. Neither worked.
If anyone has any suggestions about things I could try, I would be extremely grateful.
It's Important to see if the corruption is happening before or after the query is being issued to mySQL. There are too many possible things happening here to be able to pinpoint it. Are you able to output your MySql to check this?
Assuming that your query IS properly formed (no corruption at the stage the query is being outputted) there are a couple of things that you should check.
What is the character encoding of the database itself? (collation)
What is the Charset of the connection - this may not be set up correctly in your mysql config and can be manually set using the 'SET NAMES' command
In my own application I issue a 'SET NAMES utf8' as my first query after establishing a connection as I am unable to change the MySQL config.
See this.
http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
Edit: If the issue is not related to mysql I'd check the following
You say the encoding of the file is 'charset is iso-8859-1' - can I ask how you are sure of this?
What happens if you save the file itself as utf8 (Without BOM) and try to reprocess it?
What is the encoding of the php file that is performing the conversion? (What are you using to write your php - it may be 'managing' this for you in an undesired way)
(an aside) Are the files you are processing suitable for processing using fgetcsv instead?
http://php.net/manual/en/function.fgetcsv.php
Files uploaded to your server should be returned the same on download. That means, the encoding of the file (which is just a bunch of binary data) should not be changed. Instead you should take care that you are able to store the binary information of that file unchanged.
To achieve that with your database, create a BLOB field. That's the right column type for it. It's just binary data.
Assuming you're using MySQL, this is the reference: The BLOB and TEXT Types, look out for BLOB.
The problem is that you are using iso-8859-1 instead of utf-8. In order to encode it in the correct charset, you should use the iconv function, like so:
$output_string = iconv('utf-8", "utf-8//TRANSLIT", $input_string);
iso-8859-1 does not have the encoding for any sort of accents.
It would be so much better if everything were utf-8, as it handles virtually every character known to man.

Categories