Responsive file manager v9 uploading arabic file's name issue - php

I am using Responsive File Manager (RFM) v9 as a TinyMCE plugin; the TinyMCE version is 4.7.4 and the PHP version is 5.5. I am trying to fix a problem with uploaded files that have Arabic names: RFM does not store them under the correct names.
The images I chose for testing are "vvv", "اختبار" and "اختبار - Copy", all of them 'jpg'. After uploading, the files with Arabic names end up like this:
اختبار.jpg ===> ط§ط®طھط¨ط§ط±.jpg
اختبار - Copy.jpg ==> ط§ط®طھط¨ط§ط± - Copy.jpg
However, mb_internal_encoding in config.php is set to UTF-8.
I tried using iconv to convert from utf-8 to cp1256 in UploadHandler.php, line 1097, like this:
move_uploaded_file($uploaded_file, iconv("utf-8", "cp1256",$file_path));
instead of
move_uploaded_file($uploaded_file, $file_path);
This allowed the files to be uploaded with their Arabic names, but in the RFM browser they appeared as ?????? and ????? - Copy, with no thumbnail images, even though the thumb folder contained the images. The image اختبار.jpg was also not uploaded correctly and came out corrupted. Only files with English names work fine.
I went through all the PHP files, tried base64_encode, and tried changing the encoding in config.php, but nothing worked.
Does anyone have any idea how to fix this?

The reason you're getting "?????? and ?????" is that you also have to change the collation of your database, for example to utf8_general_ci. Then save the file name without iconv(), and use iconv() only on the name you pass when moving the file.
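For context, the garbled names shown in the question look like UTF-8 bytes being displayed as Windows-1256 (cp1256). The following is a minimal sketch that reproduces that effect and reverses it. It assumes the iconv extension with cp1256 support is available; the variable names are illustrative only.

```php
<?php
// The name as typed by the user, stored in this source file as UTF-8.
$original = 'اختبار.jpg';

// Pretend the UTF-8 bytes are cp1256 text and convert that "text" to UTF-8.
// This is what a viewer does when it assumes the wrong encoding.
$garbled = iconv('CP1256', 'UTF-8', $original);
echo $garbled, "\n";

// Converting back from UTF-8 to cp1256 recovers the original byte sequence.
$restored = iconv('UTF-8', 'CP1256', $garbled);
var_dump($restored === $original);
```

If the round trip succeeds, the bytes on disk were never damaged; only the encoding used to interpret them was wrong.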

You don't want to mess with UploadHandler.php. All of the preprocessing of the upload happens in upload.php, including massaging the filename in the function fix_filename in utils.php. By the time it gets to UploadHandler, the filename has already been modified so iconv and friends won't work. Take a look at fix_filename and try manipulating the string there:
/**
* Cleanup filename
*
* @param string $str
* @param bool $transliteration
* @param bool $convert_spaces
* @param string $replace_with
* @param bool $is_folder
*
* @return string
*/
function fix_filename($str, $config, $is_folder = false)
{
if ($config['convert_spaces'])
{
$str = str_replace(' ', $config['replace_with'], $str);
}
if ($config['transliteration'])
{
if (!mb_detect_encoding($str, 'UTF-8', true))
{
$str = utf8_encode($str);
}
if (function_exists('transliterator_transliterate'))
{
$str = transliterator_transliterate('Any-Latin; Latin-ASCII', $str);
}
else
{
$str = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $str);
}
$str = preg_replace("/[^a-zA-Z0-9\.\[\]_| -]/", '', $str);
}
$str = str_replace(array( '"', "'", "/", "\\" ), "", $str);
$str = strip_tags($str);
// Empty or incorrectly transliterated filename.
// Here is a point: a good file UNKNOWN_LANGUAGE.jpg could become .jpg in previous code.
// So we add that default 'file' name to fix that issue.
if (strpos($str, '.') === 0 && $is_folder === false)
{
$str = 'file' . $str;
}
return trim($str);
}
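As one way of "manipulating the string there", the sketch below keeps letters and digits of any script instead of transliterating them to ASCII. It is not part of RFM: sanitize_unicode_filename is a hypothetical name, and it assumes the incoming name is UTF-8 and that the mbstring extension and PCRE Unicode support are available. Whether the name is then stored correctly on disk still depends on the platform and PHP version.

```php
<?php
// Hypothetical alternative to transliteration: keep Unicode letters and digits.
function sanitize_unicode_filename($name, $is_folder = false)
{
    // Invalid UTF-8 would make the /u pattern fail, so fall back to a fixed name.
    if (!mb_check_encoding($name, 'UTF-8')) {
        return 'file';
    }
    $name = strip_tags($name);
    // Remove everything except letters, combining marks, digits and a few safe symbols.
    $name = preg_replace('/[^\p{L}\p{M}\p{N}._ \-\[\]]/u', '', $name);
    $name = trim($name);
    // Same guard as fix_filename: a bare extension gets a default base name.
    if (strpos($name, '.') === 0 && $is_folder === false) {
        $name = 'file' . $name;
    }
    return $name;
}

var_dump(sanitize_unicode_filename('اختبار - Copy.jpg'));
```

Slashes, backslashes and quotes are dropped by the character class, so the result cannot contain a path separator.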

Related

PHP glob function with directories using special characters

I have a directory named "Łęć"
and I am using glob like this:
$dirs = glob( FILES . '/general/*' );
Gives me the result of:
...
(string) "../pliki/general/Logo"
(string) "../pliki/general/���"
(string) "../pliki/general/Maski"
...
Here ��� is the directory named Łęć.
I can't figure out how to make this work, so that I can have folders with special characters and have glob() handle them properly.
$dirs = glob( FILES . '/general/q/*' );
foreach($dirs as &$dir)
{
$dir = bin2hex($dir);
}
dd($dirs);
The code above globs the directory that contains the Łęć folder and applies bin2hex to each name. It returns 2e2e2f706c696b692f67656e6572616c2f712fa3eae6, and the folder name alone, without the path, is a3eae6.
a3eae6 is the hexadecimal representation of the string returned for "Łęć", in an encoding we do not know. The string returned by glob() can be written in PHP notation as "\xa3\xea\xe6". Converting this string from the unknown encoding to UTF-8 must then yield "Łęć".
Through trial and error, I found that the "ISO-8859-2" encoding satisfies this condition:
$strCode = "\xa3\xea\xe6";
$name = mb_convert_encoding($strCode,"UTF-8","ISO-8859-2");
var_dump($name === "Łęć"); //bool(true)
Every string that glob returns must then be converted with mb_convert_encoding:
$fullNameUTF8 = mb_convert_encoding($strFromGlob,"UTF-8","ISO-8859-2");
This approach is not reliable, because the encoding was only guessed. It is better to know the exact encoding used by the file system you are accessing.
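The conversion above can be wrapped around glob() as in the following sketch. The helper names fs_name_to_utf8 and glob_utf8 are hypothetical, and ISO-8859-2 is only the encoding guessed in this answer, so it is passed as a parameter rather than hard-coded. The result is keyed by the original path, because the unconverted bytes are still what the file system expects in later calls.

```php
<?php
// Hypothetical helper: convert one name from the file system encoding to UTF-8.
function fs_name_to_utf8($name, $fsEncoding = 'ISO-8859-2')
{
    return mb_convert_encoding($name, 'UTF-8', $fsEncoding);
}

// Hypothetical wrapper: returns array(original path => UTF-8 path for display).
function glob_utf8($pattern, $fsEncoding = 'ISO-8859-2')
{
    $paths = glob($pattern);
    if ($paths === false) {
        return array();
    }
    $result = array();
    foreach ($paths as $path) {
        $result[$path] = fs_name_to_utf8($path, $fsEncoding);
    }
    return $result;
}

var_dump(fs_name_to_utf8("\xa3\xea\xe6") === "Łęć");
```

Use the array keys for file operations such as opendir() or rename(), and the values only for output.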

How to change filename's encode type?

I want to change the encoding of my file names from utf-8 to big5, and this is what I have so far:
$path = "stu_resume/104206002_87";
$result =iconv("utf-8", "big5", $path);
echo $result;
echo mb_detect_encoding($result);
Inside the folder 104206002_87 there are 2 files: 104206002_87_履歷 and 104206002_87_自傳. After running the code above, nothing in the folder has changed. Does anyone know how to solve the problem? Thanks a lot.
iconv() doesn't modify files. It just converts a string. In this case, the string it's converting is "stu_resume/104206002_87" -- since this string only contains ASCII characters, nothing changes when it's converted from UTF-8 to Big5.
If you want to rename the files in the directory with that name, you will need to do so explicitly, e.g.
$iter = new DirectoryIterator("stu_resume/104206002_87");
foreach ($iter as $file) {
if (!$file->isDot()) {
$old_name = $file->getPathname();
$new_name = iconv("utf-8", "big5", $old_name);
rename($old_name, $new_name);
}
}

Character encoding issue when importing CSV from Excel?

I have a PHP script which exports a CSV file. My users then edit the file in Excel, save it, and re-upload it.
If they type a euro symbol into a field, when the file is uploaded, the euro symbol, and everything afterwards is missing. I'm using the str_getcsv function.
If I try to convert the encoding (say to UTF-8), the euro symbol disappears, and I get a missing character marker (usually represented by a blank square or a question mark in a diamond).
How do I convert the encoding to UTF-8, but also keep the euro symbol (and other non-standard characters)?
Edit:
Here is my code:
/**
* Decodes html entity encoded characters back to their original
*
* @access public
* @param String The element of the array to process
* @param Mixed The key of the current element of the array
* @return void
*/
public function decodeArray(&$indexValue, $key)
{
$indexValue = html_entity_decode($indexValue, ENT_NOQUOTES, 'Windows-1252');
}
/**
* Parses the contents of a CSV file into a two dimensional array
*
* @access public
* @param String The contents of the uploaded CSV file
* @return Array Two dimensional-array.
*/
public function parseCsv($contents)
{
$changes = array();
$lines = split("[\n|\r]", $contents);
foreach ($lines as $line) {
$line = utf8_encode($line);
$line = htmlentities($line, ENT_NOQUOTES);
$lineValues = str_getcsv($line);
array_walk($lineValues, 'decodeArray');
$changes[] = $lineValues;
}
return $changes;
}
I have also tried the following instead of the utf8_encode function:
iconv("Windows-1252", "UTF-8//TRANSLIT", $line);
And also just:
$line = htmlentities($line, ENT_NOQUOTES, 'Windows-1252');
With the utf8_encode function, the offending character is removed from the string. With any other method, the character and everything after the character is missing.
Example:
The field value : "Promo € Mobile"
is interpreted as : "Promo Mobile"
Add the UTF-8 byte order mark (BOM) to the beginning of your CSV file:
chr(239) . chr(187) . chr(191)
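A minimal sketch of that suggestion follows, assuming the export side builds the CSV in memory. The names UTF8_BOM, csv_with_bom and strip_bom are hypothetical. The first function writes the three BOM bytes before the rows; the second removes them again when the edited file is uploaded, so that they do not end up in the first field.

```php
<?php
// The three bytes from the answer: chr(239) . chr(187) . chr(191).
define('UTF8_BOM', "\xEF\xBB\xBF");

// Hypothetical export helper: build a CSV string that starts with the BOM.
function csv_with_bom(array $rows)
{
    $handle = fopen('php://temp', 'r+');
    fwrite($handle, UTF8_BOM);
    foreach ($rows as $row) {
        fputcsv($handle, $row, ',', '"', '\\');
    }
    rewind($handle);
    $csv = stream_get_contents($handle);
    fclose($handle);
    return $csv;
}

// Hypothetical import helper: drop a leading BOM before parsing.
function strip_bom($contents)
{
    return strncmp($contents, UTF8_BOM, 3) === 0 ? substr($contents, 3) : $contents;
}

$csv = csv_with_bom(array(array('Promo € Mobile', '10')));
$fields = str_getcsv(rtrim(strip_bom($csv), "\r\n"), ',', '"', '\\');
var_dump($fields[0] === 'Promo € Mobile');
```

This only helps if the exported data is already UTF-8; a file that Excel saved in another encoding still has to be converted on upload.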

how to set encoding after using move_uploaded_file?

I am saving a file using move_uploaded_file($file['tmp_name'], $save_path . $FileName); but when the file name I choose is in Arabic, the file is saved with strange characters, like this: ÒíäÈ.pdf.
So when I later try to open the uploaded file under its real name, it says the file is not found. What should I do?
You can use transliteration to convert any string to similar-looking Latin characters. My very custom transliteration code looks like this:
// convert a utf8-encoded string into latin representative
function transliterate_string($params=array())
{
// PARAMS: "string", "language"
// 0) fill-in "native" chars by language and their "latin" representative
$SP_trans = array(
"ae"=>array("native"=>"ة,بْ,...","latin"=>"a,b,..."),
// ...other langs if you want
);
// 1) break "native" & "latin" strings
$nc = explode(",",$SP_trans[ $params["language"] ]["native"]);
$lc = explode(",",$SP_trans[ $params["language"] ]["latin"]);
// 2) convert to lower first
$string = mb_strtolower($params["string"],"utf-8");
// 3) loop each character
mb_internal_encoding("UTF-8");
$out = array(); // collects the converted characters
for($x=0,$sz=mb_strlen($string);$x<$sz;$x++) {
$char = mb_substr($string,$x,1);
$index = array_search($char,$nc);
$out[$x] = ($index===FALSE ? $char : $lc[$index]);
}
return trim(implode("",$out));
}
The function just scans a string and converts each character of a specific character set into Latin in a custom way. Then you can safely save the file under a Latin name.
It would be better to rename the image using a timestamp:
$imgname = time().'.'.'jpg';
$imgtmpname=$_FILES['file']['tmp_name'];
$fullpath= $path.$imgname;
$filename = $imgname;
move_uploaded_file($imgtmpname,$fullpath);
Also store $imgname in the database so the image can be fetched by this name. This also avoids name conflicts between images, since the timestamp keeps changing.
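The timestamp idea can be sketched as follows. The function name make_storage_name is hypothetical, and the random suffix is an addition not present in the answer: it is there because two uploads within the same second would otherwise get the same name. random_bytes() requires PHP 7 or later.

```php
<?php
// Hypothetical helper: build an ASCII-only storage name from the original name.
function make_storage_name($originalName)
{
    // Keep only a lowercase ASCII extension taken from the uploaded name.
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    $ext = preg_replace('/[^a-z0-9]/', '', $ext);

    // Timestamp plus a short random suffix (assumption: added to avoid collisions).
    $base = time() . '_' . bin2hex(random_bytes(4));

    return $ext === '' ? $base : $base . '.' . $ext;
}

echo make_storage_name('اختبار.JPG'), "\n";
```

The original Arabic name can then be kept in the database next to the generated one and shown to users, while the file system only ever sees ASCII.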

php file_put_contents asian character filename encoding

I'm trying to get this to scrape images off of Wikipedia. What good is free licensed media if you can't get it? Original script is here.
If you put this
http://upload.wikimedia.org/wikipedia/commons/2/26/%E7%9A%84-bw.png
in firefox, it will immediately be transformed into
http://upload.wikimedia.org/wikipedia/commons/2/26/的-bw.png
so that when you save the image, it's saved as 的-bw.png
Simple enough eh? Now how to get php to do that? Just guessing, I tried utf8_decode($fileName) .. but getting the wrong Chinese characters.
$src= "http://upload.wikimedia.org/wikipedia/commons/2/26/%E7%9A%84-bw.png";
$pngData = file_get_contents($src);
$fileName = basename($src);
file_put_contents($fileName, $pngData);
Any help appreciated, as I really have no idea where to go from here.
Have you tried urldecode()?
<?php
$url = 'http://upload.wikimedia.org/wikipedia/commons/2/26/%E7%9A%84-bw.png';
$parts = explode('/', $url);
$title = $parts[count($parts)-1]; //get last section
$title = urldecode($title);
?>
Squirrelmail contains a nice function in the sources to convert unicode to entities:
<?php
function charset_decode_utf_8 ($string) {
/* Only do the slow convert if there are 8-bit characters */
/* avoid using 0xA0 (\240) in ereg ranges. RH73 does not like that */
if (! ereg("[\200-\237]", $string) and ! ereg("[\241-\377]", $string))
return $string;
// decode three byte unicode characters
$string = preg_replace("/([\340-\357])([\200-\277])([\200-\277])/e",
"'&#'.((ord('\\1')-224)*4096 + (ord('\\2')-128)*64 + (ord('\\3')-128)).';'",
$string);
// decode two byte unicode characters
$string = preg_replace("/([\300-\337])([\200-\277])/e",
"'&#'.((ord('\\1')-192)*64+(ord('\\2')-128)).';'",
$string);
return $string;
}
?>
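Note that ereg() and the /e modifier of preg_replace() were removed in PHP 7, so the function above no longer runs on current PHP versions. The following is a sketch of the same two- and three-byte conversion using preg_match() and preg_replace_callback(). The name utf8_to_entities is hypothetical and this is not Squirrelmail code; like the original, it does not handle four-byte sequences.

```php
<?php
// Hypothetical rewrite of the function above for PHP 7+.
function utf8_to_entities($string)
{
    // Only do the slow conversion if there are 8-bit characters.
    if (!preg_match('/[\x80-\xFF]/', $string)) {
        return $string;
    }
    // Three-byte sequences: 1110xxxx 10xxxxxx 10xxxxxx.
    $string = preg_replace_callback(
        '/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/',
        function ($m) {
            $code = (ord($m[1]) - 224) * 4096 + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128);
            return '&#' . $code . ';';
        },
        $string
    );
    // Two-byte sequences: 110xxxxx 10xxxxxx.
    $string = preg_replace_callback(
        '/([\xC0-\xDF])([\x80-\xBF])/',
        function ($m) {
            $code = (ord($m[1]) - 192) * 64 + (ord($m[2]) - 128);
            return '&#' . $code . ';';
        },
        $string
    );
    return $string;
}

echo utf8_to_entities(urldecode('%E7%9A%84-bw.png')), "\n";
```

For saving the file under its real name, urldecode() alone is enough; the entity form is only useful when the name has to be embedded in ASCII-only output.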
