Uploaded file question mark icons then turn into chinese characters - php

So I'm trying to upload an exported Outlook email into my WordPress website through a plugin. Here's the start of the file in question:
http://prntscr.com/jxil42
When I echo the file this is the result I get.
http://prntscr.com/jximu4
Firstly I don't get where the "?" icons come from, then, when I move the uploaded file and open it, this is the result:
http://prntscr.com/jxinh9
I've no idea what's causing this, any help..?

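The line below may help you upload a file with a Unicode name. It converts the file name from UTF-8 to CP936, the Simplified Chinese Windows code page: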
$filename = iconv("utf-8", "cp936", $filename);

I think it may be caused by the encoding, for example a mismatch in the encoding type or version.

Related

How should I set the download name of a pdf with fpdf?

I am trying to set a name for a pdf file I generated with FPDF. However for some reason the browser changes some characters.
I am sending this:
$pdfTitle = 'Overview: 2017/2018';
$pdf->Output( 'D', $pdfTitle, true );
Yet when I save my pdf, some characters are changed and the download name becomes: 'Overview_ 2017_2018'.
I am using UTF-8 encoding on my php file.
FPDF-documentation: http://fpdf.org/en/doc/output.htm
I have two questions:
How can I make sure the download name is the same as the one I set in my php file?
What is the underlying issue that changes the name?
PS: In the real project the string will come from a database, so I can only access the string programmatically and not make direct changes to it.
Your filename contains the special characters : and /. Because of this, the output filename is being filtered.
For example, in
Overview: 2017/2018
the characters : and / are not supported in filenames on Windows and some other operating systems.
Tip:
You can add .pdf to the name if the file is not being saved as a PDF file.
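If you would rather control the substitution yourself than leave it to whatever does the filtering, one option is to replace the unsupported characters before calling Output(). This is only a minimal sketch: the helper name safe_download_name and the "-" replacement are made up for illustration and are not part of FPDF.

```php
<?php
// Hypothetical helper (not part of FPDF): replaces characters that are
// not allowed in Windows file names with a chosen substitute.
function safe_download_name(string $title, string $replacement = '-'): string
{
    // Characters reserved on Windows: \ / : * ? " < > |
    $clean = preg_replace('#[\\\\/:*?"<>|]#', $replacement, $title);
    // Make sure the name ends in .pdf so it is saved as a PDF.
    if (!preg_match('/\.pdf$/i', $clean)) {
        $clean .= '.pdf';
    }
    return $clean;
}

echo safe_download_name('Overview: 2017/2018'), "\n";
```

With a helper like this, the call from the question would become $pdf->Output('D', safe_download_name($pdfTitle), true); so the string from the database is cleaned in code rather than edited by hand.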

Strange non English characters in String, error on server

First of all, my code is working...but the resultant file is causing problems on my server. Only files with strange characters are causing errors on the server, such as file does not exist or error connecting to file when trying to open the file through FTP. All files without strange characters are working fine on the server, and can be opened and edited.
Here's my workflow:
Get text from a TextView on user's screen, run it through this code to remove unwanted characters:
replaceAll("[^a-z ,()A-Z0-9]+", "-");
Create a text file using this text as the file name;
Upload this text file to server with this PHP script:
<?php
$file_path = "uploads/";
$file_path = $file_path . basename( $_FILES['uploaded_file']['name']);
if(move_uploaded_file($_FILES['uploaded_file']['tmp_name'], $file_path)) {
echo "success";
} else{
echo "fail";
}
?>
The filenames contain these strange characters, I assume because of non-English characters on the user's screen.
I need to be careful because the upload path on my server is based on this file name, and I don't know how to test it with non-English characters. Any help is much appreciated. I need to remove or replace any non-English characters without messing up the file path.
Technically you can solve this by converting the string to UTF-8 on the server using mb_convert_encoding, but your code is really unsafe: you are using a user-supplied value as a file path, and attackers can send /../../../ and so forth.
The solution I use for both problems is to convert the passed file name to a hex string on the server, using bin2hex. That gives you a very safe file name with no encoding issues.
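A minimal sketch of the bin2hex idea, assuming the original name only needs to be recovered later for display. The helper names hex_file_name and original_file_name are hypothetical.

```php
<?php
// Hypothetical helpers illustrating the bin2hex approach.

// Turn any client-supplied name into a name made only of 0-9 and a-f.
// The result cannot contain "/", "\" or "..", whatever the client sent.
function hex_file_name(string $clientName): string
{
    return bin2hex($clientName);
}

// Recover the original name, e.g. for display.
function original_file_name(string $hexName): string
{
    $name = hex2bin($hexName);
    if ($name === false) {
        throw new InvalidArgumentException('Not a valid hex file name');
    }
    return $name;
}

// "é" is written as UTF-8 bytes so the example does not depend on the
// encoding this file is saved in.
$clientName = "../../\xC3\xA9.txt";
$stored = hex_file_name($clientName);

echo $stored, "\n";
var_dump(original_file_name($stored) === $clientName);
```

One thing to keep in mind with this design: the hex name is twice as long as the original in bytes, so very long names may exceed the file system's name length limit.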
Use these lines; they may help you.
String styledText = "Your File Name"; // replace with your file name
textView.setText(Html.fromHtml(styledText));

Special characters encoding in image filenames after server migration

I've migrated a WordPress website from a Hostgator shared host to a Ubuntu Digital Ocean LAMP stack.
The trouble started when I exported the image files which had special characters, for example the file
operários_tarsila-1024x640.jpg.
When WordPress tries to reach the file, it displays an error. I've found the cause:
I can see via Inspect Element that Wordpress tries to call: http://mywebsite.com/wp-content/uploads/2013/02/oper%C3%A1rios_tarsila-1024x640.jpg and the server returns a 404 error.
However if I type this URL in the browser: http://mywebsite.com/wp-content/uploads/2013/02/opera%CC%81rios_tarsila-1024x640.jpg it works and the image is displayed.
So it seems that this difference in how á is encoded, %C3%A1 (the single á character) versus a+%CC%81 (a followed by a combining acute accent), is what prevents WordPress from displaying my images.
So now my server holds thousands of accented image filenames with the structure character + combining accent, while WordPress requests the image filenames with the structure accented character.
Is there a way to rename all of them in bash with a comparison table? Or a way to make Apache aware of those differences and point to the right file when this kind of confusion happens?
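The two forms can be reproduced in a few lines of PHP. This is only an illustration of the difference described above; the variable names are made up, and the last step assumes the intl extension (the Normalizer class) is installed.

```php
<?php
// The same visible letter "á" in two Unicode forms, written as raw UTF-8 bytes.
$precomposed = "\xC3\xA1";   // U+00E1, a single code point (NFC)
$decomposed  = "a\xCC\x81";  // "a" + U+0301 combining acute accent (NFD)

echo rawurlencode($precomposed), "\n"; // %C3%A1
echo rawurlencode($decomposed), "\n";  // a%CC%81

// The byte strings differ, so a file system that compares names byte by
// byte treats them as two different files.
var_dump($precomposed === $decomposed); // bool(false)

// With the intl extension available, Normalizer converts between the forms.
if (class_exists('Normalizer')) {
    var_dump(Normalizer::normalize($decomposed, Normalizer::FORM_C) === $precomposed);
}
```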
Apparently the problem is how the backup is decompressed on the new server.
There are 2 ways to fix this:
Rename the files manually to names without accents, then change the file names in the database to match (this is crazy and can be dangerous, so it would be best to back up the database first).
Upload files using Filezilla, but setting it to force the charset encoding in UTF-8.
File> Site Manager> {YOUR SITE}> Tab Charset> Force UTF-8
We have the same problem: Mac + FileZilla + special characters in the SK language.
We fixed it by using another FTP client (Cyberduck in our case).
It seems to be a problem with how FileZilla encodes filenames. Forcing UTF-8 encoding (in the FileZilla host settings) doesn't help.
So, just to touch upon this issue and a solution that worked for me... I also migrated a Wordpress site and found that all images with special characters in their filename produced a 404 after migration.
I ended up having to do the manual file renaming and edits to the database via phpMyAdmin. It was arduous and I definitely recommend backing up your database first.
In my case, I had a ton of media attachments that used the special character © in their filename.
First, I locally renamed the files by removing the character. I used 1-4a rename. Just found the filename and replaced it with nothing (not even a space). Then, I removed all the old files from the /wp-content/uploads/ folder and replaced them with the new files.
Next, I went into my database to update the table values. Media attachments have info stored in both the wp_posts and wp_postmeta tables. Below is the SQL I ran to update both -
update wp_posts set guid = replace(guid,'©','');
UPDATE wp_postmeta SET meta_value = REPLACE(meta_value, '©', '')
WHERE LOWER(RIGHT(meta_value, 5)) = '.jpeg' OR
LOWER(RIGHT(meta_value, 4)) IN ('.jpg', '.gif', '.png')
Again, this replaces the character with nothing, not even a space.
I had to use the WP plugin Regenerate Thumbnails to update all of the thumbnails and the various attachment sizes, but that did the trick.
I really appreciate everyone's efforts on this post and this post to help me figure it out! Hope this helps someone!
Have you tried setting the same encoding in the PHP script, MySQL and HTML?
PHP : http://php.net/manual/en/function.mb-internal-encoding.php
Mysql : http://php.net/manual/en/function.mysql-set-charset.php
HTML : <meta http-equiv="content-type" content="text/html; charset=utf-8" />
This looks like a charset mismatch between these layers.
If that does not work, you will have to use a small script to rename all your pictures, using a function like:
function wd_remove_accents($str, $charset='utf-8')
{
// Turn accented characters into HTML entities, e.g. "é" becomes "&eacute;"
$str = htmlentities($str, ENT_NOQUOTES, $charset);
// Keep only the base letter of each accent entity
$str = preg_replace('#&([A-Za-z])(?:acute|cedil|caron|circ|grave|orn|ring|slash|th|tilde|uml);#', '\1', $str);
$str = preg_replace('#&([A-Za-z]{2})(?:lig);#', '\1', $str); // for ligatures, e.g. 'œ'
$str = preg_replace('#&[^;]+;#', '', $str); // remove the other characters
return $str;
}
Source : http://www.weirdog.com/blog/php/supprimer-les-accents-des-caracteres-accentues.html
We have just had a similar problem with French characters in our WordPress deployment, and our solution was to upload the files with FileZilla from a PC instead of FileZilla from a Mac.
When I uploaded from Mac OS X to the CentOS server, the files would only show if requested in the a+%CC%81 format.
When I uploaded the files from the PC, Apache found the files in the %C3%A1 format, which was how WordPress had them encoded.
If you have WP-CLI, run this Bash script. You must change the wp_ table prefix to match yours.
It only modifies the file names that are NOT in FORM_D (NFD) form.
Back up your DB in case something goes wrong.
#!/bin/bash
normalizeWP_PHP_Script=$'
global $wpdb;
$rows = $wpdb->get_results( "SELECT * FROM wp_postmeta where meta_key='"'"'_wp_attached_file'"'"'");
foreach ( $rows as $row )
{
$postId = $row->{'"'"'post_id'"'"'};
$filePath = $row->{'"'"'meta_value'"'"'};
if( ! normalizer_is_normalized($filePath, Normalizer::FORM_D) ){
$filename_nfd = Normalizer::normalize($filePath, Normalizer::FORM_D);
echo $filename_nfd." | ";
$wpdb->query($wpdb->prepare("UPDATE wp_postmeta SET meta_value=%s WHERE post_id=%d AND meta_key='"'"'_wp_attached_file'"'"'", $filename_nfd, $postId));
}
}';
wp eval "$normalizeWP_PHP_Script"
echo " - Uploads-url normalized --nfd"
There's a plugin for this situation.
You can check out Media File Renamer.

PHP doesn't recognize filename with apostrophe in it

I am trying to check with PHP whether a file exists. The file I am checking has an apostrophe (the accent on the é) in its name: 13067-AP-03 A - Situation projetée.pdf.
The code I use to check if the file exist is:
$filename = 'C:/13067-AP-03 A - Situation projetée.pdf';
if (file_exists($filename))
{
echo "The file exists";
} else
{
echo "The file does not exist";
}
The problem is that whenever I check whether the file exists, I get the message that it doesn't. If I remove the é from the name, I get the message that the file does exist.
It looks like PHP somehow doesn't recognize the file if it has an apostrophe (accent) in its name. I tried the following:
urlencode($filename);
addslashes($filename);
utf8_encode($filename);
None of which worked. I also tried:
setlocale(LC_ALL, "en_US.utf8");
Maybe worth noticing is that when I get the filename straight from PHP I get the following:
13067-AP-03 A - Situation projet�e.pdf
I have to do the following to have the filename displayed correctly:
$filename = iconv( "CP437", 'UTF-8', $filename);
I was wondering if someone had the same problem before and could help me out with this one. All help is greatly appreciated.
For those who are interested, the script runs on a windows machine.
Strangely this worked: I copied all the source code from Sublime Text 3 to notepad. I proceeded to save the source code in notepad by overwriting the PHP file.
Now when I check to see if the file exists it shows the following filename that exists:
13067-AP-03 A - Situation projet�e.pdf
The only problem I face now is that I want to download the file using file_get_contents, but file_get_contents doesn't interpret the � as an apostrophe.
I think it's a problem with PHP on Windows. I downloaded a Windows binary of PHP onto my Japanese-language Windows machine and successfully reproduced your problem.
According to https://bugs.php.net/bug.php?id=47096
So, if you have a generic name of a file (along with its path) as a Unicode string $u (for example UTF-8 encoded) and you want to try to save it with that name under Windows, you must first check the current locale calling setlocale(LC_CTYPE, 0) to retrieve the current code page, then you must convert $u to an array of bytes according to the code page; if one or more code points have no counterpart in the current code page, the file cannot be saved with that name from PHP. Dot.
My code page is CP932; you can see yours by running chcp in cmd.
So the code is expected to be:
$filename='C:\Users\Frederick\Desktop\13067-AP-03 A - Situation projetée.pdf';
$filename=mb_convert_encoding($filename, 'CP932', 'UTF-8');
var_dump($filename);
var_dump(file_exists($filename));
But this won't work! Why? Because CP932 doesn't contain the character of é!
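One way to see this without touching the file system is to check whether the name survives a round trip through the code page. This is a rough sketch that assumes the mbstring extension is loaded; fits_code_page is a made-up helper name. Under mbstring's default settings an unmappable character is replaced by a substitute, so the round trip should no longer match for CP932, while Windows-1252 (which does contain é) should round-trip cleanly.

```php
<?php
// Hypothetical helper: reports whether a UTF-8 string survives a round trip
// through a legacy code page. Assumes the mbstring extension is loaded.
function fits_code_page(string $utf8, string $codePage): bool
{
    $legacy = mb_convert_encoding($utf8, $codePage, 'UTF-8');
    return mb_convert_encoding($legacy, 'UTF-8', $codePage) === $utf8;
}

$name = "Situation projet\xC3\xA9e.pdf"; // "é" written as UTF-8 bytes

var_dump(fits_code_page($name, 'CP932'));
var_dump(fits_code_page($name, 'Windows-1252'));
```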
According to https://msdn.microsoft.com/en-us/library/windows/desktop/dd317748%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
NTFS stores file names in Unicode. In contrast, the older FAT12, FAT16, and FAT32 file systems use the OEM character set.
Windows itself uses UTF-16LE, which Microsoft calls Unicode, to store its file names. But PHP doesn't support UTF-16LE encoded file names.
In conclusion, if you work on Windows, I unfortunately cannot find any way to solve the problem other than escaping all those characters when naming the files. I also do not think the PHP team will solve the problem in the future.
Make sure that your text editor is saving the file as "UTF-8 without BOM".
The BOM is the byte order mark, a short byte sequence placed at the start of a file (EF BB BF in UTF-8) that lets software reading the file detect its encoding and, for UTF-16 and UTF-32, whether it is little-endian or big-endian. The PHP interpreter does not skip these bytes: they sit before the opening <?php tag and are sent as output, so you should save the file without the byte order mark.
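To check whether a file starts with a UTF-8 BOM (the three bytes EF BB BF), something like the following sketch could be used. The helper names has_utf8_bom and strip_utf8_bom are hypothetical.

```php
<?php
// UTF-8 byte order mark: the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true when the given bytes start with a UTF-8 BOM.
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: returns the bytes without a leading UTF-8 BOM.
function strip_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}

var_dump(has_utf8_bom(UTF8_BOM . "abc"));
var_dump(strip_utf8_bom(UTF8_BOM . "abc"));
```

In practice the argument would be the contents of the script, for example has_utf8_bom(file_get_contents($path)), where $path is whichever PHP file you suspect.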
Try this at the start of your PHP file:
<?php
header('Content-Type: text/html; charset=utf-8');
?>

PHP - Windows - filename incorrect after upload (ü saved as Ã¼ etc.) [duplicate]

I have this home made app that allows multiple file uploads, I pass the files to php with AJAX, create new dir with php, move there uploaded files and save the dir location to database. Then to see the files I run listing of the directory location saved in the db.
The problem is that files come from all around the world, so very often their names contain non-Latin characters such as ü. When I echo the filename in PHP, names appear correctly even when they are written in Arabic, yet they are saved on the server with mis-encoded names, for example Ã¼ in place of ü. When I list the files in the directory I see the name Ã¼.txt instead of ü.txt, but when I click on it the server returns an "object not found" error (since on the server it is saved as Ã¼.txt and it reads the link as ü.txt).
I tried some of the suggested solutions as for example using iconv, but the filenames are still being saved the same way.
I could swear the problem wasn't present when the web app was hosted on Linux, but at the moment I am not so sure about it anymore. Right now I temporarily run it on XAMPP (on Windows) and it seems that filenames are saved using windows-1252 encoding (the default Windows encoding on the server). Is the problem related to the default Windows encoding?
To be honest I do not know how to approach this problem and I would appreciate any help. Should I keep trying to save the files in a different character encoding, or would it be better to approach it a different way and change how the already saved and encoded files are listed?
EDIT. According to the (finally) closed bug report, it was fixed in PHP 7.1.
In the end I solved it with the following approach:
When uploading the files I URL-encode the names with rawurlencode().
When fetching the files from the server the names are URL encoded, so I use urldecode($filename) to print the correct names.
Links in an href are decoded automatically, so for example "%20" becomes " " and the URL ends up pointing to the wrong filename. I decided to encode the names once more when printing them, ending up with something like this: print $dirReceived.rawurlencode($file); ($dirReceived is the directory where received files are stored, defined earlier in the code).
I also added a download attribute with urldecode($filename) to save the file with its UTF-8 name when needed.
Thanks to this, files are saved on the server with URL-encoded names. I can open them in the browser (very important, as most of them are *.pdf) and download them with the correct name, which lets me upload and download even files with names written in Arabic, Cyrillic, etc.
So far I have tested it and it looks good. I am thinking of implementing it in production code. Any concerns or thoughts on it?
EDIT.
Since there are no objections I select my answer as the one that solved my problem. After doing some testing everything looks good on client and server side. When saving the files on server they are URL encoded, when downloading them they are decoded and saved with correct names.
At the beginning I was using the code:
for($i=0;$i<count($_FILES['file']['name']);$i++)
{
move_uploaded_file($_FILES['file']['tmp_name'][$i],
"../filepath/" . $_FILES['file']['name'][$i]);
}
This method caused the problem when saving the file: every UTF-8 special character was replaced with its cp1252 interpretation (ü saved as Ã¼ etc.), so I added one line and replaced that code with the following:
for($i=0;$i<count($_FILES['file']['name']);$i++)
{
$fname= rawurlencode($_FILES['file']['name'][$i]);
move_uploaded_file($_FILES['file']['tmp_name'][$i],
"../filepath/" . $fname);
}
This allows me to save any filename on server using URL encoding (% and two hexadecimals) which is compatible with both cp1252 and UTF-8.
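A small illustration of why this works, using ü written as raw UTF-8 bytes. The variable names are made up for the example.

```php
<?php
$name = "\xC3\xBC.txt"; // "ü.txt" written as UTF-8 bytes

$stored = rawurlencode($name);
echo $stored, "\n"; // %C3%BC.txt

// The stored name is plain ASCII, so it reads the same under cp1252 and UTF-8.
var_dump(preg_match('/^[\x20-\x7E]+$/', $stored) === 1); // bool(true)

// Decoding restores the original bytes for display or the download attribute.
var_dump(rawurldecode($stored) === $name); // bool(true)

// When the stored name goes into a URL it must be encoded once more,
// because the browser decodes one level of percent-encoding.
echo rawurlencode($stored), "\n"; // %25C3%25BC.txt
```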
To list the saved files I use filepaths I have saved in DB and list them for files. I was using the following code:
if (is_dir($dir)){
if ($dh = opendir($dir)){
while (($file = readdir($dh)) !== false){
if(is_file($dir . $file)){
echo "<li><a href='".$dir.$file."' download='".$file ."'>".$file."</a></li><br />";
}
}
closedir($dh);
}
}
Since URL encoded filenames were decoded automatically I changed it to:
if (is_dir($dir)){
if ($dh = opendir($dir)){
while (($file = readdir($dh)) !== false){
if(is_file($dir . $file)){
echo "<li><a href='";
print $dir.rawurlencode($file);
echo "' download='" . urldecode($file) ."'>".urldecode($file)."</a></li><br />";
}
}
closedir($dh);
}
}
I don't know if this is the best way to solve it, but it works perfectly. I am also aware that it is generally good practice not to use PHP to generate HTML tags, but at the moment I have some critical bugs that need addressing, so those come first and then I'll work on the appearance of the code itself.
EDIT2
Also the great thing is I do not have to change names of the already uploaded files which in my case is a big advantage.
Are you using $_FILES['upfile']['name'] to name the file? That could be causing your problem.
How about using GNU Recode?
$fileName = recode_string('latin1',$_FILES['upfile']['name']);
Syntax:
recode_string(string $request, string $string)
Valid character sets: http://www.faqs.org/rfcs/rfc1345.html
Either way, you must validate the characters in the uploaded file name.
You could also try sprintf. The characters in the formatted string can be unpredictable, but it will probably work.
$fileName = pathinfo($_FILES['upfile']['name'], PATHINFO_FILENAME);
$fileName = sprintf('./uploads/%s',$fileName);
When you save the file name to the database, escape it first ($link is assumed to be your mysqli connection):
$fileName = mysqli_real_escape_string($link, $fileName);
