Special characters encoding in image filenames after server migration - php

I've migrated a WordPress website from a Hostgator shared host to a Ubuntu Digital Ocean LAMP stack.
The trouble started when I exported the image files which had special characters, for example the file
operários_tarsila-1024x640.jpg.
When WordPress tries to reach the file, it displays an error. I've found the cause:
I can see via Inspect Element that Wordpress tries to call: http://mywebsite.com/wp-content/uploads/2013/02/oper%C3%A1rios_tarsila-1024x640.jpg and the server returns a 404 error.
However if I type this URL in the browser: http://mywebsite.com/wp-content/uploads/2013/02/opera%CC%81rios_tarsila-1024x640.jpg it works and the image is displayed.
So, it seems like this difference between the á encoding from %C3%A1 (á character) to a+%CC%81 (combining accute accent) is what is causing WordPress to not display my images.
So now I have in my server thousands of accented image filenames with the structure character+ combining accent and WordPress calling the image filenames with the structure accented character.
Is there a way bash rename all of them with a comparisson table? Or a way to make Apache aware of those differences and point to the right file when this kind of confusion happen?

Apparently the problem is how the backup is decompressed on the new server.
There are 2 ways to fix this:
Rename the files manually by names without accents and then modify the database and change the file names in the database (This maluco and can be dangerous, it would be best to back up the database).
Upload files using Filezilla, but setting it to force the charset encoding in UTF-8.
File> Site Manager> {YOUR SITE}> Tab Charset> Force UTF-8

We have same problem - Mac + FileZilla + special characters in SK language.
Problem fixed using another FTP client (Cyberduck in our case ).
It seems to be a problem with FileZilla filenames encofing. Force utf8 encoding (FileZilla host settings) doesn't help.

So, just to touch upon this issue and a solution that worked for me... I also migrated a Wordpress site and found that all images with special characters in their filename produced a 404 after migration.
I ended up having to do the manual file renaming and edits to the database via phpMyAdmin. It was arduous and I definitely recommend backing up your database first.
In my case, I had a ton of media attachments that used the special character © in their filename.
First, I locally renamed the files by removing the character. I used 1-4a rename. Just found the filename and replaced it with nothing (not even a space). Then, I removed all the old files from the /wp-content/uploads/ folder and replaced them with the new files.
Next, I went into my database to update the table values. Media attachments have info stored in both the wp_posts and wp_postmeta tables. Below is the SQL I ran to update both -
update wp_posts set guid = replace(guid,'©','');
UPDATE wp_postmeta SET meta_value = REPLACE(meta_value, '©', '')
WHERE LOWER(RIGHT(meta_value, 5)) = '.jpeg' OR
LOWER(RIGHT(meta_value, 4)) IN ('.jpg', '.gif', '.png')
Which, again, we are replacing the character with nothing, not even a space.
I had to use the WP plugin Regenerate Thumbnails in order to have all of thumbnails + various attachment sizes update, but that did the trick.
I really appreciate everyone's efforts on this post and this post to help me figure it out! Hope this helps someone!

Have you tried setting the same encoding in PHP script, Mysql and HTML ?
PHP : http://php.net/manual/en/function.mb-internal-encoding.php
Mysql : http://php.net/manual/en/function.mysql-set-charset.php
HTML : <meta http-equiv="content-type" content="text/html; charset=utf-8" />
This problem is looking like a charset accordance problem between all these languages.
If this is not working, you will have to use a small script to rename all your pictures, using a function like :
function wd_remove_accents($str, $charset='utf-8')
{
$str = htmlentities($str, ENT_NOQUOTES, $charset);
$str = preg_replace('#&([A-za-z])(?:acute|cedil|caron|circ|grave|orn|ring|slash|th|tilde|uml);#', '\1', $str);
$str = preg_replace('#&([A-za-z]{2})(?:lig);#', '\1', $str); // pour les ligatures e.g. 'œ'
$str = preg_replace('#&[^;]+;#', '', $str); // supprime les autres caractères
return $str;
}
Source : http://www.weirdog.com/blog/php/supprimer-les-accents-des-caracteres-accentues.html

We have just had a similar problem with french caracters in our wordpress deployment, and our solution was to upload the files with FileZilla from a PC instead of FileZilla from a Mac.
When I would upload from mac OSX to the CentOS server, the files will only show if called in the a+%CC%81 format.
When I uploaded the files from the PC, apache found the files in the %C3%A1 format, which was how wordpress had them encoded.

If you have WP_CLI run this BashScript. You must change the wp_ table prefix.
It only modifies the file-names that are NOT on FORM_D format.
Backup your DB just in case something goes wrong.
#!/bin/bash
normalizeWP_PHP_Script=$'
global $wpdb;
$rows = $wpdb->get_results( "SELECT * FROM wp_postmeta where meta_key='"'"'_wp_attached_file'"'"'");
foreach ( $rows as $row )
{
$postId = $row->{'"'"'post_id'"'"'};
$filePath = $row->{'"'"'meta_value'"'"'};
if( ! normalizer_is_normalized($filePath, Normalizer::FORM_D) ){
$filename_nfd = Normalizer::normalize($filePath, Normalizer::FORM_D);
echo $filename_nfd." | ";
$wpdb->query($wpdb->prepare("UPDATE wp_postmeta SET meta_value='"'"'$filename_nfd'"'"' WHERE post_id=$postId"));
}
}';
wp eval "$normalizeWP_PHP_Script"
echo " - Uploads-url nomalized --nfd"

There's a plugin for this situation.
You can check on Media File Renamer

Related

Problem with accented characters (ADODB + MySQL + PHP)

I'm experiencing a problem developing a project where I can't put accented characters.
I'm using to make the connection to the MySQL database, Adodb in version 5.20.14.
I have a main configuration file where the connection is set as follows:
$s_driver = "mysqli";
$o_db = adoNewConnection($s_driver);
$o_db->connect($s_dbhost,$s_dbuser,$s_dbpasswd,$s_dbname);
$o_db->SetFetchMode(ADODB_FETCH_ASSOC);
$o_db->setCharset("utf8");
All files developed are UTF-8 encoded, as shown in the image below:
Image 02
NOTE: I use VsCode.
All PHP files that are pages, have the charset meta set to utf-8, as shown in the image below:
<!DOCTYPE html>
<html lang="pt-br">
<head>
<meta charset="utf-8">
My database is configured for utf8, as shown in the image below:
Image 04
The tables are configured with the utf8 charset, as shown in the image below:
Image 5
The columns are configured with the utf8 charset, as shown in the image below:
Image 6
NOTE: I added the "NOT NULL" rule to ignore tables that do not have a charset configuration.
When executing the query to include the information in the database, I run as follows:
$s_query_incluir = "INSERT INTO agtb_ordensdeservicos(id_agenda,
id_empresa,
hora_ini,
hora_fim,
observacoes,
tipo,
csa)
VALUES('".$a_post['add_id_os_dt_agenda']."',
'".ID_EMP_ATUAL."',
'".$a_post["add_os_hora_ini"]."',
'".$a_post['add_os_hora_fim']."',
'".$a_post['add_observacao']."',
'".$a_post['add_os_tipo']."',
'".$a_post['add_os_csa']."');";
$o_db->execute($s_query_incluir);
NOTE: at the top of the file I include the configuration file that contains the information shown in the first image.
After performing this operation, the database shows the information as follows:
Image 8
When viewing the information on the website, it appears as follows:
Image 9
The original text being this:
Image 10
I managed to make it work by adding the "setCharset" before giving the "execute" in the query, as shown in the image below:
$s_query_incluir = "INSERT INTO agtb_ordensdeservicos(id_agenda,
id_empresa,
hora_ini,
hora_fim,
observacoes,
tipo,
csa)
VALUES('".$a_post['add_id_os_dt_agenda']."',
'".ID_EMP_ATUAL."',
'".$a_post["add_os_hora_ini"]."',
'".$a_post['add_os_hora_fim']."',
'".$a_post['add_observacao']."',
'".$a_post['add_os_tipo']."',
'".$a_post['add_os_csa']."');";
$o_db->setCharset("utf8");
$o_db->execute($s_query_incluir);
Achieving the following result in MySQL:
Image 12
I would like to understand where I am going wrong. I'm trying to make utf8 "automatic" without having to call "setCharset" before any kind of query execution.
I appreciate any kind of help. :)
NOTE: if you need more information about the process to better understand the problem, just let me know.
I have identified the reason for the problem.
Even configuring the database, tables and columns for UTF-8 / utf8_general_ci and the PHP file in UTF-8, the problem of strange characters remained.
The problem was in the standard PHP functions that format text, including:
strtolower()
strtoupper()
ucfirst()
ucwords()
When executing these functions in accented texts (for example, "acentuação"), their return was lost.
After removing these functions from my entire project, the strange character problem in the database stopped occurring (including and querying through PHP and through MySQL Workbench).
I hope this can help other people who end up experiencing the same problem as me. :)
NOTE: it took me a long time to answer because I was only able to validate this information after placing these changes in a production environment.

Function with special characters

I am creating a site where the authenticated user can write messages for the index site.
On the message create site I have a textbox where the user can give the title of the message, and a textbox where he can write the message.
The message will be exported to a .txt file and from the title I'm creating the title of the .txt file and like this:
Title: This is a message (The filename will be: thisisamessage.txt)
The original given text as filename will be stored in a database rekord among with the .txt filename as path.
For converting the title text I am using a function that looks like this:
function filenameconverter($title){
$filename=str_replace(" ","",$title);
$filename=str_replace("ű","u",$filename);
$filename=str_replace("á","a",$filename);
$filename=str_replace("ú","u",$filename);
$filename=str_replace("ö","o",$filename);
$filename=str_replace("ő","o",$filename);
$filename=str_replace("ó","o",$filename);
$filename=str_replace("é","e",$filename);
$filename=str_replace("ü","u",$filename);
$filename=str_replace("í","i",$filename);
$filename=str_replace("Ű","U",$filename);
$filename=str_replace("Á","A",$filename);
$filename=str_replace("Ú","U",$filename);
$filename=str_replace("Ö","O",$filename);
$filename=str_replace("Ő","O",$filename);
$filename=str_replace("Ó","O",$filename);
$filename=str_replace("É","E",$filename);
$filename=str_replace("Ü","U",$filename);
$filename=str_replace("Í","I",$filename);
return $filename;
}
However it works fine at the most of the time, but sometimes it is not doing its work.
For example: "Pamutkéztörlő adagoló és higiéniai kéztörlő adagoló".
It should stand as a .txt as:
pamutkeztorloadagoloeshigieniaikeztorloadagolo.txt, and most of the times it is.
But sometimes when im giving this it will be:
pamutkă©ztă¶rlĺ‘adagolăłă©shigiă©niaikă©ztă¶rlĺ‘adagolăł.txt
I'm hungarian so the title text will be also hungarian, thats why i have to change the characters.
I'm using XAMPP with apache and phpmyadmin.
I would rather use a generated unique ID for each file as its filename and save the real name in a separate column.
This way you can avoid that someone overwrites files by simply uploading them several times. But if that is what you want you will find several approaches on cleaning filenames here on SO and one very good that I used is http://cubiq.org/the-perfect-php-clean-url-generator
intl
I don't think it is advisable to use str_replace manually for this purpose. You can use the bundled intl extension available as of PHP 5.3.0. Make sure the extension is turned on in your XAMPP settings.
Then, use the transliterator_transliterate() function to transform the string. You can also convert them to lowercase along. Credit goes to simonsimcity.
<?php
$input = 'Pamutkéztörlő adagoló és higiéniai kéztörlő adagoló';
$output = transliterator_transliterate('Any-Latin; Latin-ASCII; lower()', $input);
print(str_replace(' ', '', $output)); //pamutkeztorloadagoloeshigieniaikeztorloadagolo
?>
P.S. Unfortunately, the php manual on this function doesn't elaborate the available transliterator strings, but you can take a look at Artefacto's answer here.
iconv
Using iconv still returns some of the diacritics that are probably not expected.
print(iconv("UTF-8","ASCII//TRANSLIT",$input)); //Pamutk'ezt"orl"o adagol'o 'es higi'eniai k'ezt"orl"o adagol'o
mb_convert_encoding
While, using encoding conversion from Hungarian ISO to ASCII or UTF-8 also gives similar problems you have mentioned.
print(mb_convert_encoding($input, "ASCII", "ISO-8859-16")); //Pamutk??zt??rl?? adagol?? ??s higi??niai k??zt??rl?? adagol??
print(mb_convert_encoding($input, "UTF-8", "ISO-8859-16")); //PamutkéztörlŠadagoló és higiéniai kéztörlŠadagoló
P.S. Similar question could also be found here and here.

PHP's base64_decode returns wrong result working in IIS

I'm trying to show a jpg that was previously encoded in a WCF web service using:
<?php
require_once '../inc/config.php';
[...]
header("Content-type: image/jpg");
echo base64_decode($doc['BDATA']);
But I'm getting a
Can't display the image because it contains errors.
I've decoded the base64 string in this web app www.opinionatedgeek.com/dotnet/tools/base64decode/ and the result is right, but different that the one I'm getting with base64_decode, which is wrong.
Edit: I have two enviroments using the same code: Test and Production. It works fine in Test, but not in Production, so I'm thinking in some configuration problem.
I'm working with PHP 5.5.9 in Microsoft IIS.
An example of a string that base64_decode isn't decoding well:
/9j/4AAQSkZJRgABAQEAeAB4AAD/2wBDAAIBAQIBAQICAgICAgICAwUDAwMDAwYEBAMFBwYHBwcGBwcICQsJCAgKCAcHCg0KCgsMDAwMBwkODw0MDgsMDAz/2wBDAQICAgMDAwYDAwYMCAcIDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAz/wAARCAABAAEDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD9/KKKKAP/2Q==
Any ideas?
Edit 2: If I comment this line
require_once '../inc/config.php';
and copy the code from config.php to my actual file, it works fine. What could be happening?
From base_64_decode manual comments
php <= 5.0.5's base64_decode( $string ) will assume that a space is
meant to be a + sign where php >= 5.1.0's base64_decode( $string )
will no longer make that assumption
To fix this behavior try this code
$encodedData = str_replace(' ','+',$encodedData);
$decocedData = base64_decode($encodedData);
As this is no't your case then you have to check this answer
Because every thing work fine for me here on (WAMP)
EDIT:
As in our below conversation
There are a lot of things that may corrupt header for example , if
your file encoding is UTF-8 then you should save it as UTF-8 Without
bom you can do this using notepad ++ , also make sure if you use FTP
that your client didn't any chars to your file , rather than that
every thing should work fine
base64 encoding is not completely standardised.
Some implementations use different characters, so you'll have to replace those characters before you run your decode.
further details

PHP: string breaks at special character

I wrote a small PHP script which does a "branding" on a present PDF file. This means on every page I put a string like "belongs to " at a special position. Therefor I use Zend_Pdf out of the Zend Framework.
Because the script is used in German language area, in one line there I use the special character "ö" ("Gehört zu ").
On my local machine (Windows, XAMPP) the script worked fine, but when moving it to my hoster's webspace (some Linux), the string breaks at "ö". That means in my PDF on appears "Geh".
The code is this:
if (substr($file, strlen($file) - 4) === '.pdf') {
$name = $user->GetName;
$fontSize = 12;
$xTextPos = 100;
$yTextPos = 10;
set_include_path(dirname(__FILE__)); // set include_path for external library Zend Framework
require_once('Zend' .DS . 'Pdf.php');
$pdf = Zend_Pdf::load($file);
$font = Zend_Pdf_Font::fontWithName(Zend_Pdf_Font::FONT_HELVETICA);
$branding = 'Gehört zu ' . $name; // German for: 'Belongs to ', problem with 'ö'
foreach ($pdf->pages as &$page) {
$page->setFont($font, $fontSize);
$page->drawText($branding, $xTextPos, $yTextPos);
}
}
I guess the problem is related to some kind of default charset or language setting of the PHP environment. So I searched here and tried out:
$branding = utf8_encode('Gehört zu ') . $name;
...and I made some experiments with functions like html_entity_decode but nothing helped and I decided stopping groping in the dark and open an own question.
Looking forward to any hints. Thank you in advance for your help!
EDIT: Meanwhile I found the same (?) problem, solved on a German forum. But if I do it like they say...
$branding = mb_convert_encoding('Gehört zu ', 'ISO-8859-1') . $name;
... the resulting branding in the PDF is "Gehrt zu ". The "ö" is skipped now.
For this I found another hint on the Zend issue tracker.
I sum up, that I can drop all UTF8 things and concentrate on Latin-1 AKA ISO 8859-1.
I still don't understand why the code worked on my Windows + XAMPP and now crashes on my hoster's Linux.
Your guess is right, the problem is related to encoding. Where exactly the encoding is messed up is hard to say from afar. I'm assuming you work not only with Zend_Pdf, but also have the MVC in place (meaning a complete Zend_Application).
You should check if your application serves pages as UTF-8, by setting:
resources.view.encoding = "UTF-8"
and also placing the appropriate meta-tags in your layout/view.
Depending on what Editor you use, your files may be encoded in a different encoding. You can use Notepad++ on Windows to check your file-encoding and for converting it to UTF-8 (don't just set the encoding to UTF-8, this might mess up your file!) if necessary. I recommend using Eclipse with text file encoding set to "UTF-8" (Preferences > General > Workspace) to make sure your code files are encoded in UTF-8.
Now comes the crucial part:
Zend_Pdf_Page::drawText(string $text, float $x, float $y, string $charEncoding)
See that last argument... set it. If you're lucky, you can skip the previous stuff and just set the encoding there.
edit: I missed something. Database connections. You should check the encoding there too. I frequently work with MS SQL Server, which uses Latin-1 internally; not setting driver_otpions.CharacterSet can mess up stuff pretty bad too. This might be relevant, if you have soemthing like Gehört zu: Günther, where the Name Günther is fetched from db.
Encoding is also depending of the file encoding.
If you encode your file in UTF8 for example and use ut8_encode("ö"), then you'll encode in UTF_8 something already in UTF_8.
So you may want to check what your file encoding is, and what your PDF lib is requiring. Then apply the right formula/transformation.

Accents in uploaded file being replaced with '?'

I am building a data import tool for the admin section of a website I am working on. The data is in both French and English, and contains many accented characters. Whenever I attempt to upload a file, parse the data, and store it in my MySQL database, the accents are replaced with '?'.
I have text files containing data (charset is iso-8859-1) which I upload to my server using CodeIgniter's file upload library. I then read the file in PHP.
My code is similar to this:
$this->upload->do_upload()
$data = array('upload_data' => $this->upload->data());
$fileHandle = fopen($data['upload_data']['full_path'], "r");
while (($line = fgets($fileHandle)) !== false) {
echo $line;
}
This produces lines with accents replaced with '?'. Everything else is correct.
If I download my uploaded file from my server over FTP, the charset is still iso-8850-1, but a diff reveals that the file has changed. However, if I open the file in TextEdit, it displays properly.
I attempted to use PHP's stream_encoding method to explicitly set my file stream to iso-8859-1, but my build of PHP does not have the method.
After running out of ideas, I tried wrapping my strings in both utf8_encode and utf8_decode. Neither worked.
If anyone has any suggestions about things I could try, I would be extremely grateful.
It's Important to see if the corruption is happening before or after the query is being issued to mySQL. There are too many possible things happening here to be able to pinpoint it. Are you able to output your MySql to check this?
Assuming that your query IS properly formed (no corruption at the stage the query is being outputted) there are a couple of things that you should check.
What is the character encoding of the database itself? (collation)
What is the Charset of the connection - this may not be set up correctly in your mysql config and can be manually set using the 'SET NAMES' command
In my own application I issue a 'SET NAMES utf8' as my first query after establishing a connection as I am unable to change the MySQL config.
See this.
http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
Edit: If the issue is not related to mysql I'd check the following
You say the encoding of the file is 'charset is iso-8859-1' - can I ask how you are sure of this?
What happens if you save the file itself as utf8 (Without BOM) and try to reprocess it?
What is the encoding of the php file that is performing the conversion? (What are you using to write your php - it may be 'managing' this for you in an undesired way)
(an aside) Are the files you are processing suitable for processing using fgetcsv instead?
http://php.net/manual/en/function.fgetcsv.php
Files uploaded to your server should be returned the same on download. That means, the encoding of the file (which is just a bunch of binary data) should not be changed. Instead you should take care that you are able to store the binary information of that file unchanged.
To achieve that with your database, create a BLOB field. That's the right column type for it. It's just binary data.
Assuming you're using MySQL, this is the reference: The BLOB and TEXT Types, look out for BLOB.
The problem is that you are using iso-8859-1 instead of utf-8. In order to encode it in the correct charset, you should use the iconv function, like so:
$output_string = iconv('utf-8", "utf-8//TRANSLIT", $input_string);
iso-8859-1 does not have the encoding for any sort of accents.
It would be so much better if everything were utf-8, as it handles virtually every character known to man.

Categories