Umlauts are displayed as ? in my MySQL database? - php

This is my local scenario:
I have an application which reads some CSV files and writes the content to my local MySQL database. The content contains umlauts, such as "ß" or "Ä". Locally everything works fine: the umlauts are written to the db and also displayed correctly inside the app which reads the db.
Now I moved this scenario to the Amazon cloud, and suddenly "ß" becomes "?" in the db. I checked what the program reads from the CSV files, and there it is still a "ß". So it must be the writing to the database, I guess. The question is: why did this work locally but not on my cloud server? Is this a db problem or a PHP problem?
Thanks! :)

Did you check the encoding on both databases? That is most likely where the problem is.

You need to have your database UTF-8 encoded.
Here is an excellent overview article that explains how encoding works in MySQL, and multiple ways to fix it:
http://www.bluebox.net/news/2009/07/mysql_encoding/
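If you want to see what you currently have, something like this works (a sketch, assuming a mysqli connection in $link and a table called my_table):
// Show the server / client / connection character sets in effect
$result = mysqli_query($link, "SHOW VARIABLES LIKE 'character_set%'");
while ($row = mysqli_fetch_assoc($result)) {
    echo $row['Variable_name'] . ' = ' . $row['Value'] . "\n";
}
// Show the charset/collation the table was actually created with
$result = mysqli_query($link, "SHOW CREATE TABLE my_table");
$row = mysqli_fetch_row($result);
echo $row[1] . "\n";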

You can use iconv() to convert the string between encodings. Note the argument order: the source encoding comes first, then the target:
iconv("UTF-8", "ISO-8859-1", $stringhere);
This converts $stringhere from UTF-8 to ISO-8859-1.
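For the original question (a CSV going into a UTF-8 database) the conversion would more likely go the other way; this is only a sketch, and the real encodings depend on your files and columns ($csvValue is a placeholder):
// iconv(source_encoding, target_encoding, string)
$utf8Value   = iconv('ISO-8859-1', 'UTF-8//TRANSLIT', $csvValue);  // Latin-1 CSV -> UTF-8 column
$latin1Value = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8Value); // UTF-8 -> Latin-1 column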

It all depends on how you upload the CSV file.
Before writing the data to the MySQL server, this code might help:
$Notes = $_POST['Notes']; // or the contents of the CSV file - but split the data first (by newline etc.)
$charset = mysqli_character_set_name($link); // check your connection character set; not strictly necessary
printf("Connection character set: %s\n", $charset);
$Notes = str_replace('&quot;', '"', $Notes); // turn entity-encoded double quotes back into literal quotes (e.g. for mailto: links)
$von = array("&auml;", "&ouml;", "&uuml;", "&szlig;", "&Auml;", "&Ouml;", "&Uuml;", "&nbsp;", "&eacute;"); // entity forms; also corrects non-breaking spaces
$zu  = array("ä", "ö", "ü", "ß", "Ä", "Ö", "Ü", " ", "é"); // literal characters
$Notes = str_replace($von, $zu, $Notes);
echo " Notes:" . $Notes . "<br>";
$Notes = mysqli_real_escape_string($link, $Notes); // for a mysqli DB connection
echo " Notes:" . $Notes;

Related

Stuck writing UTF-8 file via PHP's fwrite

I can't figure out what I'm doing wrong. I'm getting file content from the database; when I echo the content, everything displays just fine, but when I write it to a file (.html) it breaks. I've tried iconv and a few other solutions, but I just don't understand what I should put for the first parameter. I've tried blanks, and that didn't work very well either. I assume it's coming out of the DB as UTF-8 if it's echoing properly. Been stuck a little while now without much luck.
function file($fileName, $content) {
    if (!file_exists("out/".$fileName)) {
        $file_handle = fopen(DOCROOT . "out/".$fileName, "wb") or die("can't open file");
        fwrite($file_handle, iconv('UTF-8', 'UTF-8', $content));
        fclose($file_handle);
        return TRUE;
    } else {
        return FALSE;
    }
}
Here is what the source of the HTML file looks like. It comes out of the DB like this:
<h5>Текущая стабильная версия CMS</h5>
and goes into the file like this:
<h5>Ð¢ÐµÐºÑƒÑ‰Ð°Ñ ÑÑ‚Ð°Ð±Ð¸Ð»ÑŒÐ½Ð°Ñ Ð²ÐµÑ€ÑÐ¸Ñ CMS</h5>
EDIT:
Turns out the root of the problem was Apache serving the files incorrectly. Adding
AddDefaultCharset utf-8
to my .htaccess file fixed it. Hours wasted... At least I learned something, though.
Edit: The database encoding does not seem to be the issue here, so this part of the answer is retained for information only
I assume it's coming out of the DB as UTF-8
This is most likely your problem. What database type do you use? Have you set the character encoding and collation details for the database, the table, the connection, and the transfer?
If I were to hazard a guess, I would say your table is MySQL and that the collation for the database / table / column is utf8_general_ci?
However, MySQL's utf8 is not actually full UTF-8: it stores at most 3 bytes per character rather than 4, so it cannot store the whole UTF-8 character set; see UTF-8 all the way through.
So you need to go through every table and column in your MySQL database and change it from utf8_ to utf8mb4_ (available since MySQL 5.5.3), which uses up to 4 bytes per character and covers the whole UTF-8 spectrum; a sketch follows below.
Also, if you do any PHP work on the data strings, be aware that you should be using the mb_* PHP functions for multibyte encodings.
And finally, you need to specify a connection character set for the database. Don't run with the default, as it will almost certainly not be utf8mb4; otherwise you can have the correct data in the database, but that data is repackaged as 3-byte utf8 on the way out before being treated as full UTF-8 by PHP at the other end.
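As a rough sketch of that utf8mb4 conversion and connection setup (table name and connection credentials are placeholders; back up before altering anything):
// Convert an existing table (and its text columns) to utf8mb4:
//   ALTER TABLE my_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

// Then make sure the PHP connection speaks utf8mb4 as well
$link = mysqli_connect('localhost', 'user', 'pass', 'mydb');
mysqli_set_charset($link, 'utf8mb4');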
Hope this helps, and if your DB is not MySQL, let us know what it is!
Edit:
function file($fileName, $content) {
    if (!file_exists("out/".$fileName)) {
        $file_handle = fopen(DOCROOT . "out/".$fileName, "wb") or die("can't open file");
        fwrite($file_handle, iconv('UTF-8', 'UTF-8', $content));
        fclose($file_handle);
        return TRUE;
    } else {
        return FALSE;
    }
}
Your $file_handle is trying to open a file inside an if statement that will only run if the file does not exist.
Your iconv is worthless here, converting from "utf-8" to, er, "utf-8". Character detection is extremely haphazard and hard for programs to do correctly, so it's generally advised not to try to work out / guess what a character encoding is; you need to know what it is and tell the function.
The comment by Dean is actually very important. The HTML should have a <meta charset="UTF-8"> inside <head>.
That iconv call is not doing anything useful; if you are right that your content is already UTF-8, it is not necessary.
You should check the character set of your database connection. Your database can be encoded in UTF-8 but the connection could be in another character set.
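For example, with PDO you can force and then verify the connection character set like this (host, database and credentials are placeholders):
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass');
// Should print utf8mb4; if it prints latin1, the connection is your problem
echo $pdo->query('SELECT @@character_set_connection')->fetchColumn();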
Good luck!

Trying to export CSV to Microsoft Excel with special characters like Chinese, but failing

I have a web app where I am trying to export to CSV from a database.
It runs perfectly with the English character set, but when I put some Chinese text in the database my CSV shows garbage characters like ????.
<?php
$con = mysqli_connect(global_dbhost, global_dbusername, global_dbpassword, global_dbdatabase);
if (isset($_GET['csv'])) {
    $query = 'SELECT CONCAT("TC00", `t_id`),m_id,s_id,t_name,Description,start_date,end_date,start_time,end_time,status,active FROM tc_task';
    $today = date("dmY");
    //CSVExport($query);
    //echo 'inside function';
    $sql_csv = mysqli_query($con, $query) or die("Error: " . mysqli_error($con)); // Replace this line with what is appropriate for your DB abstraction layer
    file_put_contents("csvLOG.txt", "\n inside ajax", FILE_APPEND);
    header("Content-type: application/octet-stream");
    header("Content-Disposition: attachment; filename=caring_data" . $today . ".csv");
    while ($row = mysqli_fetch_row($sql_csv)) {
        print '"' . stripslashes(implode('","', $row)) . "\"\n";
    }
    exit;
}
?>
Solution available here:
Open in notepad (or equivalent)
Re-Save CSV as Unicode (not UTF-8)
Open in Excel
Profit
Excel does not handle UTF-8. If you go to the import options for CSV files, you will notice there is no choice for UTF-8 encoding. Since Unicode is supported, the above should work (though it is an extra step). The equivalent can likely be done on the PHP side (if you can save as Unicode instead of UTF-8). This seems doable according to this page which suggests:
$unicode_str_for_Excel = chr(255).chr(254).mb_convert_encoding( $utf8_str, 'UTF-16LE', 'UTF-8');
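Applied to the export loop from the question, that could look roughly like this (a sketch; chr(255).chr(254) is the UTF-16LE byte-order mark, and Excel often expects tab separators rather than commas when opening UTF-16 text, so the separator may need adjusting):
// Emit the UTF-16LE BOM once, then convert each line from UTF-8 before printing
echo chr(255) . chr(254);
while ($row = mysqli_fetch_row($sql_csv)) {
    $line = '"' . implode('","', $row) . "\"\r\n";
    echo mb_convert_encoding($line, 'UTF-16LE', 'UTF-8');
}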

SQLite UTF-8 output with fwrite

I was trying to output UTF-8 text read from SQLite database to a text file using fwrite function, but with no luck at all.
When I echo the content to the browser I can read it with no problem. As a last resort, I created the same tables into MySQL database, and surprisingly it worked!
What could be the cause, and how can I debug this so that I can use the SQLite DB?
I am using PDO.
Below is the code I am using to read from DB and write to file:
$arFile = realpath(APP_PATH.'output/Arabic.txt');
$arfh = fopen($arFile, 'w');
$arTxt = '';
$key = 'somekey';
$sql = 'SELECT ot.langv AS orgv, et.langv AS engv, at.langv AS arbv FROM original ot
        LEFT JOIN en_vals et ON ot.langk=et.langk
        LEFT JOIN ar_vals at ON ot.langk=at.langk
        WHERE ot.langk=:key';
$stm = $dbh->prepare($sql);
$stm->execute(array(':key' => $key));
if ($row = $stm->fetch(PDO::FETCH_ASSOC)) {
    $arTxt .= '$_LANG["'.$key.'"] = "'.special_escape($row['arbv']).'";'."\n";
}
fwrite($arfh, $arTxt);
fclose($arfh);
What could be the cause, how can I debug this so that I can use SQLite DB?
SQLite stores text into the database as it receives it. So if you store UTF-8 encoded text into the SQLite database, you can read UTF-8 text from it.
If you store, let's say, LATIN-1 text into the database, then you can read LATIN-1 text from it.
SQLite itself does not care. So you get out what you put in.
Since you write in your question that the display in the browser looks good, I would assume that at least some validly encoded values have been stored in the database. When you view that data in the browser, check which encoding the browser says the data is in.
If it says UTF-8, then fine. In that case you might simply be viewing the text file in an editor that does not support UTF-8. fwrite also does not care about the encoding; it just puts the string data into the file, and that's it.
As long as you don't provide additional information with your question, it's hard to say anything more specific.
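One way to check what is actually being written, before blaming SQLite or fwrite, is to dump the raw bytes of the string right before the fwrite call (a debugging sketch):
// Hex-dump the first bytes: valid UTF-8 Arabic text shows lead bytes like d8/d9
echo bin2hex(substr($arTxt, 0, 20)), "\n";
// Confirms whether the string is well-formed UTF-8 at this point
var_dump(mb_check_encoding($arTxt, 'UTF-8'));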
See as well: How to change character encoding of a PDO/SQLite connection in PHP?

Accents in uploaded file being replaced with '?'

I am building a data import tool for the admin section of a website I am working on. The data is in both French and English, and contains many accented characters. Whenever I attempt to upload a file, parse the data, and store it in my MySQL database, the accents are replaced with '?'.
I have text files containing data (charset is iso-8859-1) which I upload to my server using CodeIgniter's file upload library. I then read the file in PHP.
My code is similar to this:
$this->upload->do_upload();
$data = array('upload_data' => $this->upload->data());
$fileHandle = fopen($data['upload_data']['full_path'], "r");
while (($line = fgets($fileHandle)) !== false) {
    echo $line;
}
This produces lines with accents replaced with '?'. Everything else is correct.
If I download my uploaded file from my server over FTP, the charset is still iso-8859-1, but a diff reveals that the file has changed. However, if I open the file in TextEdit, it displays properly.
I attempted to use PHP's stream_encoding method to explicitly set my file stream to iso-8859-1, but my build of PHP does not have the method.
After running out of ideas, I tried wrapping my strings in both utf8_encode and utf8_decode. Neither worked.
If anyone has any suggestions about things I could try, I would be extremely grateful.
It's important to see whether the corruption is happening before or after the query is issued to MySQL. There are too many possible things happening here to pinpoint it otherwise. Are you able to output your MySQL query to check this?
Assuming that your query IS properly formed (no corruption at the stage the query is output), there are a couple of things that you should check.
What is the character encoding of the database itself? (collation)
What is the Charset of the connection - this may not be set up correctly in your mysql config and can be manually set using the 'SET NAMES' command
In my own application I issue a 'SET NAMES utf8' as my first query after establishing a connection as I am unable to change the MySQL config.
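As a minimal example of that (shown with mysqli; the same statement works with any driver):
// First query after connecting
mysqli_query($link, "SET NAMES 'utf8'");
// When you do control the config, mysqli_set_charset($link, 'utf8') is preferable,
// since it also keeps mysqli_real_escape_string() aware of the connection charset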
See this.
http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
Edit: If the issue is not related to MySQL, I'd check the following:
You say the encoding of the file is 'charset is iso-8859-1' - can I ask how you are sure of this?
What happens if you save the file itself as utf8 (Without BOM) and try to reprocess it?
What is the encoding of the PHP file that is performing the conversion? (What are you using to write your PHP? It may be 'managing' this for you in an undesired way.)
(an aside) Are the files you are processing suitable for processing using fgetcsv instead?
http://php.net/manual/en/function.fgetcsv.php
Files uploaded to your server should be returned the same on download. That means, the encoding of the file (which is just a bunch of binary data) should not be changed. Instead you should take care that you are able to store the binary information of that file unchanged.
To achieve that with your database, create a BLOB field. That's the right column type for it. It's just binary data.
Assuming you're using MySQL, this is the reference: The BLOB and TEXT Types, look out for BLOB.
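A minimal sketch of that approach (table and column names are made up; PDO assumed):
// Column defined as e.g. `contents MEDIUMBLOB` -- raw bytes, no charset applied
$stmt = $pdo->prepare("INSERT INTO uploads (filename, contents) VALUES (?, ?)");
$stmt->execute(array($fileName, file_get_contents($uploadedPath)));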
The problem is that you are using iso-8859-1 instead of utf-8. In order to encode it in the correct charset, you should use the iconv function, like so:
$output_string = iconv("iso-8859-1", "utf-8//TRANSLIT", $input_string);
iso-8859-1 only covers a limited set of (mostly Western European) characters, so anything outside it cannot be represented.
It would be so much better if everything were utf-8, as it handles virtually every character known to man.

utf8_encode or decode isn't doing what I expect

I am taking an XML file and reading it into various strings, before writing to a database, however I am having difficulty with German characters.
The XML file starts off
<?xml version="1.0" encoding="UTF-8"?>
Then an example of where I am having problems is this part
<name><![CDATA[PONS Großwörterbuch Deutsch als Fremdsprache Android]]></name>
My PHP has this relevant section
$dom = new DOMDocument();
$domNode = $xmlReader->expand();
$element = $dom->appendChild($domNode);
$domString = utf8_encode($dom->saveXML($element));
$product = new SimpleXMLElement($domString);
//read in data
$arr = $product->attributes();
$link_ident = $arr["id"];
$link_id = $platform . "" . $link_ident;
$link_name = $product->name;
So $link_name becomes PONS GroÃwörterbuch Deutsch als Fremdsprache Android
I then did a
$link_name = utf8_decode($link_name);
which, when I echoed it back in the terminal, worked fine:
PONS GroÃwörterbuch Deutsch als Fremdsprache Android as is now
PONS Großwörterbuch Deutsch als Fremdsprache Android after utf8decode
However when it is written into my database it appears as:
PONS Kompaktwörterbuch Deutsch-Englisch (Android)
The collation for link_name in MySQL is utf8_general_ci.
How should I be doing this to get it correctly written into my database?
This is the code I use to write to the database
$link_name = utf8_decode($link_name);
$link_id = mysql_real_escape_string($link_id);
$link_name = mysql_real_escape_string($link_name);
$description = mysql_real_escape_string($description);
$metadesc = mysql_real_escape_string($metadesc);
$link_created = mysql_real_escape_string($link_created);
$link_modified = mysql_real_escape_string($link_modified);
$website = mysql_real_escape_string($website);
$cost = mysql_real_escape_string($cost);
$image_name = mysql_real_escape_string($image_name);
$query = "REPLACE into jos_mt_links
(link_id, link_name, alias, link_desc, user_id, link_published,link_approved, metadesc, link_created, link_modified, website, price)
VALUES ('$link_id','$link_name','$link_name','$description','63','1','1','$metadesc','$link_created','$link_modified','$website','$cost')";
echo $link_name . " has been inserted ";
and when I run it from shell I see
PONS Kompaktwörterbuch Deutsch-Englisch (Android) has been inserted
You've got a UTF-8 string from an XML file, and you're putting it into a UTF-8 database. So there is no encoding or decoding to be done; just shove the original string into the database. Make sure you've called mysql_set_charset('utf8') first (MySQL spells the charset without the hyphen) to tell the database there are UTF-8 strings coming.
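In terms of the code in the question, that means dropping the utf8_decode() call and telling the connection about UTF-8 right after connecting (a sketch; $connection stands for whatever mysql_connect() returned):
mysql_set_charset('utf8', $connection); // MySQL's name for UTF-8 has no hyphen
// then skip utf8_decode() entirely; the string stays UTF-8 all the way
$link_name = mysql_real_escape_string($link_name);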
utf8_decode and utf8_encode are misleadingly named. They are only for converting between UTF-8 and ISO-8859-1 encodings. Calling utf8_decode, which converts to ISO-8859-1, will naturally lose any characters you have that don't fit in that encoding. You should generally avoid these functions unless there's a specific place where you need to be using 8859-1.
You should not consider what the terminal shows when you echo a string to be definitive. The terminal has its own encoding problems and especially under Windows it is likely to be impossible to output every character properly. On a Western Windows install the system code page (which the terminal will use to turn the bytes PHP spits out into characters to display on-screen) will be code page 1252, which is similar to but not the same as ISO-8859-1. This is why utf8_decode, which spits out ISO-8859-1, appeared to make the text appear as you expected. But that's of little use. Internally you should be using UTF-8 for all strings.
You must use the mb_convert_encoding or iconv function before you write into your database.
