I have one problem. I have an Excel file saved as CSV, and I need to read that file with PHP and insert it into MySQL, but there is a problem with the character set, specifically with čćšđž. I tried utf8_encode() and almost everything else I could think of.
Example:
It inserts "Petroviæ" but it should be "Petrović"
EDIT:
<?php
mysql_connect("localhost", "user", "pw");
mysql_select_db("database");
$fajl = "Prodajna mreza.csv";
$handle = @fopen($fajl, "r");
if ($handle) {
$size = filesize($fajl);
if(!$size) {
echo "File is empty.\n";
exit;
}
$csvcontent = fread($handle,$size);
$red = 1;
foreach(explode("\n",$csvcontent) as $line) {
if(strlen($line) <= 20)
{
$red++;
continue;
}
if($red == 1)
{
$red++;
continue;
}
$nesto = explode(",", $line);
if($nesto[0] == '')
continue;
mysql_query("INSERT INTO table(val1, val2, val3, val4, val5, val6, val7, val8) VALUES ('".$nesto[0]."','".$nesto[1]."','".$nesto[2]."','".$nesto[3]."','".$nesto[4]."','".$nesto[5]."','".$nesto[6]."','".$nesto[7]."')");
$red++;
}
fclose($handle);
}
mysql_close();
?>
First off: Using this mysql extension is discouraged. So you might want to switch to something else. Also notice that the way you compose your query by simply pasting strings makes it vulnerable to SQL injection attacks. You should only do this if you are really really sure that there won't be any ugly surprises in the content of the files you read.
It appears that neither your file reading nor the client-side mysql code does anything related to charset conversion, so I'd assume that those simply pass on bytes, without caring about their interpretation. So you only have to make sure that the server interprets those bytes correctly.
Judging from the example you gave, where a ć got turned into an æ, I'd say your file is in ISO-8859-2 but the database is reading it differently, most probably as ISO-8859-1 (byte 0xE6 is ć in ISO-8859-2 and æ in ISO-8859-1). You should ensure that your database can actually accept all ISO-8859-2 characters in its columns. Read the MySQL manual on character set support and set a suitable default character set (probably best at the database level), either utf8 (preferred) or latin2. You might have to recreate your tables for this change to apply.
Next, you should set the character set of the connection to match that of the file. So utf8 is definitely wrong here, and latin2 the way to go.
Using your current API, mysql_set_charset("latin2") can be used to accomplish that.
The manual page for that function also describes equivalent approaches for use with other frontends.
As an alternative, you can use a query to set this: mysql_query("SET NAMES 'latin2';");
After all this is done, you should also ensure that things are set up correctly for any script which reads from the database. In other words, the charset of the generated HTML must match the character_set_results of the MySQL session. Otherwise it might well be that things are stored correctly in your database but still appear broken when displayed to the user. If you have the choice, I'd say use utf8 in that case, as doing so makes it easier to include different data whenever the need arises.
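If you would rather keep everything in utf8 and convert on the PHP side, the idea can be sketched as follows. This is only a minimal sketch, assuming the file really is ISO-8859-2; the helper name latin2_csv_line_to_utf8 and the sample line are made up for illustration and are not part of the original script. With this approach the connection character set would have to be utf8 rather than latin2.

```php
<?php
// Hypothetical helper: convert one CSV line from ISO-8859-2 to UTF-8
// and split it into fields. Assumes the source file is ISO-8859-2.
function latin2_csv_line_to_utf8(string $line): array
{
    $utf8 = iconv('ISO-8859-2', 'UTF-8', $line);
    if ($utf8 === false) {
        throw new RuntimeException('Line could not be converted from ISO-8859-2');
    }
    // Strip the line ending, then parse with explicit delimiter, enclosure, escape.
    return str_getcsv(rtrim($utf8, "\r\n"), ',', '"', '\\');
}

// Illustrative input: "Petrović" in ISO-8859-2 plus a made-up second column.
$fields = latin2_csv_line_to_utf8("Petrovi\xe6,Zagreb\r\n");
echo implode(' | ', $fields), "\n";
```

The resulting fields can then be bound to a prepared statement on a connection whose character set is utf8.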
If some problems remain, you should pinpoint whether they occur while reading from the file into PHP, while exchanging data with MySQL, or while presenting the result in HTML. The string "Petrovi\xc4\x87" is the utf8 representation of your example, and "Petrovi\xe6" is the latin2 form. You can use these strings to explicitly pass on data with a known encoding, or to check an incoming transferred value against one of these strings.
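A small check along those lines might look like this. It is a sketch only; the constant and function names are hypothetical, and the third test value is what you would typically see when the latin2 byte was misread as latin1 and then converted to utf8.

```php
<?php
// The two reference strings mentioned above.
const PETROVIC_UTF8   = "Petrovi\xc4\x87";
const PETROVIC_LATIN2 = "Petrovi\xe6";

// Hypothetical helper: report which known encoding a value arrived in,
// or dump its bytes as hex so they can be inspected by hand.
function identify_petrovic(string $value): string
{
    if ($value === PETROVIC_UTF8) {
        return 'utf8';
    }
    if ($value === PETROVIC_LATIN2) {
        return 'latin2';
    }
    return 'unknown (hex: ' . bin2hex($value) . ')';
}

echo identify_petrovic(PETROVIC_LATIN2), "\n";
// "Petroviæ" encoded as utf8: the latin2 byte was interpreted as latin1.
echo identify_petrovic("Petrovi\xc3\xa6"), "\n";
```

Calling such a check right after reading the file, right after fetching from MySQL, and right before output should show at which stage the bytes change.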
It shouldn't be a problem to import a CSV file into a database if both the file and the database collation are UTF-8.
<?php
$db = @mysql_connect('localhost', 'user', 'pass');
@mysql_select_db('my_database');
$CSVFile = "file.csv";
mysql_query('LOAD DATA LOCAL INFILE "' . $CSVFile . '" INTO TABLE my_table
FIELDS TERMINATED BY "," LINES TERMINATED BY "\\r\\n";');
mysql_close($db);
?>
You can also import your own CSV in phpMyAdmin:
Import -> format = CSV, then click "Import".
Or, if you don't want to use phpMyAdmin (note that BULK INSERT is SQL Server syntax, not MySQL):
BULK INSERT csv_dump
FROM 'c:\file.csv'
WITH
(
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n'
)
I have encountered a scenario where a query built from an email sent by someone in Europe keeps failing to execute. After minimizing the query I've determined that once all special characters like å and é are removed, the query works fine in PHP / mysqli_query. The queries also don't work in MariaDB's command line, though they do work in HeidiSQL; I imagine HeidiSQL internally adjusts the strings used in its Query tabs.
Let's get the following out of the way:
Database Character Set: utf8mb4.
Database Collation: utf8mb4_unicode_520_ci.
Database column collation: utf8mb4_unicode_520_ci.
The query SET CHARACTER SET 'utf8mb4' is being executed correctly for the request.
Here is the query:
INSERT INTO example_table (example_column) VALUES ('Håko');
I should note that I tried the following (which also failed) even though I firmly believe that this issue occurs from and should be resolved via PHP:
INSERT INTO example_table (example_column) VALUES (CONVERT('Håko' USING utf8));
Here is the MariaDB error:
Incorrect string value: '\xE9rard ...'
Like I said this string is originating from an email message so I'm pretty sure that the issue is with PHP, not MariaDB. So let's go backwards to that code that seems to otherwise work. Please keep in mind that this has taken at least two days to put together in the correct order to even get the strings to appear correctly in the MariaDB query log without being incorrectly converted to UTF-8 and corrupting the special Latin characters:
<?php
$s1 = '=?iso-8859-1?Q?=22G=E9rd_Tabt=22?= <berbs@example.com>';//"Gérd Tabt" <berbs@example.com>
if (strlen($s1) > 0)
{
if (substr_count($s1, '=?') && substr_count($s1, '?= '))
{
$p = explode('?= ', $s1);
$p[0] = $p[0].'?=';
$s2 = imap_mime_header_decode($p[0])[0]->text.' '.$p[1];
}
else {$s2 = imap_mime_header_decode($s1)[0]->text;}
if (strpos($s1, '=96') !== false) {$s2 = mb_convert_encoding($s2, 'UTF-8', 'CP1252');}
else if (mb_convert_encoding($s2, 'UTF-8') == substr_count($s1, '?')) {$s2 = mb_convert_encoding($s2, 'UTF-8');}
}
else {$s2 = $s1;}
?>
There isn't any other relevant code handling this header string.
What is causing what I presume to be UTF-8 encoded strings to break PHP's mysqli_query and the MariaDB command line from working with this query?
Where did the hex E9 come from? That is encoded latin1. Yet your configuration seems to claim that your client is encoded utf8mb4. You must have the connection charset match what the encoding is in the client. The database and table and client can have a different encoding; MariaDB is happy to convert on the fly when INSERTing or SELECTing.
For more analysis, see Trouble with UTF-8 characters; what I see is not what I stored
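One way to catch this before the query is sent is to verify that the string is valid UTF-8. The following is a minimal sketch; the helper name require_utf8 is hypothetical, and it relies on the mbstring extension.

```php
<?php
// Hypothetical guard: refuse to pass on text that is not valid UTF-8,
// and show the offending bytes as hex to make the real encoding visible.
function require_utf8(string $value): string
{
    if (!mb_check_encoding($value, 'UTF-8')) {
        throw new InvalidArgumentException('Not valid UTF-8, bytes: ' . bin2hex($value));
    }
    return $value;
}

$latin1 = "G\xe9rard";      // é as a single latin1 byte (hex E9)
$utf8   = "G\xc3\xa9rard";  // é as the UTF-8 byte pair C3 A9

echo require_utf8($utf8), "\n";
try {
    require_utf8($latin1);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```

A lone E9 byte followed by an ASCII letter is never valid UTF-8, which is why the server rejects it on a utf8mb4 connection.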
if (mb_convert_encoding($s2, 'UTF-8') == substr_count($s1, '?'))
This makes no sense: it compares a string (the text converted to UTF-8) against an integer (the number of '?' matches) using the loose, type-unsafe == operator. The two are only equal when the converted text happens to look like that number, for example the text '0' when no '?' was found (before PHP 8, any non-numeric text also loosely equals 0).
So for this header your text is never converted to UTF-8 and remains whatever it was (in this case ISO-8859-1).
mb_convert_encoding($s2, 'UTF-8')
Are you sure you want to convert to UTF-8 without stating the source encoding? ISO-8859-1, as given in this email header, isn't the only charset to expect, so why not extract that information and pass it to the function?
MariaDB is right: you're handing over ISO-8859-1 encoded text in that case, while the DBMS expects the UTF-8 encoding.
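Extracting the charset from the header and passing it on could be sketched like this. It is not the asker's code and not a complete RFC 2047 decoder (for example, it does not drop whitespace between adjacent encoded words, and an unknown charset name will make mb_convert_encoding fail); the function name is hypothetical.

```php
<?php
// Hypothetical helper: decode every =?charset?Q|B?payload?= word in a header
// and convert it to UTF-8 using the charset the header itself declares.
function decode_mime_header_to_utf8(string $header): string
{
    $decoded = preg_replace_callback(
        '/=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=/',
        function (array $m): string {
            [, $charset, $type, $payload] = $m;
            $bytes = strtoupper($type) === 'B'
                ? base64_decode($payload)
                // In Q encoding "_" stands for a space, "=XX" for a raw byte.
                : quoted_printable_decode(str_replace('_', ' ', $payload));
            return mb_convert_encoding($bytes, 'UTF-8', $charset);
        },
        $header
    );
    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $decoded ?? $header;
}

$s1 = '=?iso-8859-1?Q?=22G=E9rd_Tabt=22?= <berbs@example.com>';
$s2 = decode_mime_header_to_utf8($s1);
echo $s2, "\n";
```

The result is valid UTF-8, so it matches what a utf8mb4 connection expects.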
My project is built on PHP and connects to MS SQL Server. I am using the sqlsrv library. The field type in MS SQL is nvarchar. When I define the connection parameters I also set "UTF-8". They are:
global $cnf;
$cnf = array();
$cnf["mssql_user"] = "xxx";
$cnf["mssql_host"] = "xxx";
$cnf["mssql_pw"] = "xxx";
$cnf["mssql_db"] = "xxx";
$cnf["CharacterSet"] = "UTF-8";
When inserting records into the database, for Vietnamese and Chinese content I use:
$city = iconv('UTF-8', 'utf-16le', $post['city']);
$params = array(array($city, null, SQLSRV_PHPTYPE_STRING(SQLSRV_ENC_BINARY)));
$sql = "INSERT INTO tblCityGarden (city) VALUES(?)";
$stmt = sqlsrv_query( $this->dbhandle, $sql, $params);
The insert works fine for Vietnamese and Chinese (the data stored in the database for both languages is correct).
However, when I load the records back into the web page, strange characters appear (?, �).
I tried some PHP functions such as iconv, mb_detect_encoding and mb_convert_encoding, and searched through many results on the internet, but nothing works. How can I display the data correctly?
I would appreciate help from anyone who has experience with this issue.
I had this same problem (�), but with single quotes, double quotes, and "Rights Reserved" characters... Here is what I've found:
The CharacterSet specified seems to "rule all", so what you set this to will determine the encoding for the connection (as it should). I did not have ANY CharacterSet configured on my connection(s). Simply setting this resolved my issue, along with making sure that the values that were inserted into my DB were not double encoded via htmlspecialchars().
header()'s must be set before ANY output (this is really important)
headers cannot be set to something different later in the document
Sometimes trailing spaces before and/or after the closing ?> in your PHP file can cause issues (I don't use this closing tag, but I saw this mentioned a lot while searching)
I am not familiar with iconv(), and I am most certainly not experienced with encoding in general, but I solved my issue just by taking the time to check my headers and ensure they meet the above standards...
Your query parameters also look strange:
$params = array(array($city, null, SQLSRV_PHPTYPE_STRING(SQLSRV_ENC_BINARY)));
I have not seen a multidimensional array passed into that argument... (just a note)
Only today I realized that I was missing this in my PHP scripts:
mysql_set_charset('utf8');
All my tables are InnoDB, collation "utf8_unicode_ci", and all my VARCHAR columns are "utf8_unicode_ci" as well. I have mb_internal_encoding('UTF-8'); on my PHP scripts, and all my PHP files are encoded as UTF-8.
So, until now, every time I "INSERT" something with diacritics, example:
mysql_query('INSERT INTO `table` SET `name`="Jáuò Iñe"');
The 'name' contents would be, in this case: JÃ¡uÃ² IÃ±e.
Since I fixed the charset between PHP and MySQL, new INSERTs are now storing correctly. However, I want to fix all the older rows that are "messed" at the moment. I tried many things already, but it always breaks the strings on the first "illegal" character. Here is my current code:
$m = mysql_real_escape_string('¿<?php echo "¬<b>\'PHP á (á)ţăriîş </b>"; ?> ă-ţi abcdd;//;ñç´พดแทฝใจคçăâξβψδπλξξςαยนñ ;');
mysql_set_charset('utf8');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('latin1');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('utf8');
$result = mysql_iquery('SELECT * FROM `table`');
while ($row = mysql_fetch_assoc($result)) {
$message = $row['name'];
$message = mb_convert_encoding($message, 'ISO-8859-15', 'UTF-8');
//$message = iconv("UTF-8", "ISO-8859-1//IGNORE", $message);
mysql_iquery('UPDATE `table` SET `name`="'.mysql_real_escape_string($message).'" WHERE `a1`="'.$row['a1'].'"');
}
It "UPDATE"s with the expected characters, except that the string gets truncated after the character "ă". I mean, that character and following chars are not included on the string.
Also, testing with the "iconv()" (that is commented on the code) does the same, even with //IGNORE and //TRANSLIT
I also tested several charsets, between ISO-8859-1 and ISO-8859-15.
From what you describe, it seems you have UTF-8 data that was originally stored as Latin-1 and then not converted correctly to UTF-8. The data is recoverable; you'll need a MySQL function like
convert(cast(convert(name using latin1) as binary) using utf8)
It's possible that you may need to omit the inner conversion, depending on how the data was altered during the encoding conversion.
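The same round trip can be tried in PHP to see why it works, and why the ISO-8859-1 and ISO-8859-15 attempts lose the "ă". This is a sketch under the assumption that the damage happened through MySQL's latin1, which is based on Windows-1252 rather than strict ISO-8859-1; the function name is hypothetical and the mbstring extension is required.

```php
<?php
// Hypothetical helper: undo one level of "UTF-8 bytes read as latin1 and
// re-encoded as UTF-8". Uses Windows-1252 because MySQL's latin1 is based on it.
function repair_double_encoded(string $stored): string
{
    return mb_convert_encoding($stored, 'Windows-1252', 'UTF-8');
}

$original = "J\xc3\xa1u\xc3\xb2 \xc4\x83";   // "Jáuò ă" as UTF-8 bytes

// Simulate the damage: every byte is treated as a Windows-1252 character.
$mojibake = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

$repaired = repair_double_encoded($mojibake);

// With ISO-8859-1 the second byte of "ă" (0x83, shown as "ƒ") has no
// counterpart in the target charset, so that character cannot be restored.
$lossy = mb_convert_encoding($mojibake, 'ISO-8859-1', 'UTF-8');

var_dump($repaired === $original, $lossy === $original);
```

Bytes that Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) may behave differently between PHP versions, so doing the repair inside MySQL as shown above is likely the safer route.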
I found this after searching for an hour or two. I needed to migrate an old tt_news database from an old TYPO3 installation into a new TYPO3 version. I had already tried converting the charset in the export file and importing it back, but couldn't get that to work.
Then I tried the answer above from ABS and started an update on the table:
UPDATE tt_news SET
title=convert(cast(convert(title using latin1) as binary) using utf8),
short=convert(cast(convert(short using latin1) as binary) using utf8),
bodytext=convert(cast(convert(bodytext using latin1) as binary) using utf8)
WHERE 1
You can also convert imagecaption, imagealttext, imagetitletext and keywords if needed.
Hope this will help somebody migrating tt_news to new typo3 version.
There is a better way.
Connect to your database as usual.
Make sure your page encoding is UTF-8 via a meta tag in the HTML head (don't forget this).
Then use this code:
$result = mysql_query('SELECT * FROM shops');
while ($row = mysql_fetch_assoc($result)) {
$name= iconv("windows-1256", "UTF-8", $row['name']);
mysql_query("SET NAMES 'utf8'");
mysql_query("update `shops` SET `name`='".$name."' where ID='$row[ID]' ");
}
I highly recommend using 'utf8mb4' instead of 'utf8', since MySQL's utf8 stores at most three bytes per character and therefore cannot hold some Chinese characters or emojis.
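To check whether a given string would be affected, you can look for code points above U+FFFF, which need four bytes in UTF-8. A minimal sketch follows; the function name is hypothetical and the "\u{...}" escapes require PHP 7 or later.

```php
<?php
// Hypothetical helper: true if the UTF-8 string contains a character outside
// the Basic Multilingual Plane, i.e. one that needs four bytes in UTF-8.
function needs_utf8mb4(string $utf8): bool
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $utf8) === 1;
}

var_dump(needs_utf8mb4("\u{1F600}"));  // an emoji
var_dump(needs_utf8mb4("\u{20BB7}"));  // a CJK Extension B ideograph
var_dump(needs_utf8mb4("\u{4E2D}"));   // a common Chinese character inside the BMP
var_dump(strlen("\u{1F600}"));         // byte length of the emoji
```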
On a webserver, I have a PHP script that parses a .sql file (which is stored directly on the server) and executes the queries on a MySQL database. I have a lot of French characters that don't insert well: é becomes Ã©.
When I open the SQL file with Notepad++, I see that the encoding is "UTF-8 without BOM".
My script looks like this:
$handle = fopen("test.sql", "r") or die("couldn't get handle");
if ($handle)
{
while (!feof($handle))
{
$buffer = fgets($handle, 4096);
if (strlen ( $buffer ) < 3 ) // if we have a blank line
{
mysql_query($query);
$query = $buffer;
usleep(500000); // pause for half a second; sleep() only accepts whole seconds
}
else
{
$query .= $buffer;
}
}
mysql_query($query); // last insert
fclose($handle);
}
When I open the database through phpmyadmin, I see that the special chars are already broken right after the execution of the script.
You may need to run 'SET NAMES UTF8' before you do the insert, because MySQL is so hilariously flaky about character encoding. Yes, even if your entire database has already been set to use the UTF-8 character encoding and the utf8_general_ci collation.
http://forums.mysql.com/read.php?103,46870,46870#msg-46870
Instead, you should use the mysql_set_charset function and not a SET NAMES query, as described at http://www.php.net/manual/en/function.mysql-set-charset.php
Even though your database is in UTF-8, and PHP deals in UTF-8, the connection set up by default is probably a Latin-1 connection, so MySQL will try to convert the data even though it shouldn't
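The effect of that unwanted conversion can be reproduced in PHP alone. This sketch only simulates what the server does when it treats incoming UTF-8 bytes as Latin-1; the variable names are made up, and the mbstring extension is assumed.

```php
<?php
$sent = "\xc3\xa9";   // "é" as the two UTF-8 bytes read from the .sql file

// Simulate a Latin-1 connection feeding a UTF-8 column: each incoming byte is
// taken as one Latin-1 character and converted to UTF-8 separately.
$stored = mb_convert_encoding($sent, 'UTF-8', 'ISO-8859-1');

echo bin2hex($sent), ' -> ', bin2hex($stored), "\n";
echo $stored, "\n";   // displays as "Ã©" on a UTF-8 terminal or page
```

Once the connection charset matches the bytes actually sent, the server no longer performs this conversion.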
I'm getting crazy over these encoding probs...
I use json_decode and json_encode to store and retrieve data. What I did find out is that JSON always needs UTF-8. No problem there. I give JSON 'hellö' in UTF-8, and in my DB it looks like hellu00f6. OK, a code point. But when I use json_decode, it won't decode the code point back, so I still have hellu00f6.
Also, in PHP 5.2.13 it seems like there are still no optional flags for the JSON functions. How can I convert the code point characters back to the correct special characters for display in the browser?
Greetz and thanks
Maenny
It could be because of the backslash preceding the code point in the JSON Unicode string: ö is represented as \u00f6. When it is stored in your DB, the DBMS doesn't know how to interpret \u00f6, so I guess it reads (and stores) it as u00f6.
Are you using an escaping function ?
Try adding a backslash on unicode-escaped chars:
$json = str_replace("\\u", "\\\\u", $json);
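The difference the backslash makes can be seen without a database. This is a small sketch; the variable names are made up, and the JSON_UNESCAPED_UNICODE flag shown at the end only exists from PHP 5.4 on, so it would not have been available on 5.2.13.

```php
<?php
$text = "hell\u{00f6}";                    // "hellö" in UTF-8

$json = json_encode($text);                // keeps the backslash: "hell\u00f6"
$roundTrip = json_decode($json);           // the escape is decoded back to ö

// What ends up in the database when the backslash is swallowed on insert.
$damaged = str_replace('\\', '', $json);   // "hellu00f6"
$fromDamaged = json_decode($damaged);      // stays the literal text hellu00f6

// Newer PHP versions can avoid the escape altogether.
$unescaped = json_encode($text, JSON_UNESCAPED_UNICODE);

echo $json, "\n", $roundTrip, "\n", $fromDamaged, "\n", $unescaped, "\n";
```

If the stored value still contains the backslash, json_decode restores the character on its own; a proper escaping function or a prepared statement keeps the backslash intact.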
The preceding post already explains, why your example did not work as expected.
However, there are some good coding practices when working with databases, which are important to improve the security of your application (i.e. prevent SQL-injection).
The following example intends to show some of these practices, and assumes PHP 5.2 and MySQL 5.1. (Note that all files and database entries are stored using UTF-8 encoding.)
The database used in this example is called test, and the table was created as follows:
CREATE TABLE `test`.`entries` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`data` VARCHAR( 100 ) NOT NULL
) ENGINE = InnoDB CHARACTER SET utf8 COLLATE utf8_bin
(Note that the character set is utf8 and the collation is set to utf8_bin.)
The following PHP code is used both for adding new entries and for creating JSON:
<?php
$conn = new PDO('mysql:host=localhost;dbname=test','root','xxx');
$conn->exec("SET NAMES 'utf8'"); // Enable UTF-8 charset for db-communication ..
if(isset($_GET['add_entry'])) {
header('Content-Type: text/plain; charset=UTF-8');
// Add new DB-Entry:
$data = $conn->quote($_GET['add_entry']);
if($conn->exec('INSERT INTO `entries` (`data`) VALUES ('.$data.')')) {
$id = $conn->lastInsertId();
echo 'Created entry '.$id.': '.$_GET['add_entry'];
} else {
$info = $conn->errorInfo();
echo 'Unable to create entry: '. $info[2];
}
} else {
header('Content-Type: application/json; charset=UTF-8');
// Output DB-Entries as JSON:
$entries = array();
if($res = $conn->query('SELECT * FROM `entries`')) {
$res->setFetchMode(PDO::FETCH_ASSOC);
foreach($res as $row) {
$entries[] = $row;
}
}
echo json_encode($entries);
}
?>
Note the use of the method $conn->quote(..) before passing data to the database. As mentioned in the preceding post, it would be even better to use prepared statements, since they take care of the escaping for you. Thus, it would be better to write:
$prepStmt = $conn->prepare('INSERT INTO `entries` (`data`) VALUES (:data)');
if($prepStmt->execute(array('data'=>$_GET['add_entry']))) {...}
instead of
$data = $conn->quote($_GET['add_entry']);
if($conn->exec('INSERT INTO `entries` (`data`) VALUES ('.$data.')')) {...}
Conclusion: Using UTF-8 for all character data stored or transmitted to the user is reasonable. It makes the development of internationalized web applications much easier. To make sure user input is sent to the database properly, using an escape function is a good idea. Better still, prepared statements make development easier and also improve your application's security, since they prevent SQL injection.