Store and output text with accents - php

I have some text in a database, in French and English. French has accents and some special characters like ç. I use MAMP, MySQL and PHP.
The collation is latin1_swedish_ci (the default). I tried utf8_general_ci and the result is the same.
In the HTML page, I have this in the head: <meta charset="UTF-8">
As an example, in the database I have "voilà".
When I echo the text from the database to html:
$con = mysqli_connect("localhost","root","root");
if (!$con) {
die('The connection failed: ' . mysqli_connect_error());
}
if (!mysqli_select_db($con, 'prova')){
echo "Connection with database was not possible";
}
$result = mysqli_query($con, "SELECT * FROM test1
WHERE id='1' ")
or die(mysqli_error($con));
while($row = mysqli_fetch_array($result)) {
$text = $row['first'];
echo $text; //I see: voil�
echo htmlentities($text); //I see nothing
echo utf8_encode($text); //This works: I see voilà
}
Why does htmlentities not work?
Is utf8_encode() the way to go? Do I always have to use it when I output something from the database? Why do I need it if the collation is already UTF-8? Is there a better way to store and output text with accents in a MySQL database?

After you connect to the DB you should set the client charset to UTF8:
mysqli_set_charset($con, "UTF8");
Otherwise the MySQL client library converts the stored UTF-8 'voilà' to latin1, which appears to be its default connection character set.
Either you tell the client that you want everything in UTF-8, or you receive latin1 by default and have to convert each value yourself by calling utf8_encode($text).
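To see why the three echo calls in the question behave differently, here is a minimal sketch that needs no database. It assumes the connection delivered 'voilà' as latin1 bytes (the single byte 0xE0 for à), and it uses mb_convert_encoding from the mbstring extension as a stand-in for utf8_encode, which is deprecated as of PHP 8.2.

```php
<?php
// Assumption: this is what a latin1 connection hands to PHP for "voilà".
$fromDb = "voil\xE0";

// 1. Echoed as-is into a UTF-8 page, the lone byte 0xE0 is not valid UTF-8,
//    so the browser shows the replacement character.
$isValidUtf8 = mb_check_encoding($fromDb, 'UTF-8');

// 2. htmlentities() told to expect UTF-8 returns an empty string for invalid
//    input unless ENT_SUBSTITUTE or ENT_IGNORE is set.
$entities = htmlentities($fromDb, ENT_QUOTES, 'UTF-8');

// 3. Converting latin1 to UTF-8 yields the two-byte sequence C3 A0 for "à".
$utf8 = mb_convert_encoding($fromDb, 'UTF-8', 'ISO-8859-1');

var_dump($isValidUtf8);    // bool(false)
var_dump($entities);       // string(0) ""
echo bin2hex($utf8), "\n"; // 766f696cc3a0
```

With mysqli_set_charset($con, "utf8") in place, the value arrives as valid UTF-8 already and none of these conversions is needed.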

Wrong character encoding

I have two forms on two different pages that insert data into a MySQL database. The form data contains special characters such as 'čšžćđ', which I pass via the forms to the insertion scripts.
The data from the first form is inserted correctly, while some fields from the second form contain '?' characters, which suggests an encoding mismatch.
Both insertion scripts use the same file to connect to the database and set the encoding, as shown below:
<?php
$username = "root";
$password = "";
$servername = "localhost";
$conn = mysqli_connect($servername, $username, $password);
if (!$conn) { // check if connected
die("Connection failed: " . mysqli_connect_error());
}else{
/* change character set to utf8 */
if (!mysqli_set_charset($conn, "utf8")) {
// printf("Error loading character set utf8: %s\n", mysqli_error($conn));
} else {
// printf("Current character set: %s\n", mysqli_character_set_name($conn));
}
mysqli_select_db($conn, "testdb");
//echo "Connected successfully.";
// Check if the correct db is selected
if ($result = mysqli_query($conn, "SELECT DATABASE()")) {
$row = mysqli_fetch_row($result);
//printf("\nDefault database is %s.\n", $row[0]);
mysqli_free_result($result);
}
}
?>
I guess this means that the client character encoding isn't set correctly? All database tables use the utf8 character set.
Try setting the encoding at the top of the page:
<?php
header('Content-Type: text/html; charset=utf-8');
// other code...
Are you talking about HTML forms? If so, set accept-charset on the form:
<form accept-charset="UTF-8">
Is it one ? per accented character? When you are trying to use utf8/utf8mb4 and you see question marks (regular ones, not black diamonds), the usual causes are:
The bytes to be stored are not encoded as utf8. Fix this.
The column in the database is not CHARACTER SET utf8 (or utf8mb4). Fix this.
The connection during reading is not utf8. Check this as well.
The data was probably converted to ? when it was stored, so it cannot be recovered from the text.
Run SELECT col, HEX(col) FROM ... to see what got stored:
? is 3F in hex.
Accented European letters take two bytes (four hex digits) per character. That includes each of čšžćđ.
Chinese, Japanese, and Korean characters (mostly) take three bytes (six hex digits) per character.
Four bytes (eight hex digits) for a single accented letter would indicate "double encoding".
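The byte counts above can also be checked from PHP with bin2hex, which shows the same thing HEX(col) shows on the MySQL side. The following is a minimal sketch without a database. The sample characters are written as byte escapes so that the encoding of the script file does not matter, and the double-encoding step (UTF-8 bytes misread as ISO-8859-1 and converted again) is an assumption about how such data typically arises.

```php
<?php
$accented = "\xC5\xA1";     // "š" in UTF-8: two bytes
$cjk      = "\xE4\xB8\xAD"; // "中" in UTF-8: three bytes

// Simulate double encoding: UTF-8 bytes misread as ISO-8859-1 and converted again.
$double = mb_convert_encoding($accented, 'UTF-8', 'ISO-8859-1');

// A character that could not be represented in the target charset is stored as "?".
$lost = "?";

echo bin2hex($accented), "\n"; // c5a1
echo bin2hex($cjk), "\n";      // e4b8ad
echo bin2hex($double), "\n";   // c385c2a1
echo bin2hex($lost), "\n";     // 3f
```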

Php mysql returning?

I have a problem: when I try to echo Cyrillic characters, they come back as ????
Here's the code:
<?php
include('db.php');
$sql = "SELECT * FROM menu_items WHERE reference=1";
$result = $conn->query($sql);
if ($result->num_rows > 0) {
$rows = array();
while($row = $result->fetch_object()) {
$rows[] = json_encode($row);
}
$items = implode(',',$rows);
echo '['.$items.']';
}else {
echo "ERROR";
}
?>
Any idea?
Collation: utf8_general_ci
And db.php:
<?php
$servername = "localhost";
$username = "test";
$password = "Conqwe333!";
$conn=mysqli_connect($servername,$username,$password,"test");
// Check connection
if (mysqli_connect_errno())
{
echo "Failed to connect to MySQL: " . mysqli_connect_error();
}
?>
It worked after adding $conn->set_charset("utf8");
Add this before your $sql:
$conn->query('SET NAMES utf8');
You will also need to send the proper charset to the browser. There are several ways to do that, for example a meta tag in the HTML head or header('Content-Type: text/html; charset=utf-8');
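You should set the character set per connection with:
mysqli_set_charset
You can also run the SQL statement
SET NAMES utf8;
but that is not recommended, because the client library is not told about the change and mysqli_real_escape_string() may then escape for the wrong character set.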
<?php
$mysqli = new mysqli("localhost", "my_user", "my_password", "test");
/* check connection */
if (mysqli_connect_errno()) {
printf("Connect failed: %s\n", mysqli_connect_error());
exit();
}
/* change character set to utf8 */
if (!$mysqli->set_charset("utf8")) {
printf("Error loading character set utf8: %s\n", $mysqli->error);
} else {
printf("Current character set: %s\n", $mysqli->character_set_name());
}
$mysqli->close();
I am assuming you are using Bulgarian and UTF-8. The same works for Russian and other languages; just change "bg" to the appropriate language code.
I do not recommend cp1251, because it breaks unexpectedly with Apache mod_rewrite and similar tools.
You need to check the following:
Check that your database / table collation is a UTF-8 one. It can be utf8_general_ci or a Bulgarian one; the difference is minimal and mostly affects sorting (utf8_general_ci is perfectly OK).
Check that the statement set names utf8; is executed right after connecting. You can do $mysqli->query("set names utf8");
Make sure you have the proper tags. Here is an example:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang='bg' xml:lang='bg' xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Нов сайт :)</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
You can include a UTF-8 BOM in the HTML, but it works well without one. I usually work without a BOM, and when I want to be 100% compliant, I create an include file bom.php that contains just the BOM and include it before the HTML template in the normal PHP way, e.g. include "bom.php".
Hope this helps; if not, please comment.
EDIT:
Someone suggested that you should make sure your data is stored properly in MySQL. The easiest way is to open phpMyAdmin. If the Cyrillic text is shown correctly there, all is OK.
I think the issue is one step earlier. First try to encode the Cyrillic characters correctly: How to encode cyrillic in mysql?

UTF-8 encoding on mysql queries - Danish characters

I have been trying for a while to set the right encoding so that Danish characters show up in the results of my MySQL query. I haven't found an exactly similar situation.
My output shows question marks instead of the appropriate characters. This is my connect file:
<?php
$con=mysql_connect("localhost","root","");
if (!$con)
{
die('Could not connect: ' . mysql_error());
}
?>
And here is my display file:
<?php
include("connect.php");
mysql_select_db("paradise",$con);
$result=mysql_query("SELECT * FROM CITATER4 ORDER BY RAND() LIMIT 1",$con);
while($data = mysql_fetch_row($result))
{
echo "<aside class=\"citatout\">";
echo "<div id=\"paradiso\" class=\"text-vertical-center-q\">";
echo utf8_encode("<h1 class=\"animated fadeIn\" align=center>$data[0]</h1>");
echo utf8_encode("<h2 class=\"animated fadeIn\" align=center>$data[1]</h2>");
echo "</div>";
echo "</aside>";
}
?>
I tried to set the encoding with the following code, which I found in another question here on Stack Overflow, but nothing changed.
mysql_set_charset("utf8", $con);
I also encoded the strings in the display file with utf8_encode, and it still doesn't work.
Do you have a solution?
You should not need to use utf8_encode.
Are your database tables utf8_danish_ci?
Try running this MySQL query, e.g. in phpMyAdmin:
ALTER TABLE CITATER4 CONVERT TO CHARACTER SET utf8 COLLATE utf8_danish_ci;
Does your HTML have <meta charset="utf-8"> in the head tag?
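One reason utf8_encode should not be needed is that applying it to text that is already UTF-8 encodes it a second time. A minimal sketch, assuming a hypothetical helper name to_utf8 and ISO-8859-1 as the only legacy encoding in play, converts a string only when it is not valid UTF-8 already:

```php
<?php
// Hypothetical helper: convert to UTF-8 only if the input is not valid UTF-8 already.
function to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already UTF-8, leave untouched
    }
    // Assumption: anything else is ISO-8859-1 (latin1).
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

$utf8Input   = "bl\xC3\xA5b\xC3\xA6r"; // "blåbær" as UTF-8
$latin1Input = "bl\xE5b\xE6r";         // "blåbær" as latin1

var_dump(to_utf8($utf8Input) === $utf8Input);   // bool(true)
var_dump(to_utf8($latin1Input) === $utf8Input); // bool(true)

// Blind conversion of input that is already UTF-8 double-encodes it.
$blind = mb_convert_encoding($utf8Input, 'UTF-8', 'ISO-8859-1');
var_dump($blind === $utf8Input);                // bool(false)
```

The validity check is only a heuristic, since a few latin1 byte sequences happen to be valid UTF-8 as well. Setting the connection charset so that the data arrives as UTF-8 in the first place remains the cleaner fix.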

PHP Charset for special characters

I am building my website with HTML and PHP, with a post function. In the posts, I use special characters (Å, Á ...), but they appear as � on the screen. The static HTML content, however, displays correctly.
Any idea?
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
<?php
getPosts();
?>
And the Functions file:
<?php
include('connect.php');
function getPosts() {
$query = mysql_query("SELECT * FROM posts") or die(mysql_error());
while($post = mysql_fetch_assoc($query)) {
echo "<h2>" . $post['Title'] . " by " . $post['Author'] . "</h2>";
echo $post['Content'];
}
}
?>
Make sure your MySQL character set and collation (at least for this database/table/column) are utf8.
Also make sure that you set the connection charset correctly:
mysql_set_charset ( "utf8" );
This requires PHP 5.2.3 and MySQL 5.0.7. Also consider switching to MySQLi or PDO, which usually handle this better. The obsolete mysql_* API was deprecated in PHP 5.5 and removed in PHP 7.0.
Chances are your MySQL database table does not have the correct collation.
ALTER TABLE `posts` CHANGE `content` `content` TEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL ;
Make a change like this for your entire database, table, or column.

Problems in inserting utf-8 string into database and then outputting it to web page

I am learning PHP, so I have set up a test database and am trying various things with it. The situation is as follows:
Database collation is utf8_general_ci.
There is a table "books" created by this query:
create table books
( isbn char(13) not null primary key,
author char(50),
title char(100),
price float(4,2)
);
Then it is filled with some sample data. Note that the text entries are in Russian. This query is saved as a UTF-8 (without BOM) .sql file and executed.
insert into books values
("5-8459-0046-8", "Майкл Морган", "Java 2. Руководство разработчика", 34.99),
("5-8459-1082-X", "Кристофер Негус", "Linux. Библия пользователя", 24.99),
("5-8459-1134-6", "Марина Смолина", "CorelDRAW X3. Самоучитель", 24.99),
("5-8459-0426-9", "Родерик Смит", "Сетевые средства Linux", 49.99);
When I review the contents of the created table in phpMyAdmin, I get correct results.
When I retrieve data from this table and try to display it via PHP, I get question marks instead of Russian characters. Here is a piece of my PHP code:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Books</title>
</head>
<body>
<?php
header("Content-type: text/html; charset=utf-8");
mysqli_set_charset('utf8');
@ $db = new mysqli('localhost', 'login', 'password', 'database');
$query = "select * from books where ".$searchtype." like '%".$searchterm."%'";
$result = $db->query($query);
$num_results = $result->num_rows;
for ($i = 0; $i < $num_results; $i++) {
$row = $result->fetch_assoc();
echo "<p><strong>".($i+1).". Title: ";
echo htmlspecialchars (stripslashes($row['title']));
echo "</strong><br />Author: ";
echo stripslashes($row['author']);
echo "<br />ISBN: ";
echo stripslashes($row['isbn']);
echo "<br />Price: ";
echo stripslashes($row['price']);
echo "</p>";
}
...
And here is the output:
1. Title: Java 2. ??????????? ????????????
Author: ????? ??????
ISBN: 5-8459-0046-8
Price: 34.99
Can someone point out what I am doing wrong?
"Can someone point out what I am doing wrong?"
Yes, I can.
You didn't tell the MySQL server which encoding you want the data in.
MySQL can deliver data in any encoding, recoding it on the fly when your page encoding differs from the encoding of the stored data.
To do that, it needs to be told the client's preferred encoding (your PHP code is the database client here).
By default this is latin1. Because the latin1 character table has no Cyrillic characters, question marks are returned instead.
There are two ways to tell MySQL which encoding you want:
The slightly preferred one is the mysqli_set_charset() function (a method in your case).
The less preferred one is a SET NAMES query.
But as long as you are using the mysqli extension properly, it doesn't really matter (though you aren't).
Note that in MySQL this encoding is called utf8, without dashes or spaces.
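As an illustration of the preferred route, here is a minimal sketch that sets the connection charset immediately after connecting. The helper name open_utf8_connection and all credentials are hypothetical placeholders, and nothing here contacts a server until the function is actually called.

```php
<?php
// Hypothetical helper; host, user, password and database name are placeholders.
function open_utf8_connection(string $host, string $user, string $pass, string $dbName): mysqli
{
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $db = new mysqli($host, $user, $pass, $dbName);

    // Tell the server which encoding this client sends and expects.
    // MySQL's name for it is "utf8", without a dash.
    $db->set_charset('utf8');

    return $db;
}

// Usage (not executed here, since it needs a running server):
// $db = open_utf8_connection('localhost', 'login', 'password', 'database');
// echo $db->character_set_name();
```

Setting the charset inside the connect helper means every script that includes it gets the same client encoding, so no page can forget the call.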
Try setting the output charset:
SET NAMES 'utf8'
SET CHARACTER SET utf8
Create a .htaccess file:
AddDefaultCharset utf-8
AddCharset utf-8 *
CharsetSourceEnc utf-8
CharsetDefault utf-8
Save files in UTF-8 without BOM.
Set charset in html head.
After your mysql_connect, set your connection to UTF-8:
mysql_query("SET NAMES utf8");
Follow Alexander's advice for .htaccess, the header, and file encoding.
You probably need to call mysqli_set_charset($db, 'utf8'); after you set up your connection with new mysqli(...), as it works on a link rather than as a global setting.
So:
@ $db = new mysqli('localhost', 'login', 'password', 'database');
mysqli_set_charset($db, 'utf8');
$query = "select * from books where ".$searchtype." like '%".$searchterm."%'";
By the way, that query seems to be open to SQL injection unless $searchterm is sanitized. Keep that in mind and consider using prepared statements.
Using @ to suppress errors is also generally not recommended, especially during development. It is better to deal with error conditions.
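A prepared-statement version of that search could look roughly like the sketch below. The helper names search_column, like_pattern and find_books are hypothetical, the column whitelist is taken from the books table in the question, and get_result() assumes the mysqlnd driver is available. Column names cannot be bound as parameters, which is why $searchtype is checked against a whitelist instead.

```php
<?php
// Hypothetical helper: only allow known column names, since identifiers
// cannot be bound as statement parameters.
function search_column(string $searchtype): string
{
    $allowed = ['author', 'title', 'isbn'];
    if (!in_array($searchtype, $allowed, true)) {
        throw new InvalidArgumentException('Unknown search type');
    }
    return $searchtype;
}

// Hypothetical helper: escape LIKE wildcards so the term is matched literally.
function like_pattern(string $searchterm): string
{
    $escaped = strtr($searchterm, ['\\' => '\\\\', '%' => '\\%', '_' => '\\_']);
    return '%' . $escaped . '%';
}

// Sketch of the query itself; $db is assumed to be a connected mysqli object
// whose charset has already been set.
function find_books(mysqli $db, string $searchtype, string $searchterm): array
{
    $sql     = 'select * from books where ' . search_column($searchtype) . ' like ?';
    $stmt    = $db->prepare($sql);
    $pattern = like_pattern($searchterm);
    $stmt->bind_param('s', $pattern);
    $stmt->execute();
    return $stmt->get_result()->fetch_all(MYSQLI_ASSOC);
}

echo like_pattern('100%'), "\n"; // %100\%%
```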
After connecting and before running your query, add:
@mysql_query("SET character_set_server='utf8'; ");
@mysql_query("SET character_set_client='utf8'; ");
@mysql_query("SET character_set_results='utf8'; ");
@mysql_query("SET character_set_connection='utf8'; ");
@mysql_query("SET character_set_database='utf8'; ");
@mysql_query("SET collation_connection='utf8_general_ci'; ");
@mysql_query("SET collation_database='utf8_general_ci'; ");
@mysql_query("SET collation_server='utf8_general_ci'; ");
Also try putting this meta tag in the head of the HTML document:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
This is different from the HTTP header header("Content-type: text/html; charset=utf-8");
