Simplifying utf8_encode - php

So I'm trying to find a fast way to show all my results from my database, but I can't seem to figure out why I need to add the utf8_encode() function to all of my text in order to show all my characters properly.
For the record, my database information is both French and English, so I will need special characters including à, ç, è, é, ê, î, ö, ô, ù (and more).
My form's page has the following tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
My database, all my tables and all my fields are set to utf8_general_ci.
When I want to echo the database information onto the page, I use this query:
public function read_information()
{
global $db;
$query = "SELECT * FROM table WHERE id='1' LIMIT 1";
return $db->select($query);
}
and return the information like so:
<?php $info = $query->read_information(); ?>
<?php foreach ( $info as $dbinfo ) { ?>
<pre><?php echo $dbinfo->column; ?></pre>
<?php } ?>
However, if there are French characters in the string, I have to write <pre><?php echo utf8_encode($dbinfo->column); ?></pre>, and this is something I really want to avoid.
I have read the documentation on PHP.net for utf8_encode/utf8_decode, htmlentities/html_entity_decode and quite a few more functions. However, I can't figure out why I need to apply a special function to every database result.
I have also tried using mysqli_query($mysqli, "SET NAMES 'utf8'"); but this doesn't solve my problem. I guess what I'm looking for is some kind of shortcut, so that I don't have to write a make_this_french_friendly() type of function.

Make sure every layer of the stack you are working with is set to UTF-8: the database, the web server, the page's meta tag, and so on.
Check settings such as
ini_set('default_charset', 'utf-8');
In my experience, plain output should then display correctly.

As @deceze pointed out, this thread gave the proper fix: calling $mysqli->set_charset('utf8'); on the connection.
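A minimal sketch of why utf8_encode() seemed necessary, assuming the connection had been falling back to the latin1 character set (a common default on older MySQL setups), so MySQL converted the utf8 column value to ISO-8859-1 before PHP saw it. No database is involved; the bytes such a connection would deliver are hard-coded, and mb_convert_encoding() stands in for utf8_encode(), which is deprecated as of PHP 8.2.

```php
<?php
// Assumed input: "été" as a latin1 connection would deliver it,
// one ISO-8859-1 byte per accented letter (0xE9 = "é").
$fromLatin1Connection = "\xE9t\xE9";

// These bytes are not valid UTF-8, so a page declared as UTF-8
// renders each "é" as the replacement character.
var_dump(mb_check_encoding($fromLatin1Connection, 'UTF-8')); // bool(false)

// utf8_encode() performs this ISO-8859-1 -> UTF-8 conversion,
// which is why wrapping every value in it made the output look right.
$reEncoded = mb_convert_encoding($fromLatin1Connection, 'UTF-8', 'ISO-8859-1');
echo bin2hex($reEncoded), PHP_EOL; // c3a974c3a9
var_dump($reEncoded === 'été');    // bool(true)
```

With set_charset('utf8') called right after connecting, MySQL sends UTF-8 in the first place, so the per-value conversion is no longer needed.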

Maybe save your file as UTF-8 without BOM?
header('Content-type: text/html; charset=utf-8');
... in PHP (you can also do it with the ini_set() function), and:
<meta charset="utf-8">
... in HTML.
You also have to set the right encoding for your database tables.
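The PHP-side settings mentioned above can live in one small file included before any output. This is a sketch; the file and its name are hypothetical, and the meta tag and table encoding still have to be set separately.

```php
<?php
// Hypothetical bootstrap.php: include it before anything is echoed.

// Charset used in PHP's default Content-Type header, and by
// htmlspecialchars()/htmlentities() when no encoding is passed.
ini_set('default_charset', 'UTF-8');

// Default encoding for the mb_* string functions.
mb_internal_encoding('UTF-8');

// Explicit header; it only works if nothing (not even a BOM) was output yet.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
```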
Possible duplicate of "GET" method encoding French characters incorrectly in PHP

Maybe your file's text encoding is not UTF-8.
Please look: What's different between UTF-8 and UTF-8 without BOM?
Maybe it can help you.
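Why the BOM matters: a PHP file saved as "UTF-8 with BOM" starts with the bytes EF BB BF, which PHP sends to the browser as output before any of your code runs, so a later header() call can fail with "headers already sent" and stray characters can appear at the top of the page. A minimal sketch for detecting and stripping it; the helper names are hypothetical.

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// True if the given file contents begin with a UTF-8 BOM.
function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Returns the contents without a leading BOM (unchanged if there is none).
function stripUtf8Bom(string $bytes): string
{
    return hasUtf8Bom($bytes) ? substr($bytes, 3) : $bytes;
}

// Example: check this script itself.
$self = file_get_contents(__FILE__);
echo hasUtf8Bom($self) ? "BOM found\n" : "no BOM\n";
```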

Related

Words with accents appear with strange characters in MySQL

Information with accents that I send to MySQL appears with strange characters. For example, "správce" means admin in my language, and when I send it to MySQL it appears as "sprÃ¡vce".
I tried to find information to solve this problem and found two solutions, but neither of them works.
1st solution, with meta tags, doesn't work:
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
2nd solution, with the htmlspecialchars function, also doesn't work:
if($f['level_admin'] == '1') { $f['level_admin'] = htmlspecialchars('Správce', ENT_QUOTES, "UTF-8"); }
if($f['level_admin'] == '2') { $f['level_admin'] = htmlspecialchars('Super Správce', ENT_QUOTES, "UTF-8");}
Do you know some way that work effectively?
It's also important to know which collation is set on the MySQL table; depending on your needs, you could use for example "utf8_unicode_ci".
There are also PHP functions that convert strings between ISO-8859-1 and UTF-8:
utf8_decode()
utf8_encode()
Normally this helps, but you had better check the collation in the DB.
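What those two functions actually do: utf8_encode() converts ISO-8859-1 to UTF-8 and utf8_decode() converts the other way (both are deprecated as of PHP 8.2; mb_convert_encoding() does the same job). The sketch below uses mb_convert_encoding() as a stand-in for utf8_encode() to show that applying it to text that is already UTF-8 produces the classic "Ã¡" pattern, and that htmlspecialchars() never touches accented letters at all.

```php
<?php
// "Správce" as UTF-8: "á" is the two bytes C3 A1.
$utf8 = 'Správce';

// Running utf8_encode() (ISO-8859-1 -> UTF-8) on text that is already UTF-8
// treats C3 and A1 as two Latin-1 characters, "Ã" and "¡", and encodes each one.
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $doubleEncoded, PHP_EOL; // SprÃ¡vce

// htmlspecialchars() only escapes & " ' < >, so "á" passes through unchanged.
echo htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8'), PHP_EOL; // Správce
```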

Getting special characters out of a MySQL database with PHP [duplicate]

This question already has answers here: UTF-8 all the way through (13 answers). Closed 9 years ago.
I have a table that includes special characters such as ™.
This character can be entered and viewed using phpMyAdmin and other software, but when I use a SELECT statement in PHP to output to a browser, I get the diamond with question mark in it.
The table type is MyISAM. The encoding is UTF-8 Unicode. The collation is utf8_unicode_ci.
The first line of the html head is
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I tried using the htmlentities() function on the string before outputting it. No luck.
I also tried adding this to php before any output (no difference):
header('Content-type: text/html; charset=utf-8');
Lastly I tried adding this right below the initial mysql connection (this resulted in additional odd characters being displayed):
$db_charset = mysql_set_charset('utf8',$db);
What have I missed?
The code below works for me.
$sql = "SELECT * FROM chartest";
mysql_set_charset("UTF8");
$rs = mysql_query($sql);
header('Content-type: text/html; charset=utf-8');
while ($row = mysql_fetch_array($rs)) {
echo $row['name'];
}
There are a couple of things that might help. First, even though you're setting the charset to UTF-8 in the header, that might not be enough; I've seen browsers ignore it before. Try forcing it by adding this to the head of your HTML:
<meta charset='utf-8'>
Next, as mentioned here, try doing this:
mysql_query ("set character_set_client='utf8'");
mysql_query ("set character_set_results='utf8'");
mysql_query ("set collation_connection='utf8_general_ci'");
EDIT
So I've just done some reading up and playing around a bit. First let me tell you, despite what I mentioned in the comments, utf8_encode() and utf8_decode() will not help you here. It helps to actually understand UTF-8 encoding; I found the Wikipedia page on UTF-8 very helpful. Assuming the value you are getting back from the database is in fact already UTF-8 encoded and you simply dump it out right after getting it, it should be fine.
If you do anything with the database result (especially manipulating the string in any way) and you don't use the Unicode-aware functions from PHP's mbstring extension, you will probably mangle it, since the standard PHP string functions are not Unicode-aware.
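A small illustration of the difference, using a hypothetical UTF-8 string: byte-oriented functions such as strlen() and substr() count and cut bytes, while their mbstring counterparts work on characters.

```php
<?php
$word = 'été'; // 3 characters, 5 bytes in UTF-8

// Byte-oriented functions count and cut bytes.
echo strlen($word), PHP_EOL;                      // 5
$firstByte = substr($word, 0, 1);                 // "\xC3": half of "é"
var_dump(mb_check_encoding($firstByte, 'UTF-8')); // bool(false)

// mbstring functions count and cut characters.
echo mb_strlen($word, 'UTF-8'), PHP_EOL;          // 3
echo mb_substr($word, 0, 1, 'UTF-8'), PHP_EOL;    // é
```

Cutting a string with substr() in the middle of a multi-byte character leaves an invalid byte that the browser then shows as the diamond with a question mark.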
Once you understand how UTF-8 encoding works you can do something cool like this:
$test = "™";
for($i = 0; $i < strlen($test); $i++) {
echo sprintf("%b ", ord($test[$i]));
}
Which dumps out something like this:
11100010 10000100 10100010
That's a properly encoded UTF-8 '™' character. If you don't have a character like that in your data retrieved from the database then something is messed up.
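To see how those three bytes map back to the character, here is a sketch that decodes the 3-byte sequence by hand and compares the result with mb_ord() (available since PHP 7.2).

```php
<?php
$bytes = "\xE2\x84\xA2"; // the three bytes printed above: "™" in UTF-8

// A 3-byte UTF-8 sequence has the layout 1110xxxx 10xxxxxx 10xxxxxx;
// the x bits, concatenated, form the code point.
$codePoint = ((ord($bytes[0]) & 0x0F) << 12)
           | ((ord($bytes[1]) & 0x3F) << 6)
           |  (ord($bytes[2]) & 0x3F);

printf("U+%04X\n", $codePoint);                 // U+2122
var_dump($codePoint === mb_ord('™', 'UTF-8'));  // bool(true)
var_dump($bytes === '™');                       // bool(true)
```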
To check, try searching for a special character that you know is in the result using mb_strpos():
var_dump(mb_strpos($db_result, '™'));
If that returns anything other than false, the data from the database is fine; otherwise, we can at least establish that the problem lies between PHP and the database.
You need to execute the following query first:
mysql_query("SET NAMES utf8");
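mb_check_encoding() gives a quick yes/no on whether a value is valid UTF-8 at all. A sketch with hypothetical sample values, assuming the ™ might instead arrive as the single Windows-1252 byte 0x99; in practice you would pass the string fetched from the database.

```php
<?php
// Hypothetical samples: "™" as UTF-8 and as a single Windows-1252 byte.
$samples = [
    'utf8'   => "\xE2\x84\xA2",
    'cp1252' => "\x99",
];

foreach ($samples as $label => $value) {
    printf(
        "%-6s bytes=%s valid_utf8=%s\n",
        $label,
        bin2hex($value),
        mb_check_encoding($value, 'UTF-8') ? 'yes' : 'no'
    );
}

// If the data turns out to be Windows-1252, it can be converted before output.
echo mb_convert_encoding("\x99", 'UTF-8', 'Windows-1252'), PHP_EOL; // ™
```

A lone byte like 0x99 is exactly what a browser reading UTF-8 displays as the diamond with a question mark.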

Getting rid of HTML entities in a web title generated in PHP

I have a website running GetSimple, a content management system written in PHP. I edited it as I needed; in the header, this is what is supposed to be there:
<title><?php get_page_clean_title(); ?> - <?php get_site_name(); ?></title>
The problem is that I am Czech and have to use special characters (á, é, í, ó, ú, ů, ě, š, etc.); if you opened my website and viewed the source code, you would see this:
<title>Tomáš Janeček - osobní web - Tom&aacute;&scaron; Janeček | Personal Website</title>
Instead of "Tomáš Janeček - osobní web - Tomáš Janeček | Personal Website".
What bothers me are those HTML entities, which appear only in the second part of the title. &aacute; stands for "á" and &scaron; stands for "š".
I know it supposedly doesn't hurt SEO, but I want to keep the code clean.
Is there a way to decode it, or to replace get_site_name() with a better function that has no problems with these characters? I don't want the entities in my code.
I suspect it's not this particular .php file that needs editing, but I hope it can be solved simply in this file.
The CMS includes dozens of .php files and I'm not sure what I should search for. I've looked for code dealing with HTML entities in "suspicious" files, but found nothing that helped.
If you need it, the whole CMS can be downloaded here
Thanks for your help in advance.
Edit 1:
Of course I have this meta included.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
And no, I don't use any database. That will come with studying Joomla! :)
I want to emphasize that the title has two parts, get_page_clean_title() and get_site_name(). Both include my whole name, but only one shows it in the source code with HTML entities.
I have found the functions in another file:
The FIRST one doesn't put HTML entities into the source code; this is what I want from the second function below.
function get_page_clean_title($echo=true) {
global $title;
$myVar = strip_tags(strip_decode($title));
if ($echo) {
echo $myVar;
} else {
return $myVar;
}
}
The SECOND function does what it is supposed to do, but it gives the output with HTML entities and that is the problem.
function get_site_name($echo=true) {
global $SITENAME;
$myVar = trim(stripslashes($SITENAME));
if ($echo) {
echo $myVar;
} else {
return $myVar;
}
}
Both of the functions above are in the same file.
I tried replacing the problematic function with the working one, changing the variable names to the right values, but then it stopped working entirely :/
So, to conclude, the whole page is OK, there are no HTML entities except one place - the second half of the title with get_site_name function.
Furthermore, the problem is ONLY in the SOURCE CODE. The rendered page looks fine.
Thanks for your replies so far, I'm glad for such fast and valuable replies. I really appreciate that.
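One way to get the behaviour asked for above is to decode the entities inside a variant of get_site_name(). A sketch, assuming $SITENAME holds the name with entities already in it (I have not checked how GetSimple stores it); the function name get_site_name_decoded() and the sample value are hypothetical.

```php
<?php
// Hypothetical stand-in for GetSimple's $SITENAME, stored with entities.
$SITENAME = 'Tom&aacute;&scaron; Janeček | Personal Website';

// Hypothetical variant of get_site_name(): decode every entity to a UTF-8
// character, then re-escape only & " ' < > so the markup stays valid.
function get_site_name_decoded($echo = true)
{
    global $SITENAME;
    $decoded = html_entity_decode(trim(stripslashes($SITENAME)), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $myVar = htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');
    if ($echo) {
        echo $myVar;
    } else {
        return $myVar;
    }
}

get_site_name_decoded(); // Tomáš Janeček | Personal Website
```

The htmlspecialchars() step leaves á and š as plain characters; it only matters if the site name contains &, quotes, < or >.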
I think you have a charset problem. If you want the special characters to display correctly, add
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
to your HTML/PHP file. Also check that your data is UTF-8 encoded.
If you are getting your data from a MySQL database, check that the columns use the utf8 charset. Also set the connection charset with this query, to make sure you get the data in the right encoding:
set names utf8;
Tom, make sure that your *.php files, database, or wherever else the data comes from are in UTF-8, and that the meta charset on your index page is utf-8 too.
http://www.jakpsatweb.cz/cestina.html - Please visit this site for information about diacritics in HTML; it has a table of the characters in each encoding.
How to save Russian characters in a UTF-8 encoded file

Hebrew fonts are not showing

I have an API in which I show some English and Hebrew videos. It all runs fine, except that the titles written in Hebrew are not shown; instead, it returns "??".
Any idea?
Make sure you are serving UTF-8 by adding <meta charset="UTF-8" /> inside your head tag.
Make sure the database content is UTF-8. If it's not UTF-8, you'll need to convert the database data from the encoding used in the database to UTF-8 before you push it to the web browsing client like this: $str = mb_convert_encoding($str, "UTF-8", "DATABASE-ENCODING-USED"); More info and examples: http://php.net/manual/en/function.mb-convert-encoding.php
One of the two should solve your problem.
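A sketch of the mb_convert_encoding() step with a concrete source encoding, assuming the column stores Hebrew as ISO-8859-8 (MySQL's "hebrew" charset); Windows-1255 is another possibility, so "ISO-8859-8" here is only an example for the DATABASE-ENCODING-USED placeholder. The sample bytes are hypothetical.

```php
<?php
// Hypothetical sample: "שלום" as four single bytes in ISO-8859-8 (Hebrew).
$fromDatabase = "\xF9\xEC\xE5\xED";

// Convert to UTF-8 before sending it to a page declared as UTF-8.
$str = mb_convert_encoding($fromDatabase, 'UTF-8', 'ISO-8859-8');

header('Content-Type: text/html; charset=UTF-8');
echo $str, PHP_EOL; // שלום
```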
Update
Might be far-fetched, but make sure your browser is able to display a Hebrew font too.
In fact, you don't tell us how you check your database (with a client or via browser) and we don't know how you test the display.
To avoid any "hidden" errors in your debugging efforts, try this code:
<?php
/*
Make sure the web browser receives a header telling it there's UTF-8 inside.
*/
header('Content-Type: text/html; charset=UTF-8');
/*
Write a title that will display correctly
if there's no font-related issue in your browser.
*/
echo('<h1>זהו מבחן.</h1>');
/*
Now connect to the database and use a simple SELECT to fetch one row.
Replace YOUR-MYSQL-USERNAME, YOUR-MYSQL-PASSWORD, YOUR-MYSQL-DATABASE
and YOUR-TABLE with your individual values.
The old mysql_* functions were removed in PHP 7, so this uses mysqli.
*/
$dbh = mysqli_connect('localhost', 'YOUR-MYSQL-USERNAME', 'YOUR-MYSQL-PASSWORD', 'YOUR-MYSQL-DATABASE');
if (!$dbh)
{
    exit('Could not connect: ' . mysqli_connect_error());
}
// Query a table (not the database name), then fetch the row itself;
// dumping $result alone would only show the result object.
$result = mysqli_query($dbh, 'SELECT * FROM `YOUR-TABLE` LIMIT 1');
$row = $result ? mysqli_fetch_assoc($result) : null;
mysqli_close($dbh);
/*
Now dump it nicely to the screen and exit.
*/
echo('<pre>');
var_dump($row);
echo('</pre>');
exit();
?>
That will help you nail down the problem.
If it does not show your Hebrew text nicely, it's most probably a font-and-browser issue.
In any other case, your database is not UTF-8 and you will need to convert using mb_convert_encoding or similar, as explained above.
Have you encoded your HTML document to accept them?
<html lang="he">
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
</head>
...
</html>
If the data is coming from the database, make sure you retrieve it with the right encoding. If possible, run a SET NAMES query before your other queries (either SET NAMES 'utf8' or SET NAMES 'hebrew', depending on the character set used in the DB).
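A sketch of where the "??" can come from, assuming the connection charset is latin1 while the column holds UTF-8 Hebrew: Latin-1 has no Hebrew letters, so MySQL substitutes "?" for each one when converting the result. mbstring does the same substitution, which makes the effect easy to reproduce without a database.

```php
<?php
$hebrew = 'שלום'; // UTF-8 text as stored in a utf8 column

// Use "?" (0x3F) for characters the target encoding cannot represent
// (this is also mbstring's default).
mb_substitute_character(0x3F);

// Assumed latin1 connection: every Hebrew letter becomes "?".
$asLatin1 = mb_convert_encoding($hebrew, 'ISO-8859-1', 'UTF-8');
echo $asLatin1, PHP_EOL; // ????
```

Once the question marks are produced, the original letters are gone, so no output-side function can bring them back; the connection charset has to be fixed.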

Changing the characters in MySQL with CONVERT fails: still getting NÃ£o

I am populating this MySQL table with data from a PHP page (via POST, using filter_input).
The database is utf8, but when a user inputs words with accents (^, ', ', ~) like Não, I get this: NÃ£o
What do I have to do to make it show the correct values? Or should I try to correct the data when I retrieve it?
UPDATE:
I have added utf8_decode and now it inserts correctly.
Does anyone know how to convert the strings that were already in the table? I tried using the CONVERT function, but I can't make it work :(
UPDATE:
I am trying this code:
select convert(field using latin1)
from table where id = 35;
And I am still getting this: NÃ£o
I tried other encodings, but I never get the word Não.
Does anyone have any thoughts on this one?
First, make sure your page is UTF-8:
<meta http-equiv="Content-type" content="text/html; charset=UTF-8"/>
Next, if you're on Apache, make sure UTF-8 is set in the config file:
AddDefaultCharset UTF-8
or you can do it in a .php file like this:
header('Content-type: text/html; charset=UTF-8');
If you still have problems, you can use the encode function:
$value = utf8_encode($value);
Hope all this helps.
It looks like somewhere along the way something cannot handle Unicode. As a result, ã is getting interpreted as two separate characters. Make sure everything that handles strings is OK with Unicode.
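For the strings already stored this way, one layer of encoding can be undone by converting UTF-8 back to ISO-8859-1, which turns "Ã£" into the two bytes of "ã". A minimal sketch with a hypothetical helper; it assumes the data was double-encoded through Latin-1. MySQL's "latin1" is really Windows-1252, so for some characters (such as € or curly quotes) 'Windows-1252' may be the right intermediate encoding instead. The same idea is often written in SQL as CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8).

```php
<?php
// Hypothetical helper: undo one layer of accidental double UTF-8 encoding.
function fixDoubleEncoded(string $s): string
{
    // UTF-8 -> ISO-8859-1 turns the characters "Ã" and "£" back into
    // the single bytes C3 and A3, which together are "ã" in UTF-8.
    $candidate = mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');

    // Only accept the result if no character was lost in that step
    // and the recovered bytes form valid UTF-8; otherwise keep the input.
    $lossless = mb_convert_encoding($candidate, 'UTF-8', 'ISO-8859-1') === $s;
    return ($lossless && mb_check_encoding($candidate, 'UTF-8')) ? $candidate : $s;
}

echo fixDoubleEncoded('NÃ£o'), PHP_EOL; // Não
echo fixDoubleEncoded('Não'), PHP_EOL;  // Não (already correct, left unchanged)
```

The validity check keeps correctly stored rows untouched, so the helper can be run over a whole column without corrupting values that were never double-encoded.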
