I'm editing a site for someone, and they are using WordPress, which I really don't like, but hey, I didn't pick it. I need to change some text on their page to use Portuguese characters such as Ç or Ã. I've read in a few places that I need to change from ASCII to UTF-8, but I'm not sure where to do that, or how to do it across the whole site. Am I changing the database to UTF-8, or each individual PHP file? Hopefully somebody knows, thanks.
Thanks to the comments below, I have most of the site running correctly, but there are still certain spots where I can't get the foreign characters to work, for example anywhere I'm using code like this inside a .php file.
$email_list = do_shortcode('[pl_modal title="Join our email list" label="<img class=\'\' title=\'Join our email list\' src=\'/wp-content/uploads/2013/02/email_icon.png\' /><br /><span>INSCREVA-SE A NOSSA<br />LISTA DE E-MAILS</span>"][gravityform id=1 title=false][/pl_modal]');
If I add non-English characters to the Portuguese text in the above code, I get an error where the page keeps loading forever. Here is more code that does the same thing.
'<div class="graphicbuttons_cont">' .
'<a href="https://maps.google.com/maps?saddr={19}&daddr={20}" target="_blank">
<img title="Get Store Directions" src="/wp-content/uploads/2013/02/getdirection_icon.png" /><br /><span>LOCALIZACOES <br><br /> </span>
</a>' .
'</div>' .
The LOCALIZACOES in the above text should have special characters, but it won't hold them. I have changed everything to UTF-8 that I can find, but there is nothing inside this specific file that says UTF-8. Should I add something?
Alright, so if you change everything to UTF-8, and on WordPress all of your HTML code is in PHP files, the way I ended up getting special characters to work is with the HTML entities listed here:
thesauruslex.com/typo/eng/enghtml.htm
for example
<span>LOCALIZA&Ccedil;OES </span>
will output LOCALIZAÇOES
Thanks to everyone for the help, I guess I could have been clearer on the original question.
Everything in your application needs to be UTF-8.
Your MySQL string columns should be utf8_unicode_ci.
You need to ensure that your MySQL connection charset is set to UTF-8. You can do this via the query SET NAMES utf8 (run once after every connection) or you can modify your my.cnf file if you have access to it.
Your web pages should be served with <meta charset="utf-8">
You can check and validate what kind of input you're receiving by using the PHP function mb_check_encoding.
There's also a PHP ini setting called default_charset.
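As a rough illustration of the mb_check_encoding suggestion above, here is a minimal sketch. The helper name to_utf8 is hypothetical, and it assumes that any input which is not valid UTF-8 is Windows-1252, which is common for legacy Western European text but is not guaranteed.

```php
<?php
// Hypothetical helper (not from the answer above): return $input as valid UTF-8.
// Assumption: anything that is not already valid UTF-8 is Windows-1252.
function to_utf8(string $input): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($input, 'UTF-8', 'Windows-1252');
}

// "LOCALIZAÇÕES" written as Windows-1252 bytes (0xC7 = Ç, 0xD5 = Õ).
$legacy = "LOCALIZA\xC7\xD5ES";
$clean  = to_utf8($legacy);

var_dump(mb_check_encoding($legacy, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($clean, 'UTF-8'));  // bool(true)
```

Guessing the source encoding like this is only a fallback. If you control where the data comes from, it is safer to fix the encoding at the source.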
This can be changed two ways depending on your theme file. In the header.php file this should be near the top:
<meta charset="<?php bloginfo('charset'); ?>">
You used to be able to change this in the WordPress backend under Settings -> Reading. I believe you now have to change this manually in the wp-config.php file:
define('DB_CHARSET', 'utf8');
Related
The Issue
I've been having some trouble with what I think is a UTF-8 encoding issue where posts are not being saved to my database.
The issue occurs when a user copies and pastes text from MS Word. There seems to be a particular combination of characters causing this issue (I've not found any other variations which cause the same issue yet):
% b
% B
This means that, when I var_dump() my input I get:
string(5) "70�ck"
Instead of:
string(5) "70% back"
Edit: The database error I get is:
Incorrect string value: '\xBAck an...' for column [...]
What I've tried
I'm using the Summernote JS plugin. I've tried a different plugin (WYSIHTML5) and I've tried with no plugin at all. I've tried pasting the clipboard text as plain text. I've even got an onPaste callback on the summernote which strips all the stupid encoding/styling from MS Word (which is summernote specific issue I think).
Unfortunately I've not been able to get anywhere with searching 'encoding issue "% b"' and variations thereof... but I would presume that the combination of characters above is somehow getting translated into a character that is unsupported by the database...
Database is MySQL 5.7.10 and I'm using utf8_general_ci collation on all columns.
I've set the charset to UTF-8 within CodeIgniter: $config['charset'] = 'UTF-8';
Within CodeIgniter's database config I've specified 'char_set' => 'uft8', 'dbcollat' => 'utf8_general_ci'
The page's meta tag is set to use utf-8: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
The form has the accept-charset="utf-8" attribute
Update: I've also tried the solution suggested in this question
I think I've done all the usual troubleshooting and I'm a bit stuck. Does anyone know why this specific combination of characters causes issue? Perhaps I'm wrong and it's not an encoding issue at all? Does anyone have any other ideas?
You should look into doing more on the front-end side. Try setting the encoding on the form, as most browsers should then send only UTF-8 to your server:
<form ... accept-charset="UTF-8">
...
</form>
See this answer for more detail
Also, if you are using an editor, check out Quill, which allows pasting from Word.
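One other thing that might be worth ruling out, sketched below. This is only a guess and nothing in the thread confirms it: the 5-byte length in the var_dump and the \xBA in the database error are what you would see if "%ba" were being URL-decoded a second time somewhere. PHP already decodes $_POST values once, so an extra urldecode() call on them can turn "%ba" into the single byte 0xBA. The sample value assumes the pasted text had no space after the percent sign.

```php
<?php
// Assumption: this value has already been URL-decoded once (as $_POST values are),
// and some layer of the application decodes it a second time.
$posted = '70%back';

// On the second pass, "%ba" is read as a percent-escape for the byte 0xBA.
$decodedAgain = urldecode($posted);

var_dump(strlen($decodedAgain));                     // int(5)
var_dump(bin2hex($decodedAgain));                    // string(10) "3730ba636b"
var_dump(mb_check_encoding($decodedAgain, 'UTF-8')); // bool(false)
```

A lone 0xBA byte is not valid UTF-8, so a utf8 column would reject it with an "Incorrect string value" error. If the pasted text really does contain a space after the percent sign, this explanation does not apply.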
QUESTION:
Hello stackoverflow!
So this encoding stuff is getting on my last nerve. It's bad enough that it is difficult to figure out the best combination of encodings when sending stuff back and forth using AJAX, PHP, SQL, etc., but it is also causing problems with SESSION?!
I already found a hot-fix solution, no thanks to Google, which is partly the reason I'm writing this now. But I would also like to see if any of you has a more practical solution.
PROBLEM:
For example, if I want my PHP file to have UTF-8 encoding, the editor adds hidden characters to the file, which can only be viewed and deleted in a hex editor. For those who don't know: YES, any extra characters that end up outside the PHP code will cause problems with SESSION and give you a header error. When I delete them and re-upload the file, it falls back to ANSI encoding. Maybe there are other editors that encode files into UTF-8 more properly? I don't know; I'm using Notepad++ at the moment and am perfectly happy with it, and it is hard to believe it should cause problems with encoding. I have also tried changing the default encoding in my .htaccess file, and it made no difference, for the index file anyway.
It seems that although we get WARNING: session_start(): Cannot send session cache limiter - headers already sent ... the sessions are still set perfectly fine, so all we could do at this point is turn off warnings by placing this at the top of our PHP file: error_reporting(~E_NOTICE & ~E_WARNING); but this doesn't really solve the problem, it simply hides it from the public eye.
Open the page in Notepad2 or Sublime Text -> Save with Encoding -> UTF-8
index.php
<?php
session_start();
header('Content-type: text/html; charset=utf-8');
echo 'Hello ÇÖİŞÜĞüğışçö'; // bla bla
?>
SOLUTION:
I therefore had to make a fix by creating 2 separate files for one and the same index, just with different encodings. My main file is ANSI encoded and called
index.php
It has the session_start(); line in it, and beneath that we include the main scripts that were originally supposed to be there, now pulled in with include('index_.php'); I also found out that this problem does NOT occur on all hosting servers, only on some, so the real solution may be found somewhere in the server settings.
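The hidden characters described above are most likely the UTF-8 byte order mark (BOM), the three bytes EF BB BF that some editors write at the start of a file. That is an assumption based on the symptoms, not something the post confirms. Because the BOM sits before the opening PHP tag, it is sent as output and the headers go out before session_start() runs. A minimal sketch for detecting and stripping it (the helper name strip_utf8_bom is hypothetical):

```php
<?php
// UTF-8 byte order mark: the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: drop a leading BOM from a string if there is one.
function strip_utf8_bom(string $contents): string
{
    if (strncmp($contents, UTF8_BOM, 3) === 0) {
        return substr($contents, 3);
    }
    return $contents; // no BOM, nothing to do
}

$withBom = UTF8_BOM . "session_start();";
$cleaned = strip_utf8_bom($withBom);

var_dump(strlen($withBom) - strlen($cleaned)); // int(3)
var_dump($cleaned === "session_start();");     // bool(true)
```

To clean a file on disk you could run it through file_get_contents() and file_put_contents() with this helper, but it is usually easier to save the file without a BOM in the first place. Recent Notepad++ versions list "UTF-8" and "UTF-8-BOM" as separate options in the Encoding menu.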
I have some varchar fields in my MySQL database containing Danish letters (æ, ø, å). When browsing the database with phpMyAdmin, the letters appear correctly; however, when I fetch a field through a query and try to display it, they are changed ("ø" becomes "Ã¸"). I tried changing the collation to both latin1 and utf-8 (both Danish versions), but without luck. I can't even figure out whether it is the database or my code that is the issue. Has anyone seen this before?
Edit: I'm adding the code to read and display the database content. The issue is confirmed in the "users_last_name" and "address_street", but is likely present all over (currently these are the only fields with danish letters).
Code:
<?php
// Query to load information on projects
$main_query = $this->db->query('SELECT project_id, project_name, project_image_src, project_owner FROM ed_projects');
foreach ($main_query->result() as $row) {
    // Get address of the current project in the "foreach" loop
    $project_id = $row->project_id;
    $address_query = $this->db->query("SELECT * FROM ed_project_address WHERE project_id='$project_id'");
    $address_row = $address_query->row();
    // Get the name of the user who owns the current project
    $user_id = $row->project_owner;
    $user_query = $this->db->query("SELECT users_first_name, users_last_name FROM ed_users WHERE id='$user_id'");
    $user_row = $user_query->row();
?>
    <div class="projectAvatar">
    <?php if ($row->project_image_src) {
        echo "<img src=".$row->project_image_src.">";
    } else {
        echo "NoImg";
    }
    ?>
    </div>
    <div class="projectInformation">
        <?php echo $row->project_name; ?> <br />
        <?php echo $address_row->address_street." ".$address_row->address_number; ?> <br />
        <?php echo $user_row->users_first_name." ".$user_row->users_last_name; ?> <br />
    </div>
<?php
}
If things are working in phpMyAdmin, but not on your own web pages, it's likely to be a problem with the character encoding of your web pages. Assuming you're using an HTML5 doctype, try just adding:
<meta charset="utf-8">
to the <HEAD> section of your site.
Basically, from your comments and code, it seems like you're successfully storing your Danish characters using UTF-8 encoding in your database. (The table's collation setting won't affect that; collations determine sort orders and comparisons, but not the actual character set used for storage.) To make characters appear correctly on a web page, you need to tell the browser what character encoding you're using for your page; adding the <meta charset...> header does this. phpMyAdmin's web pages will almost certainly be using UTF-8 as their character set, so if that's working, you should change your pages to match it.
As I mentioned, I feel that by far the best full explanation of how this all works is given in Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), which will tell you about how this stuff should be done in all its gory detail.
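To see why a missing charset declaration produces this kind of garbage, here is a small sketch. It takes the UTF-8 bytes of "ø" and re-reads them as ISO-8859-1, which is roughly what a browser does when it guesses the wrong encoding. The commented-out header() call at the end is an optional addition of mine, not something the answer above asks for.

```php
<?php
$utf8 = "\u{00F8}"; // "ø" encoded as UTF-8: the two bytes C3 B8

// Treat each of those two bytes as one ISO-8859-1 character and
// re-encode the result as UTF-8 so it can be inspected.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

var_dump(bin2hex($utf8));                   // string(4) "c3b8"
var_dump($misread === "\u{00C3}\u{00B8}");  // bool(true): the two characters "Ã¸"

// The charset can also be declared in the HTTP header, alongside the meta tag:
// header('Content-Type: text/html; charset=utf-8');
```

One character turning into two is the usual sign that the stored data is fine and only the declared page encoding is wrong.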
Try ANSI (Windows-1252) instead.
For background, read
Differences Between Character Sets
I have a website with the content management system GetSimple which is written in PHP. I edited it as I needed, however, in the header, this is what is supposed to be there:
<title><?php get_page_clean_title(); ?> - <?php get_site_name(); ?></title>
The problem is that I am Czech and I have to use special characters (á, é, í, ó, ú, ů, ě, š etc.) and if you opened my website and saw the source code, you would see this:
<title>Tomáš Janeček - osobní web - Tom&aacute;&scaron; Janeček | Personal Website</title>
Instead of "Tomáš Janeček - osobní web - Tomáš Janeček | Personal Website".
What is bothering me are those HTML entities, which are only in the second part of the title. &aacute; stands for "á" and &scaron; stands for "š".
I know it's supposed not to hurt SEO, but I'm doing this to keep the code clear.
Is there a way to decode it or just change the get_site_name() to some better function that would have no problems with these extra characters? I don't want the entities in my code.
I suspect it's not this particular .php file that needs to be edited to get what I want; however, I hope it can be solved simply within this file.
The CMS includes dozens of .php files and I'm not sure what I should search for. I've looked for code dealing with HTML entities in "suspicious" files, but I found nothing that helped me.
If you need it, the whole CMS can be downloaded here
Thanks for your help in advance.
Edit 1:
Of course I have this meta included.
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
And no, I don't use any database. That will come with studying Joomla! :)
I want to emphasize that the title has 2 parts - get_page_clean_title() and get_site_name(), both of them include my whole name and only one displays it in the source code with HTML entities.
I have found the functions in another file:
The FIRST one is the one that doesn't put HTML entities into the source code - this is what I want from the second function lower.
function get_page_clean_title($echo=true) {
    global $title;
    $myVar = strip_tags(strip_decode($title));
    if ($echo) {
        echo $myVar;
    } else {
        return $myVar;
    }
}
The SECOND function does what it is supposed to do, but it gives the output with HTML entities and that is the problem.
function get_site_name($echo=true) {
    global $SITENAME;
    $myVar = trim(stripslashes($SITENAME));
    if ($echo) {
        echo $myVar;
    } else {
        return $myVar;
    }
}
Both of the functions above are in the same file.
I tried replacing the problematic function with a copy of the working one, changing the variable names to the right values; however, it then stopped working altogether :/
So, to conclude, the whole page is OK and there are no HTML entities except in one place: the second half of the title, which comes from the get_site_name function.
Furthermore, the problem is ONLY in the SOURCE CODE. The final rendering is okay.
Thanks for your replies so far, I'm glad for such fast and valuable replies. I really appreciate that.
I think you have a charset problem. If you want the special characters to display correctly, add
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
to your HTML/PHP file. Also check that your data is UTF-8 encoded.
If you are getting your data from a MySQL database, check that the columns use the utf8 charset. Also set the charset for the connection with this query, to ensure you are getting the data in the right encoding.
set names utf8;
Tom, make sure that your *.php files, your database, or wherever else the data comes from are in UTF-8, and that the meta charset on your index page is UTF-8 as well.
http://www.jakpsatweb.cz/cestina.html - Please visit this site for information about diacritics in HTML. You'll see a table of the characters in each encoding.
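For the entities in the second half of the title specifically, one possible approach is to decode the stored site name before printing it. The sketch below is not GetSimple's own code. The function name get_site_name_decoded and the sample value are hypothetical, and it assumes the site name is stored with entities such as &aacute; already in it.

```php
<?php
// Hypothetical variant of get_site_name(); not part of GetSimple.
// Assumption: the stored site name already contains HTML entities.
function get_site_name_decoded(string $siteName): string
{
    $clean = trim(stripslashes($siteName));
    // Turn entities such as &aacute; back into real UTF-8 characters...
    $decoded = html_entity_decode($clean, ENT_QUOTES, 'UTF-8');
    // ...then escape only the characters that are unsafe in HTML (& < > and quotes).
    return htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');
}

// Illustrative sample value; "\u{010D}" is the letter č.
$stored = "Tom&aacute;&scaron; Jane\u{010D}ek | Personal Website";
echo get_site_name_decoded($stored), "\n";
```

Escaping again with htmlspecialchars() keeps the title safe if the name ever contains & or <, while leaving á and š as plain characters in the source.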
I'm getting to know CakePHP, which has unearthed a general question about best practice in terms of PHP / MySQL character set stuff, which I'm hoping can be answered here.
My (practice) system contains a MySQL table of movies. This list was sourced from an Excel sheet, which was exported as CSV and imported via phpMyAdmin.
I noticed that titles with more "exotic" glyphs have issues rendering in the browser, e.g. the é in Amélie. Using Cake or plain PHP, it renders as a ?, unless transformed via htmlentities into &eacute;. Links with the special characters don't render at all.
If I use my Cake input form to enter an <alt>0233, this is rendered correctly in source, but as é via htmlentities.
After a quick SO search, I decided maybe UTF-8 would fix stuff, hence I:
changed the PHP source and CSV file encoding to UTF-8
made sure the <meta> stuff was there (it was already, via Cake's default layout)
made sure my browsers think the doc is UTF-8 (they do)
changed the collation on the MySQL DB to utf8_general_ci (as an educated stab from the available UTF-8 options)
deleted and reimported my data
However, I'm still stuck. I note that phpMyAdmin manages to render the characters "correctly" in its HTML source when browsing records.
I sense that document encoding is to blame; however, I'm wondering if someone can provide the best answer to:
what's the best way to move my data from Excel to MySQL to preserve glyphs?
what are the optimum settings for my tables to accommodate this?
I'd prefer to use UTF-8 to natively display the likes of é. What can I do in Cake to avoid making loads of calls to the likes of htmlentities, i.e. is there a configuration setting or a way to set stuff up that makes this more friendly and lets Cake's native helpers like Html->link work?
Some code, just in case:
movies controller excerpt..
function index() {
$this->set('movies' , $this->Movie->find('all'));
}
index.ctp view excerpt
<?php foreach ($movies as $movie): ?>
<tr>
<td><?php echo $movie['Movie']['id']; ?></td>
<td><?php echo htmlentities($movie['Movie']['title']); ?></td>
<td><?php echo $this->Html->link($movie['Movie']['title'] ,
array('controller' => 'movies' , 'action' => 'view' , $movie['Movie']['id'])); ?>
</td>
<td><?php echo $this->Html->link("Edit",
array('action' => 'edit' , $movie['Movie']['id'])); ?>
</td>
<td>
<?php echo $this->Html->link('Delete', array('action' => 'delete', $movie['Movie']['id']), null, 'Are you sure?')?>
</td>
</tr>
<?php endforeach; ?>
Thanks in advance for any help / tips.
Make sure the MySQL connection is set to UTF-8 while importing the data. The collation is only used for sorting and comparison, not for saving data.
You can set the charset of the connection using SET NAMES utf8; at the beginning of your SQL file. Note that MySQL's name for the encoding is utf8, without the hyphen.
That question comes here often.
UTF8 should work. Make sure that:
Your database collation uses utf8 (e.g. utf8_general_ci or utf8_bin)
Your HTML document encoding tag is set to utf8
AND VERY IMPORTANT - most people forget that bit - make sure all your source files are saved as utf8. Use Notepad++ on PC or Coda/TextMate/TextWrangler on Mac to make sure the encoding is correct. If you don't do that, some transformation/re-interpretation of the characters may happen
EDIT: And forget about htmlentities, you don't need it if you use utf8 encoding throughout
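To illustrate the htmlentities point, here is a small sketch comparing the escaping functions on a UTF-8 title. The sample value is made up. The last call shows what happens when htmlentities() is told the wrong charset, which may be what the question ran into: before PHP 5.4 the default charset for these functions was ISO-8859-1, though whether that applies here is an assumption.

```php
<?php
$title = "Am\u{00E9}lie"; // "Amélie" as UTF-8

// htmlentities() converts every character that has a named entity.
echo htmlentities($title, ENT_QUOTES, 'UTF-8'), "\n";      // Am&eacute;lie

// htmlspecialchars() only escapes & < > and quotes, so é passes through untouched.
var_dump(htmlspecialchars($title, ENT_QUOTES, 'UTF-8') === $title); // bool(true)

// With the wrong charset, the two UTF-8 bytes of é are treated as two characters.
echo htmlentities($title, ENT_QUOTES, 'ISO-8859-1'), "\n"; // Am&Atilde;&copy;lie
```

With UTF-8 used end to end, htmlspecialchars() with an explicit 'UTF-8' argument is generally enough for output escaping, and the accented characters stay readable in the source.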