Allowing certain (Danish) characters with htmlentities - php

I have a database where I need to display the records for the user. I am using htmlentities to make sure no malicious code is being echoed to the user like this:
function h($string) {
return htmlentities($string, ENT_SUBSTITUTE, "UTF-8");
}
and then I call the function whenever I output any entries to the user. The problem is that I need to be able to show the Danish characters ÆØÅ, and these characters are displayed as a question mark in a square. The site uses UTF-8 encoding as well.
I have tried everything listed under htmlentities on php.net and looked for a way to create exceptions or some other workaround, but I have been unable to find one.
Does anybody know a workaround for this issue?

The second comment answered it. Adding the charset in the connection solved the problem. So for my PDO connection I had to put it like this:
$dbh = new PDO('mysql:dbname=myName;charset=utf8;host=myHost', 'myUser', 'myPassw0rd');
Now everything displays properly.
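A rough sketch of what was probably going on, assuming the connection was handing back ISO-8859-1 bytes before the charset was added to the DSN. The byte strings below are illustrative stand-ins for database values, and htmlspecialchars is swapped in for htmlentities because escaping only the HTML-special characters is enough on a UTF-8 page:

```php
<?php
// Escape for HTML output. htmlspecialchars() only touches & < > " and ',
// so valid UTF-8 letters such as ÆØÅ pass through unchanged.
function h(string $string): string
{
    return htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Explicit byte escapes so the example does not depend on the file encoding.
$utf8   = "\xC3\x86\xC3\x98\xC3\x85"; // "ÆØÅ" encoded as UTF-8
$latin1 = "\xC6\xD8\xC5";             // "ÆØÅ" encoded as ISO-8859-1

$ok  = h($utf8);   // valid UTF-8: returned unchanged
$bad = h($latin1); // invalid UTF-8: ENT_SUBSTITUTE inserts U+FFFD

echo $ok, "\n", $bad, "\n";
```

If the second line prints replacement characters, the input was not UTF-8 when it reached h(), which points at the connection rather than at the escaping function.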

Related

DOMDocument and UTF8. MySQL says: Incorrect string value

I am trying to load the meta description of this website (which has a German character) via the following script in PHP:
$page_content = file_get_contents($uri);
$dom_obj = new \DOMDocument();
$dom_obj->loadHTML(mb_convert_encoding($page_content, 'HTML-ENTITIES', 'UTF-8'));
However, when I try to write it into the MySQL db, Laravel reports an error: incorrect string value "\xC3" (the first byte of the German character in UTF-8)
When I simply do the following, writing to the db works. But the character is not displayed correctly (Ã¼ instead of ü)
$dom_obj->loadHTML($page_content)
This problem only occurs with this website so far, others I tried with the same character do work. Can you think of a possible reason and fix? Thank you!
Edit:
It works fine when I use PHP's "utf8_decode" to decode the meta description that I get via $dom_obj without mb_convert_encoding. But when I do this, all the other sites that worked before lead to errors (like this: Incorrect string value: '\xE4t')
I found the error. I was using substr to shorten the description. Apparently substr cut one of those special characters in half, and this is why it wasn't working.
foreach ($dom_obj->getElementsByTagName('meta') as $meta) {
    if ($meta->getAttribute('name') == 'description') {
        // substr() counts bytes, so it can split a multibyte UTF-8 character
        $description = substr($meta->getAttribute('content'), 0, 156);
    }
}
This is a workaround:
mb_substr($foo,0,156,"UTF-8");
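To see the difference in isolation, here is a small sketch (the sample string is made up, and it assumes the mbstring extension is loaded). substr counts bytes, mb_substr counts characters:

```php
<?php
// "Grüße" written with explicit UTF-8 bytes so the file encoding does not matter.
$text = "Gr\xC3\xBC\xC3\x9Fe";

$byteCut = substr($text, 0, 3);             // 3 bytes: "Gr" plus the first half of "ü"
$charCut = mb_substr($text, 0, 3, 'UTF-8'); // 3 characters: "Grü"

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
```

The byte-cut string ends in a lone \xC3, which is the same kind of dangling lead byte that MySQL rejected with "Incorrect string value".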

Simplifying utf8_encode

So I'm trying to find a fast way to show all my results from my database, but I can't seem to figure out why I need to add the utf8_encode() function to all of my text in order to show all my characters properly.
For the record, my database information is both French and English, so I will need special characters including à, ç, è, é, ê, î, ö, ô, ù (and more).
My form's page has the following tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
My database, all my tables and all my fields are set to utf8_general_ci.
When I want to echo the database information onto the page, I use this query:
public function read_information()
{
global $db;
$query = "SELECT * FROM table WHERE id='1' LIMIT 1";
return $db->select($query);
}
and return the information like so:
$info = $query->read_information();
<?php foreach ( $info as $dbinfo ) { ?>
<pre><?php echo $dbinfo->column; ?></pre>
<?php } ?>
However, if I have French characters in my string, I need to <pre><?php echo utf8_encode($info->column); ?></pre>, and this is something I really want to avoid.
I have read up the documentation on PHP.net regarding utf8_encode/utf8_decode, htmlentities/html_entity_decode and quite a few more. However, I can't seem to figure out why I need to add a special function for every database result.
I have also tried using mysqli_query("SET NAMES 'utf8'", $mysqli); but this doesn't solve my problem. I guess what I'm looking for is some kind of shortcut where I don't have to create a function like make_this_french_friendly() type of thing.
Make sure every layer of the stack you are working with is set to UTF-8: the database, the web server, the page meta tag, and so on.
Check settings such as
ini_set('default_charset', 'utf-8')
In my experience, simple output then works without extra conversion.
As @deceze pointed out, this thread provided proper insight using $mysqli->set_charset('utf8');.
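A hedged illustration of why utf8_encode appeared to be necessary, assuming the connection was returning ISO-8859-1 bytes (the sample value is made up). utf8_encode converts ISO-8859-1 to UTF-8 and is deprecated as of PHP 8.2, so the sketch uses mb_convert_encoding for the same conversion:

```php
<?php
// "café" as it would arrive over a latin1 connection (illustrative bytes).
$fromLatin1Connection = "caf\xE9";

// The lone \xE9 byte is not valid UTF-8, so a UTF-8 page cannot render it.
$isUtf8 = mb_check_encoding($fromLatin1Connection, 'UTF-8');

// This is the conversion utf8_encode() performed on every value.
$converted = mb_convert_encoding($fromLatin1Connection, 'UTF-8', 'ISO-8859-1');

var_dump($isUtf8);            // bool(false)
echo bin2hex($converted), "\n"; // 636166c3a9
```

Once the connection charset is set, the driver delivers UTF-8 directly and the per-value conversion becomes unnecessary (applying it anyway would double-encode the text).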
Maybe use UTF-8 without BOM encoding for your file?
header('Content-type: text/html; charset=utf-8');
... in PHP (you can also do it with "ini_set()" function) and:
<meta charset="utf-8">
... in HTML.
You also have to set the right encoding for your database tables.
Possible duplicate of "GET" method encoding French characters incorrectly in PHP
Maybe your text encoding is not UTF-8.
Please see: What's different between UTF-8 and UTF-8 without BOM?
Maybe it can help you.

Data in MySQL database doesn't show correctly in website

I am trying to translate an English website into Persian. The problems I am facing are:
1. The website was loading with a Latin character set, so I had to change the charset to UTF-8 so that the Persian content displays correctly.
2. Data from the MySQL database is not shown correctly on the website, probably because of the same encoding problem.
What I have done:
<?php ini_set('default_charset','utf-8'); header('Content-type: text/html; charset=utf-8'); ?>
This fixed problem #1.
But problem #2 remains: although I have altered the tables to use UTF-8, the problem still persists. I would be glad for any help with this.
function bbcode ($str) {
//$str = htmlentities($str);
$token = array(
"'\[b\](.*?)\[/b\]'is",
'/\[i\](.*?)\[\/i\]/is',
'/\[u\](.*?)\[\/u\]/is',
'/\[url\=(.*?)\](.*?)\[\/url\]/is',
'/\[url\](.*?)\[\/url\]/is',
'/\[img\](.*?)\[\/img\]/is',
'/\[mail\=(.*?)\](.*?)\[\/mail\]/is',
'/\[mail\](.*?)\[\/mail\]/is',
'/\[font\=(.*?)\](.*?)\[\/font\]/is',
'/\[size\=(.*?)\](.*?)\[\/size\]/is',
'/\[color\=(.*?)\](.*?)\[\/color\]/is',
"':big_smile:'is",
"':cool:'is",
"':hmm:'is",
"':lol:'is",
"':mad:'is",
"':neutral:'is",
"':roll:'is",
"':sad:'is",
"':smile:'is",
"':tongue:'is",
"':wink:'is",
"':yikes:'is",
"':bull:'is",
'/\[item\=(.*?)\](.*?)\[\/item\]/is',
'/\[spell\=(.*?)\](.*?)\[\/spell\]/is',
"':warrior:'is",
"':paladin:'is",
"':hunter:'is",
"':rogue:'is",
"':priest:'is",
"':dk:'is",
"':shaman:'is",
"':mage:'is",
"':warlock:'is",
"':druid:'is",
"'\[ul\](.*?)\[/ul\]'is",
"'\[ol\](.*?)\[/ol\]'is",
"'\[li\](.*?)\[/li\]'is",
);
Thanks a lot in advance.
Sorry, my reply wasn't clear enough. I was almost asleep. The databases are empty, so I don't have to convert anything, but when I insert data into them, the data doesn't appear correctly. BTW, I'm not good with PHP or MySQL; I have been reading these articles and suggestions for hours and I'm just getting more confused. Can you just tell me where I should enter the code, and what code?
$link = mysql_connect("localhost","UserName","Password") or die(mysql_error());
mysql_set_charset("utf8",$link);
mysql_select_db("DataBase Name") or die(mysql_error());
I guess the thing I found out from these articles is to add the mysql_set_charset("utf8",$link) part to the above code when the server connects to the db, but I have tried that and it's not working. My website uses includes, so it looks like this:
include("../../config/config.php");
$connect = mysql_connect("$db_host", "$db_user", "$db_pass")or die(mysql_error());
mysql_set_charset("utf8",$link);
Assuming you've correctly converted the data in your tables to UTF-8 (just changing the character set is not enough), it sounds like you might be having problems with the connection not being set up as UTF-8. Have a look at SET NAMES, and more specifically this question.
If you're not sure you've converted your data to UTF-8, I'd have a look at this question as well as this Wordpress article and make sure you've followed the steps.
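For what it's worth, here is a minimal sketch of the same connection using mysqli, since the mysql_* functions were removed in PHP 7.0. The function name and its parameters are hypothetical stand-ins for the values in config.php. Note that the charset call receives the same handle that the connect call returned; in the include-based snippet above the two variable names differ ($connect and $link), which may be worth checking:

```php
<?php
// Hypothetical helper: the four parameters stand in for the config.php values.
function connect_utf8(string $db_host, string $db_user, string $db_pass, string $db_name): mysqli
{
    $link = mysqli_connect($db_host, $db_user, $db_pass, $db_name);
    if ($link === false) {
        throw new RuntimeException('Connection failed: ' . mysqli_connect_error());
    }

    // Set the connection charset on the handle returned above.
    if (!mysqli_set_charset($link, 'utf8')) {
        throw new RuntimeException('Could not set charset: ' . mysqli_error($link));
    }

    return $link;
}
```

It would be called once, right after the include, for example $link = connect_utf8($db_host, $db_user, $db_pass, $db_name); where $db_name is assumed to be defined alongside the other settings.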

PDO and special characters [duplicate]

This question already has answers here:
UTF-8 all the way through
I have my page encoding set to UTF-8, including in the meta tag.
However, when I take a value from the database, the page shows a diamond with a question mark instead. I'm assuming it doesn't recognize the character.
The character is an é. If I do echo é; it displays normally on the page, and also if I write it manually in HTML. However, when I fetch the same value from the database using PDO, I get a �
I'm assuming its a PDO setting. I've tried:
$db->exec("SET NAMES 'utf8';");
but this doesn't resolve it.
Any suggestions?
Many things can go wrong along the way. Usually you need to encode your source file as UTF-8, open the database connection as UTF-8, and define the database tables as UTF-8.
A great article from @deceze that helped me clarify things is http://kunststube.net/frontback/.
The most obvious things you can try in your case are:
Save your source file with UTF-8 encoding. This option exists in editors like Notepad++ or Crimson Editor.
Create the PDO connection with the utf8 charset option:
$connection = new PDO('mysql:host='.$this->host.';dbname='.$this->db_name.';charset=utf8', $this->user, $this->pass,array(PDO::ATTR_EMULATE_PREPARES => false, PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));
Make sure your table is UTF-8 encoded and your form has this option:
<form action="action.php" accept-charset="utf-8">
Update: maybe utf8_encode fixed your problem, but that means there is a wrong conversion somewhere between PHP and the database, or a wrong file encoding. You should fix the root of the problem, and then utf8_encode will no longer be needed.
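The connection step could be wrapped roughly like this. Both function names and the sample host and database name are made up for illustration; only the DSN format and the two PDO options come from the answer above. The charset part of the DSN is ignored by PHP versions before 5.3.6:

```php
<?php
// Hypothetical helper that assembles the DSN with an explicit charset.
function build_mysql_dsn(string $host, string $dbName, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// Hypothetical helper that opens the connection with the options from the answer.
function connect_pdo_utf8(string $host, string $dbName, string $user, string $pass): PDO
{
    return new PDO(build_mysql_dsn($host, $dbName), $user, $pass, [
        PDO::ATTR_EMULATE_PREPARES => false,
        PDO::ATTR_ERRMODE          => PDO::ERRMODE_EXCEPTION,
    ]);
}

echo build_mysql_dsn('localhost', 'example_db'), "\n";
// prints: mysql:host=localhost;dbname=example_db;charset=utf8
```

Keeping the DSN in one place makes it harder to end up with one connection that has the charset and another that does not.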

Problem reading data from file special characters

This question is related to my previous question; please have a look at it. I did not find any other way to unserialize the data, so I am resorting to string operations.
I am able to get the whole content of the file, but I am not able to get a specific string out of that content.
I want to search for a specific string in the content, but the functions stop working when they reach the first special character in the string. If I search for something that occurs before the special character, it works properly.
PHP's string functions do not work properly once they encounter the first special character: they stop processing immediately and so do not give me the correct output.
Originally they look like this (^#):
:"Mage_Core_Model_Message_Collection":2:{s:12:"^#*^#_messages";a:0:{}s:20:"^#*^#_lastAddedMessage";N;}
but when I echo them they are displayed as ?
Here is the code what I tried
$file='/var/www/html/products/var/session/sess_ciktos8icvk11grtpkj3u610o3';
$contents=file_get_contents($file);
$contents=htmlspecialchars($contents);
//$contents=htmlentities($contents);
echo $contents;
$restData=strstr($contents,'"id";s:4:"');
echo $restData;
$id=substr($restData,0,strpos($restData,'"'));
echo $id;
I changed default_charset to iso-8859-1 and also to utf-8, but it does not work with either.
Please let me know how I can resolve this.
Thanks.
These characters that you see as ^# are actually null bytes. They don't have any proper display, nor are they meant to be displayed: they are the engine's internal representation of protected properties. You're not supposed to mess with them.
As for resolving, it'd be nice to know what kind of resolution you seek - what result are you trying to achieve?
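A small sketch of where those null bytes come from, using a made-up class in place of the Magento one. It also shows one thing that may be worth checking in the question's code: htmlspecialchars turns " into &quot;, so a needle containing a literal double quote cannot match the escaped string. Searching the raw bytes first and escaping only for display avoids that:

```php
<?php
// Hypothetical stand-in for the class named in the serialized data.
class MessageCollection
{
    protected $_messages = [];
    protected $_lastAddedMessage = null;
}

$raw = serialize(new MessageCollection());

// Protected property names are stored as "\0*\0name", hence the length 12.
$hasNullBytes = strpos($raw, "\0*\0_messages") !== false;

// PHP string functions are binary safe, so searching the raw data works.
$needle = 's:12:"';
$foundInRaw     = strstr($raw, $needle) !== false;
$foundInEscaped = strstr(htmlspecialchars($raw), $needle) !== false;

// For display, make the null bytes visible and escape last.
echo htmlspecialchars(str_replace("\0", '\0', $raw)), "\n";
```

Here $foundInRaw is true and $foundInEscaped is false, even though nothing about the null bytes changed between the two searches.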
