PHP & MySQL special character encoding problem - php

When a user submits a special character such as ♠, it's stored in the MySQL database as â� and, when the user goes to edit it, it's displayed back as â� instead of ♠. How can I fix this so that it's both saved and displayed as ♠?
On a side note how should I save my special characters using PHP?
I'm using PHP & MySQL

User types in data
You escape that data to avoid SQL injection (don't convert the special characters to html code equivalent yet)
Data gets stored in the database exactly how user typed it in
You pull the raw data back out
You run the raw data through a character encoding function or something equivalent to convert special characters to their html codes thus avoiding cross site scripting or html injection
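The last step above can be sketched as follows. This is a minimal example, assuming UTF-8 throughout; the helper name renderForHtml is made up for illustration and does not come from the original answer.

```php
<?php
// Hypothetical helper: escape raw, stored text only at the moment it is written into HTML.
function renderForHtml(string $raw): string
{
    // ENT_QUOTES escapes both quote styles so the value is also safe inside attributes.
    // ENT_SUBSTITUTE replaces invalid byte sequences instead of returning an empty string.
    return htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// The stored value is exactly what the user typed, special character included.
$stored = "\u{2660} <b>bold</b>";

// Markup is neutralised; the spade character passes through untouched.
echo '<textarea>' . renderForHtml($stored) . '</textarea>', "\n";
```

Because the database keeps the raw text, the same value can later be escaped differently for another output format.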

That could be a problem.
If you want to convert your special characters into entities, you have to HTML-encode them twice when outputting them into a field value or textarea content. But that affects other characters too: quotes, brackets and the like all become their entity representations. If that's what you've been asked for, go ahead. But, in my opinion, such text could be a terrible mess to edit.
That's why it's better not to let users use entities. Why can't they enter the symbol itself?
As for the special characters in your database - just use UTF-8 encoding in both database and HTML.

Related

Convert special character to html number/name and again to special character in php

I am facing a problem with storing special characters in database and retrieving again as symbol.
For example, I have a string like Côte d'Ivoire
What I want to do is convert the special character ô to its HTML number &#244; or name &ocirc;, and at retrieval time convert the HTML back to the special symbol.
I also need to pass this string as JSON response of a web service.
I tried some php functions like htmlspecialchars() and htmlspecialchars_decode() but not getting the desired output.
Any help will be appreciated. If there is any other way to do it then it will also be very helpful.
Thanks in advance
You can use the htmlentities function to transform the special characters.
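You have to pass UTF-8 to the json_encode function. If your data is in ISO-8859-1, you can run utf8_encode on it before encoding; data that is already UTF-8 should be passed as it is, because utf8_encode would encode it a second time.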
http://php.net/manual/en/function.htmlentities.php
http://php.net/manual/en/function.utf8-encode.php
Use 'utf8_unicode_ci' as the collation when saving data to the database and retrieve the data the usual way. Also check that the exact data is being saved in the database.
This problem is much easier to solve, when you use UTF-8 for the whole site including your database. Escaping should be done as late as possible and only for the needed target system.
An example:
Your HTML page is UTF-8 encoded, so the user input you receive is also in UTF-8. You can store this value in the database as it is; just use prepared statements or call mysqli_real_escape_string() before building the SQL string. This escapes the input only to make it safe for SQL statements, so the database will contain the original user input.
When you read the value back from the database you get the original UTF-8 input, and you can then call htmlspecialchars() to escape it for display in HTML output. I wrote a small article about using UTF-8 for the whole site, where you can find more information.
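To illustrate "escape late, per target", here is a small sketch using the string from the question. It assumes the value was stored and read back as plain UTF-8; the variable names and the 'country' key are illustrative only.

```php
<?php
// The value as it would come back from a UTF-8 database column: no entities, just text.
$country = "C\u{00F4}te d'Ivoire";

// Target 1: HTML. Only the characters that are special in HTML get escaped.
$forHtml = htmlspecialchars($country, ENT_QUOTES, 'UTF-8');

// Target 2: JSON. json_encode needs valid UTF-8 input; JSON_UNESCAPED_UNICODE keeps
// the accented letter readable instead of emitting a \u escape sequence.
$forJson = json_encode(['country' => $country], JSON_UNESCAPED_UNICODE);

// Decoding the JSON gives the original string back, whichever flag was used.
$decoded = json_decode($forJson, true);

echo $forHtml, "\n", $forJson, "\n";
```

Neither output needs the ô converted to an entity first; the stored text stays the single source of truth.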

real_escape_string not cleaning up entered text

I thought the proper way to "sanitize" incoming data from an HTML form before entering it into a mySQL database was to use real_escape_string on it in the PHP script, like this:
$newsStoryHeadline = $_POST['newsStoryHeadline'];
$newsStoryHeadline = $mysqli->real_escape_string($newsStoryHeadline);
$storyDate = $_POST['storyDate'];
$storyDate = $mysqli->real_escape_string($storyDate);
$storySource = $_POST['storySource'];
$storySource = $mysqli->real_escape_string($storySource);
// etc.
And once that's done you could just insert the data to the DB like this:
$mysqli->query("INSERT INTO NewsStoriesTable (Headline, Date, DateAdded, Source, StoryCopy) VALUES ('".$newsStoryHeadline."', '".$storyDate."', '".$dateAdded."', '".$storySource."', '".$storyText."')");
So I thought doing this would take care of cleaning up all the invisible "junk" characters that may be coming in with your submitted text.
However, I just pasted some text I copied from a web page into my HTML form and clicked "submit", which ran the above script and inserted that text into my DB. But when I read that text back from the DB, I discovered that it still had junk characters in it, such as â€“.
And those junk characters of course caused the PHP script I wrote that retrieves the information from the DB to crash.
So what am I doing wrong?
Is using real_escape_string not the way to go here? Or should I be using it in conjunction with something else?
Or is there something I should be doing (like more escaping) when reading data back out from the MySQL database?
(I should mention that I'm an Objective-C developer, not a PHP/mySQL developer, but I've unfortunately been given this task to do some DB stuff - hence my question...)
thanks!
Your assumption is wrong. mysqli_real_escape_string’s only intention is to escape certain characters so that the resulting string can be safely used in a MySQL string literal. That’s it, nothing more, nothing less.
The result should be that exactly the passed data is retained, including ‘junk’. If you don’t want that ‘junk’ in your database, you need to detect, validate, or filter it before passing it to MySQL.
In your case, the ‘junk’ seems to be due to different character encodings: your input data seems to be encoded with UTF-8 while it’s later displayed using Windows-1250. In this scenario, the character – (U+2013) would be encoded with 0xE28093 in UTF-8, which would represent the three characters â, €, and “ in Windows-1250. Properly declaring the document’s encoding would probably fix this.
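The byte-level explanation above can be reproduced with a short sketch. It assumes the iconv extension is available and that it knows the charset name 'CP1250'; both are assumptions about the local PHP build, not part of the original answer.

```php
<?php
// U+2013 EN DASH as a UTF-8 string.
$dash = "\u{2013}";

// Its UTF-8 encoding is the three bytes E2 80 93.
$hex = bin2hex($dash);

// Pretend those same three bytes were Windows-1250 text and convert them to UTF-8.
// Each byte becomes its own character, which is the "junk" seen on the page.
$misread = iconv('CP1250', 'UTF-8', $dash);

echo $hex, ' ', $misread, "\n";
```

Nothing in the stored data is damaged here; only the interpretation of the bytes differs.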
Sanitization is a tricky subject, because it never means the same thing depending on the context. :)
real_escape_string just makes sure your data can be included in a request (inside quotes, of course) without having the possibility to change the "meaning" of the request.
The manual page explains what the function really does: it escapes NUL characters, line feeds, carriage returns, backslashes, single quotes, double quotes, and "Control-Z" (probably the SUBSTITUTE character). So it just inserts a backslash before those characters.
That's it. It "sanitizes" the string so it can be passed unchanged in a request. But it doesn't sanitize it under any other point of view: users can still pass for instance HTML markers, or "strange" characters. You need to make rules depending on what your output format is (most of the time HTML, but HTTP isn't restricted to HTML documents), and what you want to let your users do.
If your code can't handle some characters, or if they have a special meaning in the output format, or if they cause your output to appear "corrupted" in some way, you need to escape or remove them yourself.
You will probably be interested in htmlspecialchars. Control characters generally aren't a problem with HTML. If your output encoding is the same as your input encoding, they won't be displayed and thus won't be an issue for your users (well, maybe for the W3C validator). If you think it is, make your own function to check and remove them.

How to prevent &#41; from being inserted in a Database in PHP?

I am working on a PHP/MySQL script that is inserting data into a database like this...
Caesar &#40;courtesy post&#41;
I know this is a basic question but how can I prevent the special characters from doing that?
It seems you're not just HTML-escaping your content once, but actually doing it twice. The first thing you should do is try to find out why your content ends up that way, instead of attempting to decode it to an unescaped format. You should always escape for the format you're going to use the data in, escape with the SQL escape functions when inserting, and escape with htmlspecialchars (or a similar function) when presenting the data in HTML (and take note of the character encoding used).
If the data comes in this format from another source, use html_entity_decode to normalize the text again. That does however seem weird.
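A short sketch of what double escaping looks like and how html_entity_decode normalizes already-encoded input. The sample strings are illustrative; the entity form of the "Caesar" text is an assumption about what the asker's data looked like.

```php
<?php
$original = 'Tom & Jerry';

// Escaping once is correct for HTML output.
$once = htmlspecialchars($original, ENT_QUOTES, 'UTF-8');

// Escaping the already-escaped value again turns &amp; into &amp;amp;.
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');

// The fourth argument (double_encode = false) leaves existing entities alone.
$guarded = htmlspecialchars($once, ENT_QUOTES, 'UTF-8', false);

// Data that arrives with numeric entities can be normalized back to plain text.
$imported = 'Caesar &#40;courtesy post&#41;';
$normalized = html_entity_decode($imported, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $twice, "\n", $normalized, "\n";
```

Finding the place where the second escape happens is still the better fix; decoding afterwards only hides it.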

mysql_real_escape_string fails to escape special chars as text comes from text editor

I'm creating a page in which the user enters comments, and those comments are inserted into a MySQL DB. These comments can contain single quotes, double quotes or any special chars. To escape these I used the following code:
$str = mysql_real_escape_string($str,$conn);
Here $conn is the active connection resource and $str is the string content from the textarea.
This works fine and returns a perfectly escaped string that I can insert into the DB. But if the user types the comments in a text editor such as OpenOffice Writer or MS Word and pastes that text in, the insert fails with the following error:
Incorrect string value: '\x93testi...' for column 'commnets' at row 1
I think this is happening because the single and double quotes in text coming from a text editor (OpenOffice, MS Word) are not escaped properly. So how do I escape them so the text can be inserted into the DB? Please help me.
Thanks in advance.....
You aren't submitting a valid UTF-8 string to be saved in the DB. Instead it's probably in a Windows-specific character set.
Presumably your users are submitting the text through a web page. You need to make sure that you serve the page in UTF-8 and that the submitted form data is also in UTF-8 (which it will be by default if the page is served in UTF-8).
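If some input still arrives in a legacy encoding, a server-side fallback could look like the sketch below. This is not part of the original answer: the function name toUtf8 and the guess that the legacy encoding is Windows-1252 (CP1252) are assumptions, and the iconv extension is assumed to be available.

```php
<?php
// Hypothetical fallback: return valid UTF-8, converting from an assumed legacy charset if needed.
function toUtf8(string $input, string $assumedLegacy = 'CP1252'): string
{
    // An empty pattern with the /u modifier matches only if the subject is valid UTF-8.
    if (preg_match('//u', $input) === 1) {
        return $input;
    }

    // Not valid UTF-8: reinterpret the bytes as the assumed legacy encoding.
    $converted = iconv($assumedLegacy, 'UTF-8', $input);
    if ($converted === false) {
        throw new InvalidArgumentException('Input is neither UTF-8 nor ' . $assumedLegacy);
    }
    return $converted;
}

// 0x93 and 0x94 are the curly double quotes in Windows-1252.
echo toUtf8("\x93testing\x94"), "\n";
```

Serving the form as UTF-8 remains the real fix; a guess like this can be wrong for text in other legacy encodings.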
You need to:
Make sure you're sending the UTF-8 charset in the headers.
header("Content-Type:text/html; charset=UTF-8");
And/or set the content type in the <head> section of your page, for example with <meta charset="UTF-8">.
btw mysql_real_escape_string is not really anything to do with the problem here. That function is used to prevent strings containing normal quotes from being used to do SQL injection attacks, which is better solved by using prepared statements anyway.
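A prepared statement for the comment insert might look like the sketch below. It uses mysqli rather than the old mysql_* functions from the question; the table name comments, the column name body, and the function name insertComment are all hypothetical, and utf8mb4 is an assumed connection charset.

```php
<?php
// Hypothetical helper: insert a comment without building the SQL string by hand.
function insertComment(mysqli $conn, string $comment): bool
{
    // Make the connection charset explicit so PHP and MySQL agree on the encoding.
    $conn->set_charset('utf8mb4');

    // The ? placeholder keeps the data separate from the SQL text,
    // so quotes in the comment cannot change the statement.
    $stmt = $conn->prepare('INSERT INTO comments (body) VALUES (?)');
    if ($stmt === false) {
        return false;
    }

    $stmt->bind_param('s', $comment);
    $ok = $stmt->execute();
    $stmt->close();

    return $ok;
}
```

The function would be called with an open connection, e.g. insertComment($conn, $_POST['comment']), where the form field name is again only an example.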
There is one way to sidestep all this real_escape malarkey and put into the SQL exactly what was supplied, and that is to use MySQL's ability to interpret a hexadecimal number of arbitrary length as a string.
e.g.
$query=sprintf("update module set code=0x%s where id='%d'", bin2hex($code), $id);
This works even if code is a BLOB type binary field and $code is full binary data (e.g, an image file contents).
You will also sidestep any sql injection with this.
I have found that using sprintf to format queries is extremely powerful and safe and use of the php bin2hex() renders anything up to and including binary able to get into the database untainted.
Getting it out is somewhat another matter mind you..
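To show what the query from this answer looks like once built, here is a small sketch. The wrapper function buildHexUpdate is made up for illustration; the SQL text is the one from the answer.

```php
<?php
// Hypothetical wrapper around the sprintf/bin2hex approach from the answer.
function buildHexUpdate(string $code, int $id): string
{
    // bin2hex emits only the characters 0-9 and a-f, so no quote or backslash
    // from $code can ever reach the SQL text; %d forces $id to an integer.
    return sprintf("update module set code=0x%s where id='%d'", bin2hex($code), $id);
}

// A value containing a single quote, which would normally need escaping.
$query = buildHexUpdate("O'Reilly", 7);
echo $query, "\n";

// On the PHP side, hex2bin reverses bin2hex exactly.
$roundTrip = hex2bin(bin2hex("O'Reilly"));
```

Prepared statements achieve the same separation of data and SQL without doubling the size of the value in the query text.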

HTML Purifier selectively eating special characters

Using PHP against a UTF-8 compliant database. Here's how input goes in.
user types input into textarea
textarea encoded with javascript escape()
passed via HTTP post
decoded with PHP rawurldecode()
passed through HTMLPurifier with default settings
escaped for MySQL and stored in database
And it comes out in the usual way and I run unescape() on page load. This is to allow people to, say, copy and paste directly from a word document and have the smart quotes show up.
But HTMLPurifier seems to be clobbering non-UTF-8 special characters, ones that escape() to a simple % expression, like Ö, which escapes to %D6, whereas smartquotes escape to %u2024 or something and go into the database that way. It takes out both the special character and the one immediately following.
I need to change something in this process. Perhaps I need to change multiple things.
What can I do to not get special characters clobbered?
textarea encoded with javascript escape()
escape isn't safe for non-ASCII. Use encodeURIComponent instead.
passed via HTTP post
I assume that you use XMLHttpRequest? If not, make sure that the page containing the form is served as UTF-8.
decoded with PHP rawurldecode()
If you access the value through $_POST, you should not decode it, since that has already been done. Doing so will mess up data.
escaped for MySQL and stored in database
Make sure you don't have magic quotes turned on. Make sure that the database stores tables as UTF-8 (the encoding and the collation must both be UTF-8). Make sure that the connection between PHP and MySQL is UTF-8 (use set names utf8 if you don't use PDO).
Finally, make sure that the page is served as utf-8 when you output the string again.
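The warning about decoding $_POST values a second time can be demonstrated without a web server. The sketch below simulates the steps by hand; the sample text is illustrative.

```php
<?php
// The user literally typed the three characters %41 into the textarea.
$typed = '%41';

// What goes over the wire: the percent sign itself gets encoded.
$onTheWire = rawurlencode($typed);

// PHP decodes the request body once while filling $_POST (simulated here).
$inPost = rawurldecode($onTheWire);

// Decoding again, as the question's pipeline does, changes the user's text.
$decodedAgain = rawurldecode($inPost);

echo $onTheWire, ' ', $inPost, ' ', $decodedAgain, "\n";
```

After the automatic decode the value is already correct, so the extra rawurldecode() call can only corrupt input that happens to contain a percent sign.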
