I am working on a PHP/MySQL script that is inserting data into a database like this...
Caesar (courtesy post)
I know this is a basic question but how can I prevent the special characters from doing that?
It seems you're not just HTML-escaping your content once, but actually doing it twice. The first thing you should do is find out why your content ends up that way, instead of attempting to decode it to an unescaped format. You should always escape for the format you're going to use the data in: escape with the SQL escape functions when inserting, and escape with htmlspecialchars (or a similar function) when presenting the data in HTML (and take note of the character encoding used).
If the data comes in this format from another source, use html_entity_decode to normalize the text again. That does however seem weird.
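To make the double-escaping symptom concrete, here is a minimal sketch. The input string is a hypothetical example, not the asker's actual data; it only shows what a second pass of htmlspecialchars does and how html_entity_decode removes one layer.

```php
<?php
// Hypothetical input; the apostrophe and ampersand are the interesting characters.
$raw = "Caesar's & Co.";

// Escaping once is what an HTML page needs.
$once = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// Escaping an already-escaped string turns every "&" into "&amp;" again.
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');

// html_entity_decode removes one layer; useful only for data that
// already arrived double-escaped from somewhere else.
$repaired = html_entity_decode($twice, ENT_QUOTES, 'UTF-8');

echo $once, "\n";
echo $twice, "\n";
var_dump($repaired === $once);
```

If the second line of output resembles what is stored in the database, something in the insert path is probably escaping for HTML before the data is saved.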
Related
I thought the proper way to "sanitize" incoming data from an HTML form before entering it into a mySQL database was to use real_escape_string on it in the PHP script, like this:
$newsStoryHeadline = $_POST['newsStoryHeadline'];
$newsStoryHeadline = $mysqli->real_escape_string($newsStoryHeadline);
$storyDate = $_POST['storyDate'];
$storyDate = $mysqli->real_escape_string($storyDate);
$storySource = $_POST['storySource'];
$storySource = $mysqli->real_escape_string($storySource);
// etc.
And once that's done you could just insert the data to the DB like this:
$mysqli->query("INSERT INTO NewsStoriesTable (Headline, Date, DateAdded, Source, StoryCopy) VALUES ('".$newsStoryHeadline."', '".$storyDate."', '".$dateAdded."', '".$storySource."', '".$storyText."')");
So I thought doing this would take care of cleaning up all the invisible "junk" characters that may be coming in with your submitted text.
However, I just pasted some text I copied from a web-page into my HTML form, clicked "submit" - which ran the above script and inserted that text into my DB - but when I read that text back from the DB, I discovered that this piece of text did still have junk characters in it, such as â€“.
And those junk characters of course caused the PHP script I wrote that retrieves the information from the DB to crash.
So what am I doing wrong?
Is using real_escape_string not the way to go here? Or should I be using it in conjunction with something else?
OR, is there something I should be doing (like more escaping) when reading data back out from the mySQL database?
(I should mention that I'm an Objective-C developer, not a PHP/mySQL developer, but I've unfortunately been given this task to do some DB stuff - hence my question...)
thanks!
Your assumption is wrong. mysqli_real_escape_string’s only intention is to escape certain characters so that the resulting string can be safely used in a MySQL string literal. That’s it, nothing more, nothing less.
The result should be that exactly the passed data is retained, including ‘junk’. If you don’t want that ‘junk’ in your database, you need to detect, validate, or filter it before passing it to MySQL.
In your case, the ‘junk’ seems to be due to different character encodings: your input data seems to be encoded with UTF-8 while it’s later displayed using Windows-1250. In this scenario, the character – (U+2013) would be encoded as 0xE28093 in UTF-8, which would represent the three characters â, €, and “ in Windows-1250. Properly declaring the document’s encoding would probably fix this.
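The misreading described above can be reproduced in a few lines. This is only a sketch: it assumes the mbstring extension is available and uses Windows-1252 as a stand-in for Windows-1250, since both code pages map the bytes 0xE2, 0x80, and 0x93 to the same three characters.

```php
<?php
// EN DASH (U+2013) as UTF-8; its bytes are E2 80 93.
$dash = "\u{2013}";
echo bin2hex($dash), "\n";

// Treat those three bytes as if they were Windows-1252 text and convert
// the result to UTF-8 so it can be printed. (Windows-1252 is an assumption
// standing in for Windows-1250 here.)
$misread = mb_convert_encoding($dash, 'UTF-8', 'Windows-1252');
echo $misread, "\n";

// Declaring the encoding when the page is sent avoids the misreading:
// header('Content-Type: text/html; charset=utf-8');
```

The stored bytes are the same in both cases; only the interpretation on display differs, which is why declaring the encoding is usually enough.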
Sanitization is a tricky subject, because it means different things depending on the context. :)
real_escape_string just makes sure your data can be included in a request (inside quotes, of course) without having the possibility to change the "meaning" of the request.
The manual page explains what the function really does: it escapes NUL characters, line feeds, carriage returns, backslashes, single quotes, double quotes, and "Control-Z" (probably the SUBSTITUTE character). So it just inserts a backslash before those characters.
That's it. It "sanitizes" the string so it can be passed unchanged in a request. But it doesn't sanitize it under any other point of view: users can still pass for instance HTML markers, or "strange" characters. You need to make rules depending on what your output format is (most of the time HTML, but HTTP isn't restricted to HTML documents), and what you want to let your users do.
If your code can't handle some characters, or if they have a special meaning in the output format, or if they cause your output to appear "corrupted" in some way, you need to escape or remove them yourself.
You will probably be interested in htmlspecialchars. Control characters generally aren't a problem with HTML. If your output encoding is the same as your input encoding, they won't be displayed and thus won't be an issue for your users (well, maybe for the W3C validator). If you think it is, make your own function to check and remove them.
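A function of the kind suggested above might look like the following sketch. The name stripControlChars and the choice to keep tab, line feed, and carriage return are assumptions for illustration; adjust the character ranges to your own rules.

```php
<?php
// Hypothetical helper: drops ASCII control characters except tab, LF and CR.
// These byte values never occur inside UTF-8 multi-byte sequences, so
// removing them byte-wise does not damage other characters.
function stripControlChars(string $text): string
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $text) ?? $text;
}

$input = "Head\x00line\x1A with <b>markup</b>\n";
$clean = stripControlChars($input);

// Escaping for HTML is a separate step, done at output time.
echo htmlspecialchars($clean, ENT_QUOTES, 'UTF-8');
```

Keeping the two steps separate means the filter reflects your own rules about acceptable input, while htmlspecialchars only deals with the output format.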
I'm storing data in a MySQL database that may have some special characters. I'm wondering how to store it so that these characters are preserved if they're either output to HTML via PHP OR via JavaScript, e.g. createTextNode.
For example, the division symbol (÷) has an HTML entity code (for instance &divide;), and when I store it that way it shows up fine when put directly into HTML by PHP, but when I pull it into JavaScript using $.getJSON and then insert it with createTextNode it shows up as the literal entity text instead of ÷.
I also tried storing the symbol in the SQL directly, but my understanding is that the column would need to be changed from VARCHAR to NVARCHAR and that would cause a performance hit that doesn't seem necessary.
Given that I can modify the SQL, the PHP, or the JavaScript, is there an easy fix here? Maybe a way to unescape the HTML entity in JavaScript?
As answered by Yogesh, you should switch your collation of the DB to utf8_general_ci
So there are probably two things going on:
JSON escapes special characters.
Somewhere, something in your code flow is URL encoding the strings too.
So you just need to decode the string in your JavaScript, or you need to find what part of your code is URL encoding those strings and fix it.
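One way to do the decoding on the PHP side, before the JSON is sent, is sketched below. It assumes the stored value is an HTML entity such as &divide;; the array key symbol is a hypothetical name.

```php
<?php
// Assumed stored value: an HTML entity rather than the character itself.
$stored = '&divide;';

// Decode to the real UTF-8 character before building the JSON response.
$symbol = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// json_encode writes non-ASCII characters as \uXXXX escapes by default;
// any JSON parser turns them back into the character itself.
echo json_encode(['symbol' => $symbol]), "\n";
```

With the character decoded on the server, createTextNode receives the symbol itself and has nothing left to interpret. Storing the character directly as UTF-8 would make the decode step unnecessary.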
What's the best route for storing data in MySQL. With MySQL should I just use, TEXT as my field type?
Also, when I use mysql_real_escape_string(), the returned values contain \r\n.
But should I be running the htmlentities() on it after that?
And then when I return data to the screen I should use, NL2BR()?
Just trying to figure out the best route here for storing this information.
Thank you for your help!
TEXT or TINYTEXT or anything similar should be fine for storing ASCII data from the user. If you don't need a lot of space you may think about VARCHAR
I think that mysql_real_escape_string() escapes characters that may compromise the security of an SQL query (single quote, double quote, etc.) but doesn't do much more than that.
htmlentities() converts reserved HTML characters like < and > into their HTML-encoded equivalents, &lt; and &gt; respectively. These characters are not dangerous for SQL queries, so you probably do not need to escape them unless you want to display the HTML tag entered by the user as text, and not let it be interpreted as HTML.
NL2BR() is probably not necessary either.
Most importantly, your decision on when to use each of these functions will depend on your end application. You may need / want some but not others ( though you should definitely use mysql_real_escape_string() )
It really depends on what you are trying to store. For things such as usernames, passwords, etc., you can use VARCHAR. But if you're storing long text such as news posts or HTML data, then you can use TEXT or LONGTEXT (depending on how long it is).
You should ALWAYS use mysql_real_escape_string() when inserting into the DB. If you're outputting HTML from the DB, you may want to run htmlentities or htmlspecialchars to ensure that you aren't outputting user-injected JavaScript that could redirect your users to hacker websites and such.
One other idea is that you could escape your data using htmlentities before inserting into the DB, but it's your choice.
NL2BR is great for adding a <br /> tag at every \r\n when you output the text.
So, it seems like you're on the right track...
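As a rough sketch of the output side discussed in these answers: the text is stored unmodified, then escaped and given line-break tags only when it is printed. The sample string is hypothetical.

```php
<?php
// Hypothetical text as it would come back from the database, unmodified.
$stored = "First line\r\nSecond <b>line</b>";

// Escape for HTML first, then add the line-break tags, so the <br /> tags
// produced by nl2br are not themselves escaped.
$html = nl2br(htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'));

echo $html;
```

The order matters: calling htmlspecialchars after nl2br would escape the inserted tags and show them as text.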
When I encode the contents of a textarea with PHP's rawurlencode function before storing it in MySQL, the newline is encoded as %0D%0A.
For Example:
textarea text entered by user:
a
b
encoding using rawurlencode and store into database will store value as a%0D%0Ab
When I retrieve the value from the database, decoding it with rawurldecode does not work and the code gives an error. How can I overcome this, and what is the best way to store, retrieve, and display textarea values?
Can you first encode this textarea string using base64_encode and then perform a base64_decode on it, if the above does not work for you?
If the textarea does not contain URLs, you should rather use base64_encode than rawurlencode and then store as normal.
You simply should not use rawurlencode for escaping data for your database.
Each target format has its own escaping method, which in general terms makes sure the data is stored, displayed, or transferred safely from one place to another, and it doesn't need decoding at the other end.
For instance:
displaying text in HTML, use htmlentities or htmlspecialchars
storing in database, use mysqli_real_escape_string, pg_escape_string, etc...
transferring variablename, use urlencode
transferring variablecontent, use rawurlencode
etc...
You should notice that decoding these things is often done by the browser/database. So no data is actually stored escaped, and decoding doesn't need to be done by your code.
The problem is probably because you escape a sequence with rawurlencode, but your database expected the escaped format for the specific brand of database. And de-escaped it using that assumption, which was wrong, which messed up your string.
Conclusion: find out what brand database you are using, look up the specific escape function for that database, and use the proper escaping function on all your content "transferral".
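To help isolate where the value changes, a quick check like the sketch below shows what rawurlencode produces for the example from the question and whether rawurldecode restores it when nothing else touches the string in between.

```php
<?php
// The textarea content from the question: "a", a CRLF line break, "b".
$text = "a\r\nb";

$encoded = rawurlencode($text);
echo $encoded, "\n";

// Decoding the untouched encoded string gives back the original bytes.
var_dump(rawurldecode($encoded) === $text);
```

If this check passes on your setup but the value read back from the database does not decode, the string is being altered somewhere between encoding and decoding. Storing the raw text with the database's own escaping avoids the extra layer altogether.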
P.S.: some definition may not be correct, please comment on that. I wanted to make the idea stick but am probably not using all the right terms.
First of all it is very uncommon to run textarea through urlencode()
urlencode was not designed for this purpose.
Second, if you still want to do this, then maybe the problem comes from the database. First you need to tell us what database you're using and what TYPE you're using for storing this data: do you store it as TEXT or as BINARY data? Have you set up the correct charset in the database?
When a user submits a special character ♠ it's stored in the MySQL database as â�, and if a user wants to change it, instead of displaying it back as ♠ it's displayed as â�. How can I fix this problem so that it's displayed back as ♠ and saved as ♠?
On a side note how should I save my special characters using PHP?
I'm using PHP & MySQL
User types in data
You escape that data to avoid SQL injection (don't convert the special characters to html code equivalent yet)
Data gets stored in the database exactly how user typed it in
You pull the raw data back out
You run the raw data through a character encoding function or something equivalent to convert special characters to their html codes thus avoiding cross site scripting or html injection
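The last step above can be sketched as follows, assuming the page and the database both use UTF-8. The stored value and the field name comment are hypothetical.

```php
<?php
// Hypothetical raw value pulled back out of the database (UTF-8).
$stored = "\u{2660} <script>alert(1)</script>";

// htmlspecialchars only converts the characters that are special in HTML;
// the spade symbol passes through untouched as long as the charset matches.
$safe = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

echo '<textarea name="comment">', $safe, '</textarea>', "\n";
```

Because the symbol itself is never converted to an entity, the user sees and edits ♠ directly, and the markup is shown as harmless text.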
That could be a problem.
If you want to convert your special characters into entities, you have to htmlencode them twice when outputting into a field value or textarea content. But it could mess with other characters: quotes, brackets and such all become their entity representations. If that's what you've been asked for, go ahead. But, in my opinion, it could be a terrible mess to edit such a text.
That's why it's better not to let users use entities. Why can't they enter the symbol itself?
As for the special characters in your database - just use UTF-8 encoding in both database and HTML.