I have a PHP script that stores my code snippets.
To insert, I use:
$snippet_code = mysqli_real_escape_string($conn,trim($_POST['snippet_code']));
To display, I use the following which is wrapped in a pre tag:
$snippet_code = htmlentities($row['SnippetText']);
I notice that sometimes I get a lot of escape characters like \\\\ when the snippet is displayed on the page. The escape characters are present wherever single or double quotes appear in the code. The problem seems to be more severe in non-English language browsers.
How can I properly do this? How can I properly store and display code on a page?
Assuming you mean slash escape sequences like \", and not HTML escape sequences like &amp;, try this:
$snippet_code = htmlentities(stripslashes($row['SnippetText']));
If it is actually HTML escapes causing you trouble, just omit the htmlentities call.
If you are getting ' converted to \', your server is probably configured with a legacy option called Magic Quotes. You can read about it in the PHP manual. My advice is to disable them if possible.
Also, check your database. It's possible that your current data is corrupted. If so, you can write a small script that uses stripslashes() to fix it.
From your comments, it seems that you are in fact talking about slashes found before quotes.
It's not clear from the limited information you've given us why non-English browsers would show more of these.
However, it is likely that these slashes should not be present in the first place. Perhaps you are running mysql_real_escape_string several times, instead of just once... but, again, nothing you've shown us indicates that.
Either way, you should fix the data in the database and not just hack around the issue on display.
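As a rough sketch of the "store the raw text, escape only when displaying" approach discussed above: the helper name render_snippet is made up for illustration, and UTF-8 is assumed for both the stored data and the page.

```php
<?php
// Hypothetical helper: escape a raw snippet only at display time.
// Assumes the database holds the snippet exactly as it was typed, in UTF-8.
function render_snippet(string $raw): string
{
    // ENT_QUOTES also converts single quotes; ENT_SUBSTITUTE replaces invalid
    // byte sequences instead of returning an empty string.
    return '<pre>' . htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</pre>';
}

echo render_snippet('echo "a" < \'b\';'), "\n";
```

Backslashes that belong to the snippet pass through untouched, so any stray ones on the page point to data that was escaped more than once before it was stored.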
I'm working on a WordPress site with some other developers and the code they wrote to set up custom variables for Google Analytics, via the _setCustomVar, uses html_entity_decode. They pointed to the well known and much used Yoast plugin which uses a similar technique. I can't figure out why you would use it that way though.
At no point (that I can see) does the string get encoded, so the function doesn't do anything. WordPress delivers whole strings, even with accents on them, never anything encoded, so there aren't rogue encoded characters to worry about. In fact, the one thing you don't want to do is send Google Analytics a mess of HTML, right?
I've changed it because I'm pretty sure that html_entity_decode does nothing about single quotes. In a JS script where strings are delimited by single quotes, that means any variable containing an apostrophe breaks Google Analytics tracking entirely.
Instead, I'm cleaning strings using a strip_tags and esc_js (a WordPress function).
I'm a little concerned because the linked script is very commonly used. It seems like I must be wrong about something and I don't want to screw up my own script because of it.
What am I missing?
The answer seems to be that Yoast uses that code as a 'just in case' measure for strings that might have encoded characters in them. It still doesn't seem to take care of quote marks though, which is a pretty big deal.
Here's the code I wrote to solve all the issues: https://gist.github.com/AramZS/8930496
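For readers without WordPress at hand, here is a minimal plain-PHP sketch of the same idea. It is not the code from the gist and it does not use esc_js; it swaps in json_encode to produce the JavaScript string literal, and the helper name ga_js_value is invented for this example.

```php
<?php
// Hypothetical helper, not part of WordPress or the Yoast plugin.
function ga_js_value(string $value): string
{
    // Decode any entities that may be present, then drop markup.
    $plain = strip_tags(html_entity_decode($value, ENT_QUOTES, 'UTF-8'));

    // json_encode() returns a double-quoted JavaScript string literal; the
    // JSON_HEX_* flags turn quotes, ampersands and angle brackets into \uXXXX
    // escapes so the value cannot end the string or the script element.
    return json_encode($plain, JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_TAG | JSON_HEX_AMP);
}

// The literal already carries its own quotes, so none are added around it.
echo 'var pageTitle = ' . ga_js_value("Aram&#039;s <b>post</b>") . ";\n";
```

The decode step is still the "just in case" measure; it is the encoding step afterwards that keeps an apostrophe from breaking the script.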
I thought the proper way to "sanitize" incoming data from an HTML form before entering it into a mySQL database was to use real_escape_string on it in the PHP script, like this:
$newsStoryHeadline = $_POST['newsStoryHeadline'];
$newsStoryHeadline = $mysqli->real_escape_string($newsStoryHeadline);
$storyDate = $_POST['storyDate'];
$storyDate = $mysqli->real_escape_string($storyDate);
$storySource = $_POST['storySource'];
$storySource = $mysqli->real_escape_string($storySource);
// etc.
And once that's done you could just insert the data to the DB like this:
$mysqli->query("INSERT INTO NewsStoriesTable (Headline, Date, DateAdded, Source, StoryCopy) VALUES ('".$newsStoryHeadline."', '".$storyDate."', '".$dateAdded."', '".$storySource."', '".$storyText."')");
So I thought doing this would take care of cleaning up all the invisible "junk" characters that may be coming in with your submitted text.
However, I just pasted some text I copied from a web-page into my HTML form, clicked "submit" - which ran the above script and inserted that text into my DB - but when I read that text back from the DB, I discovered that this piece of text did still have junk characters in it, such as â€“.
And those junk characters of course caused the PHP script I wrote that retrieves the information from the DB to crash.
So what am I doing wrong?
Is using real_escape_string not the way to go here? Or should I be using it in conjunction with something else?
OR, is there something I should be doing (like more escaping) when reading data back out from the mySQL database?
(I should mention that I'm an Objective-C developer, not a PHP/mySQL developer, but I've unfortunately been given this task to do some DB stuff - hence my question...)
thanks!
Your assumption is wrong. mysqli_real_escape_string’s only intention is to escape certain characters so that the resulting string can be safely used in a MySQL string literal. That’s it, nothing more, nothing less.
The result should be that exactly the passed data is retained, including ‘junk’. If you don’t want that ‘junk’ in your database, you need to detect, validate, or filter it before passing it to MySQL.
In your case, the ‘junk’ seems to be due to different character encodings: Your input data seems to be encoded with UTF-8 while it’s later displayed using Windows-1250. In this scenario, the character – (U+2013) would be encoded with 0xE28093 in UTF-8 which would represent the three characters â, €, and “ in Windows-1250. Properly declaring the document’s encoding would probably fix this.
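The byte-level effect can be reproduced with a short sketch. One assumption to flag: it uses Windows-1252 rather than Windows-1250, because mbstring supports the former and the two code pages map these three bytes to the same characters.

```php
<?php
// The en dash U+2013 is the three bytes E2 80 93 in UTF-8.
$enDashUtf8 = "\xE2\x80\x93";

// Misreading those bytes as a single-byte Windows code page yields three
// separate characters. Windows-1252 is an assumption made for this demo.
$misread = mb_convert_encoding($enDashUtf8, 'UTF-8', 'Windows-1252');

echo bin2hex($enDashUtf8), "\n"; // e28093
echo $misread, "\n";
```

In practice the fix is usually to declare one encoding end to end, for example by sending `Content-Type: text/html; charset=utf-8` and calling `$mysqli->set_charset('utf8mb4')` on the connection, rather than converting strings by hand.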
Sanitization is a tricky subject, because it means something different depending on the context. :)
real_escape_string just makes sure your data can be included in a request (inside quotes, of course) without having the possibility to change the "meaning" of the request.
The manual page explains what the function really does: it escapes nul characters, line feeds, carriage returns, backslashes, single quotes, double quotes, and "Control-Z" (probably the SUBSTITUTE character). So it just inserts a backslash before those characters.
That's it. It "sanitizes" the string so it can be passed unchanged in a request. But it doesn't sanitize it under any other point of view: users can still pass for instance HTML markers, or "strange" characters. You need to make rules depending on what your output format is (most of the time HTML, but HTTP isn't restricted to HTML documents), and what you want to let your users do.
If your code can't handle some characters, or if they have a special meaning in the output format, or if they cause your output to appear "corrupted" in some way, you need to escape or remove them yourself.
You will probably be interested in htmlspecialchars. Control characters generally aren't a problem with HTML. If your output encoding is the same as your input encoding, they won't be displayed and thus won't be an issue for your users (well, maybe for the W3C validator). If you think it is, make your own function to check and remove them.
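If you do decide to remove control characters yourself, a minimal sketch could look like this. The function name strip_control_chars is hypothetical, and keeping tab, line feed and carriage return is a choice made for this example.

```php
<?php
// Hypothetical helper: remove ASCII control characters but keep tab, LF and CR.
// Bytes below 0x80 never occur inside a UTF-8 multi-byte sequence, so this is
// safe to run on UTF-8 text without the /u modifier.
function strip_control_chars(string $text): string
{
    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $text) ?? $text;
}

$clean = strip_control_chars("line one\x00\x1A\nline two");

// Escaping for HTML is still a separate step, done at output time.
echo htmlspecialchars($clean, ENT_QUOTES, 'UTF-8'), "\n";
```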
If a full name is submitted to the name column of the database and it's pulled onto a web page, it shows a + sign in place of the space.
Also, if there's a " within the message text in the message column and it's pulled onto a web page, it displays a \ before every ".
Is there any way of fixing these issues?
From the code that you added in your comments, expanding my comment into an answer: the '+' appears because you are urlencoding some of your rows. urlencode is meant for data that will be part of a URL. What I think you want is to display it in HTML, for which you would use htmlentities. But right after pulling from your DB, you'll want to use stripslashes before using htmlentities.
It appears that when you get your POST data, your server is already adding slashes. Depending on your server version, you'll want to check the Magic Quotes, and if enabled, stripslashes before pushing it through the mysql_real_escape_string. However, since your DB is already set up, it might be easier to skip this paragraph completely and deal with what you already have.
Side note, using 'prepared statements' is a better practice, and eliminates the need to use mysql_real_escape_string. ^^
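To make the side note concrete, here is a small sketch of a prepared statement followed by escaping at display time. It assumes PDO with an in-memory SQLite database as a stand-in for MySQL so that it runs without a server; the table and column names are invented.

```php
<?php
// Assumption: SQLite in memory replaces MySQL purely so the sketch is runnable.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE messages (name TEXT, message TEXT)');

$name = 'John Smith';
$message = 'He said "it\'s fine"';

// Placeholders keep the data out of the SQL text, so no manual escaping is needed.
$stmt = $pdo->prepare('INSERT INTO messages (name, message) VALUES (?, ?)');
$stmt->execute([$name, $message]);

$row = $pdo->query('SELECT name, message FROM messages')->fetch(PDO::FETCH_ASSOC);

// Escape for HTML only when printing. urlencode() is for URLs and is what
// turns the space into a plus sign.
echo htmlspecialchars($row['name'], ENT_QUOTES, 'UTF-8'), "\n";
echo htmlspecialchars($row['message'], ENT_QUOTES, 'UTF-8'), "\n";
echo urlencode($row['name']), "\n";
```

The stored values come back exactly as submitted, with no backslashes to strip.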
Firstly magic quotes & runtime are disabled correctly in php.ini, and confirmed by phpinfo().
PHP version: 5.3.4
MySQL version: 5.1.52
I'm only using mysql_real_escape_string on the data, after htmlspecialchars and a trim, that's all the data cleaning on the variable.
Yet, when I submit a single quote, the slash remains in the database.
When running mysql_query I'm using "' . $var . '", although in the past this hasn't changed anything (could be due to the double quotes?).
Any ideas? and please don't tell me about PDO/prepared statements, I'm aware of them and I have my reasons for doing it this way.
Thanks!
Code example (this is the only thing done to the data):
mysql_real_escape_string( htmlspecialchars( trim( $data ) ) );
I'm only using mysql_real_escape_string on the data, after htmlspecialchars and a trim, that's all the data cleaning on the variable.
No. Only use mysql_real_escape_string for storing data in the database. Don't mangle your data when you store it.
The function htmlspecialchars is used to encode a string to HTML (< becomes &lt; etc.) and it should only be used for this purpose.
Perhaps the massively misguided, unhelpful and damaging option
magic_quotes_gpc
Has been enabled?
You can check that in the output of phpinfo(), but there's not a lot you can do if the server admin has enabled it globally without the ability to override.
I recommend checking if it's on (on every page of the app of course), and if so, causing the application to die quickly and painfully to ensure that you avoid data corruption (which chiefly manifests itself as the proliferation of backslashes you described).
Then go around to the server admin's house with a blunt weapon of your choice.
Hopefully you can do all this before your database becomes overrun with hordes of evil self-multiplying backslashes.
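A minimal sketch of such a check, assuming it is included at the top of every entry script. The function name magic_quotes_enabled is made up, and type declarations are left out on purpose so the code also parses on the old PHP versions where magic quotes still exist.

```php
<?php
// Returns true when the legacy magic_quotes_gpc directive is switched on.
// ini_get() returns false for directives that do not exist, which is the case
// on PHP versions where magic quotes have been removed.
function magic_quotes_enabled()
{
    $value = ini_get('magic_quotes_gpc');

    return $value !== false
        && $value !== ''
        && $value !== '0'
        && strtolower($value) !== 'off';
}

if (magic_quotes_enabled()) {
    // Stop before any request data reaches the database.
    exit('magic_quotes_gpc is enabled; refusing to run.');
}
```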
Your storing procedure is correct (although htmlspecialchars and/or trim is probably not needed, but I don't know your application).
From the information you have provided, I can see no reason for your problem.
The next debugging step would be to recall whatever else you may have changed, or that has been changed, on your system (maybe you are using some third-party installation image?).
If that fails, we are left guessing wildly at possible causes, for which I will offer a first one:
MySQL could be running in NO_BACKSLASH_ESCAPES mode, which would cause backslashes to be treated as regular characters.
Furthermore, it looks like you are wrapping your strings in double quotes, which would then insert a single quote (which usually gets escaped) straight into your database, preceded by a backslash.
It may very well be that, because you are wrapping your strings in double quotes inside your SQL statements (which is not how it should be done, and I am baffled that you don't get a syntax error), you end up with a query like "john\'s house". That is caused by the single-quote escaping from mysql_real_escape_string, which would be correct if your strings were wrapped in single quotes.
Which leads me to a question: do you get a syntax error (or an injected query) when trying to insert double quotes?
As for your comment: you could very well prepare statements with PDO, then get the query string from it, and execute it using the mysql functions. However, I realise that this is no solution to your problem.
Please also try putting your whole query in a single variable and printing it out directly before executing it. Then look at it and trace any wrong manipulation back, operation by operation, through the code that produces the string.
If you use double quotes within the SQL commands after escaping the data:
SELECT "1\'2"
then it will store and return the value as 1\'2 with the backslash still intact.
The proper syntax for SQL strings is using single quotes. That's what mysql_real_escape_string is escaping for. Else it would have to escape double quotes, whose usage however it is completely unaware of.
Use double quotes in PHP. Use single quotes for SQL. Rewrite your code like this:
$x = escapy($x);
$y = escapy($y);
sql_query("INSERT INTO tbl (x,y) VALUES ('$x', '$y')");
PHP uses "magic quotes" by default but has gotten a lot of flak for it. I understand it will be disabled in the next major version of PHP.
While the arguments against it make sense, what I don't get is why not just use HTML entities to represent quotes instead of adding and stripping slashes? After all, a VAST majority of mySQL is used for outputting to web browsers?
For example, &#039; is used instead of ' and it won't affect the database at all.
Another question, why can't PHP just have configurations set up for each version of PHP with this tag <?php4 or <?php5 so appropriate interpreters can be loaded for those versions?
Just curious. :)
Putting &#039; into a string column in a database would be fine, if all you use database content for is outputting to a web page. But that's not true.
It's better to escape output at the time you output it. That's the only time you know for sure that the output is going to a web page -- not a log file, an email, or other destination.
PS: PHP already turns magic quotes off by default in the standard php.ini file. It's deprecated in PHP 5.3, and it was removed from the language entirely in PHP 5.4.
Here's a good reason, mostly in response to your own posted answer: Using htmlspecialchars() or htmlentities() does not make your SQL query safe. That's what mysql_real_escape_string() is for.
You seem to be making the assumption that it's only the single and double quote characters that pose a problem. MySQL queries are actually vulnerable to the \x00, \n, \r, \, ', " and \x1a characters in your data. If you are not using prepared statements or mysql_real_escape_string(), then you have an SQL injection vulnerability.
htmlspecialchars() and htmlentities() do not convert all of these characters, ergo you cannot make your query safe by using these functions. To that end, addslashes() does not make your query safe either!
Other smaller downsides include what the other posters have already mentioned about MySQL not always being used for web content, as well as the fact that you are increasing the amount of storage and index space needed for your data (consider one byte of storage for a quote character, versus six or more bytes of storage for its entity form).
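A quick way to see both points is to run a value through htmlspecialchars() and look at what comes out. This is only an illustration with a made-up input string; it does not touch a database.

```php
<?php
// Input contains a backslash, a line feed and a single quote.
$input = "a\\b\nc'd";

$encoded = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Only the quote is changed; the backslash and the line feed pass straight
// through, so the result is not safe to paste into an SQL string literal.
var_dump($encoded);

// Storage cost of one quote character versus its entity form.
echo strlen("'"), ' byte vs ', strlen('&#039;'), " bytes\n";
```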
I will reply to your first question only.
Validation of input is a wrong approach anyway, because it's not input that matters, the problem is where it's used. PHP can't assume that all input to a MySQL query would be output to a context where a HTML Entity would make sense.
It's nice to see that magic_quotes is going; it's the cause of a lot of security issues with PHP, and it's nice to see them taking a new approach :)
You'll do yourself a big favour if you reframe your validation approaches to validate on OUTPUT, for the context you are working in. Only you, as the programmer, can know this.
The reason that MySQL doesn't convert ' to &#039; is because ' is not &#039;. If you want to convert your data for output, you should be doing that at the view layer, not in your database. It's really not very hard to just call htmlentities before/when you echo.
Thanks everyone. I had to REALLY think what you meant and the implications it may have if I change the quotes to HTML entities instead of adding slashes to them but again, isn't that actually changing the output/input too?
I cannot think of a reason why we CANNOT or SHOULDN'T use HTML entities for mySQL as long as we make it clear that all data is encoded using HTML entities. After all, my argument is based on a fact that the majority of mySQL is used for outputting to HTML browsers and also the fact that ' and " and / can seriously harm mySQL databases. So, isn't it actually SAFER to encode ' and " and / as HTML entities before sending them as INSERT queries? Also, we're going XML so why waste time writing htmlentities and stripslashes and addslashes when accessing data that's ALREADY encoded in HTML entities?
You can't just convert ' to &#039;. Think about it: what happens when you want to store the string "&#039;"? If you store &#039; then when you load the page it will display ' and not &#039;.
So now you have to convert ALL HTML entities, not just quotes. Then you start getting into all sorts of weird conversion problems. The simplest solution is to just store the real data in the database, then you can display it how you like. You might want to use the actual quotes - in most cases " and ' don't do any harm outside of the tag brackets.
Sometimes you may want to store actual HTML in a field and display it raw (as long as it's checked and sanitized on its way in/out).
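The ambiguity described above can be shown in a few lines. In this sketch to_html and browser_shows are invented names, and html_entity_decode is only a stand-in for what a browser would render.

```php
<?php
// Escape raw text for HTML at output time.
function to_html(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

// Stand-in for the browser: turn entities back into the characters shown.
function browser_shows(string $html): string
{
    return html_entity_decode($html, ENT_QUOTES, 'UTF-8');
}

// Storing raw data and escaping on output round-trips both inputs correctly.
foreach (["'", '&#039;'] as $raw) {
    $html = to_html($raw);
    echo $raw, ' => ', $html, ' => ', browser_shows($html), "\n";
}

// Converting only the quote before storage makes two different inputs
// indistinguishable once they are in the database.
$storedA = str_replace("'", '&#039;', "'");
$storedB = str_replace("'", '&#039;', '&#039;');
var_dump($storedA === $storedB);
```

With raw storage, the text a user typed is always what comes back on the page, whichever characters it contains.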