If a full name is submitted to the name column of the database and then pulled onto a web page, a + sign appears in place of the space.
Also, if there's a " within the message text in the message column and it's pulled onto a web page, it displays a \ before every ".
Is there any way of fixing these issues?
Expanding my comment into an answer, based on the code you added in your comments. The '+' appears because you are running urlencode on some of your values. urlencode is meant for data that will become part of a URL; what I think you want is to display the data in HTML, for which you would use htmlentities. Right after pulling the data from your DB, call stripslashes before htmlentities.
It appears that your server is already adding slashes when you receive your POST data. Depending on your PHP version, check whether Magic Quotes is enabled and, if it is, call stripslashes before passing the data to mysql_real_escape_string. However, since your DB is already populated, it might be easier to skip this paragraph completely and deal with what you already have.
Side note: using prepared statements is better practice and eliminates the need for mysql_real_escape_string.
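A minimal sketch of the difference described above. The variable names and sample values are hypothetical stand-ins for rows read from the database; the message is assumed to have been stored with stray backslashes.

```php
<?php
// Hypothetical values standing in for columns read from the database.
$name    = 'John Smith';
$message = 'He said \"hello\"';   // stored with stray backslashes

// urlencode() is meant for URL components: the space becomes '+'.
$forUrl = urlencode($name);

// For HTML output: remove the stray slashes first, then encode for HTML.
$nameHtml    = htmlentities($name, ENT_QUOTES, 'UTF-8');
$messageHtml = htmlentities(stripslashes($message), ENT_QUOTES, 'UTF-8');

echo $forUrl, "\n";       // John+Smith
echo $nameHtml, "\n";     // John Smith
echo $messageHtml, "\n";  // He said &quot;hello&quot;
```

The browser renders &quot; as a plain double quote, so the visitor sees the message as it was typed.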
I'm having lots of trouble preserving the exact look of how a user types out a short paragraph.
My problem is that random slashes and HTML show up. When people hit enter while typing the message, "\r\n\" would show up when it was echoed later. I tried fixing that, but now when the user types an apostrophe while composing a message, it gets inserted into the database with 3 backslashes, and is thus echoed later with 3 backslashes before the apostrophe. Frustrating! I want to just start over!
Here's what I do.
User types a message in an input field and hits submit.
That message gets inserted into the database with type varchar(280) via php.
That message gets echoed via php.
I've tried many different things like nl2br and strip_tags and stripslashes and mysql_real_escape_string and others. I might be using these in some combination that messes it up.
So my question is what is the best way to preserve exactly how someone composes a text paragraph to be later echoed via php to look just like how they typed it?
Make sure Magic Quotes are off or, if you can't disable them, cleanse your strings from them. Read the manual for details: http://www.php.net/manual/en/security.magicquotes.php
When inserting your text into the database, escape it properly for SQL syntax once or, better, use prepared statements. See How can I prevent SQL injection in PHP? and The Great Escapism (Or: What You Need To Know To Work With Text Within Text).
When outputting to HTML, use htmlspecialchars to avoid HTML injection or plain syntax problems and afterwards use nl2br to format line breaks specifically for HTML.
That's basically it.
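The output step above can be sketched as follows. The function name render_message and the sample text are assumptions made for illustration; the order (htmlspecialchars first, nl2br second) is the one the answer describes.

```php
<?php
// Hypothetical helper: prepares stored plain text for display in HTML.
function render_message(string $raw): string
{
    // 1. Escape characters that have a special meaning in HTML.
    $escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

    // 2. Only afterwards turn line breaks into <br /> tags,
    //    so the inserted tags are not escaped themselves.
    return nl2br($escaped);
}

// Sample text as it would come back from the database, unmodified.
$stored = "It's <b>here</b>\r\nSecond line";

echo render_message($stored);
```

Doing it in the opposite order would escape the <br /> tags that nl2br adds, so they would show up as visible text.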
In the second step you need to escape the message with the MySQL escaping function.
But to output it correctly you need to do the following:
<pre><?= htmlentities($mysqlRow['data']); ?></pre>
This takes the needed value from the database result and outputs it exactly as it is, with all spaces, tabs and HTML tags preserved. (If the user enters <html>, it is shown as the literal text <html> rather than being interpreted as a tag.)
I thought the proper way to "sanitize" incoming data from an HTML form before entering it into a mySQL database was to use real_escape_string on it in the PHP script, like this:
$newsStoryHeadline = $_POST['newsStoryHeadline'];
$newsStoryHeadline = $mysqli->real_escape_string($newsStoryHeadline);
$storyDate = $_POST['storyDate'];
$storyDate = $mysqli->real_escape_string($storyDate);
$storySource = $_POST['storySource'];
$storySource = $mysqli->real_escape_string($storySource);
// etc.
And once that's done you could just insert the data to the DB like this:
$mysqli->query("INSERT INTO NewsStoriesTable (Headline, Date, DateAdded, Source, StoryCopy) VALUES ('".$newsStoryHeadline."', '".$storyDate."', '".$dateAdded."', '".$storySource."', '".$storyText."')");
So I thought doing this would take care of cleaning up all the invisible "junk" characters that may be coming in with your submitted text.
However, I just pasted some text I copied from a web page into my HTML form and clicked "submit", which ran the above script and inserted that text into my DB. When I read that text back from the DB, I discovered that it still had junk characters in it, such as â€“.
And those junk characters of course caused the PHP script I wrote that retrieves the information from the DB to crash.
So what am I doing wrong?
Is using real_escape_string not the way to go here? Or should I be using it in conjunction with something else?
OR, is there something I should be doing (like more escaping) when reading data back out from the mySQL database?
(I should mention that I'm an Objective-C developer, not a PHP/mySQL developer, but I've unfortunately been given this task to do some DB stuff - hence my question...)
thanks!
Your assumption is wrong. mysqli_real_escape_string’s only intention is to escape certain characters so that the resulting string can be safely used in a MySQL string literal. That’s it, nothing more, nothing less.
The result should be that exactly the passed data is retained, including ‘junk’. If you don’t want that ‘junk’ in your database, you need to detect, validate, or filter it before passing it to MySQL.
In your case, the ‘junk’ seems to be due to different character encodings: your input data seems to be encoded with UTF-8 while it’s later displayed using Windows-1250. In this scenario, the character – (U+2013) would be encoded as 0xE28093 in UTF-8, which represents the three characters â, €, and “ in Windows-1250. Properly declaring the document’s encoding would probably fix this.
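The mismatch described above can be reproduced in a few lines. This sketch assumes the mbstring extension is available and uses Windows-1252 as the "wrong" encoding, because that is a name mbstring is known to accept; the three bytes involved map to the same characters as in Windows-1250.

```php
<?php
// The en dash, U+2013, as a UTF-8 string.
$dash = "\u{2013}";

// Its UTF-8 representation is three bytes.
$bytes = bin2hex($dash);   // e28093

// Reading those same bytes as if they were Windows-1252
// produces three separate characters instead of one.
$misread = mb_convert_encoding($dash, 'UTF-8', 'Windows-1252');

echo $bytes, "\n";
echo $misread, "\n";       // â€“

// Declaring the encoding consistently avoids the mismatch, e.g.:
//   header('Content-Type: text/html; charset=utf-8');
//   $mysqli->set_charset('utf8mb4');   // on the database connection
```

The data itself is intact in this scenario; only the interpretation of the bytes differs between writer and reader.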
Sanitization is a tricky subject, because it never means the same thing depending on the context. :)
real_escape_string just makes sure your data can be included in a request (inside quotes, of course) without having the possibility to change the "meaning" of the request.
The manual page explains what the function really does: it escapes NUL characters, line feeds, carriage returns, backslashes, single quotes, double quotes, and "Control-Z" (probably the SUBSTITUTE character). So it just inserts a backslash before those characters.
That's it. It "sanitizes" the string so it can be passed unchanged in a request. But it doesn't sanitize it under any other point of view: users can still pass for instance HTML markers, or "strange" characters. You need to make rules depending on what your output format is (most of the time HTML, but HTTP isn't restricted to HTML documents), and what you want to let your users do.
If your code can't handle some characters, or if they have a special meaning in the output format, or if they cause your output to appear "corrupted" in some way, you need to escape or remove them yourself.
You will probably be interested in htmlspecialchars. Control characters generally aren't a problem with HTML. If your output encoding is the same as your input encoding, they won't be displayed and thus won't be an issue for your users (well, maybe for the W3C validator). If you think it is, make your own function to check and remove them.
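If control characters do turn out to be a problem, a filter along these lines is one possibility. The function name remove_control_chars is hypothetical, and the choice to keep tab, line feed and carriage return is an assumption; adjust the character class to your own rules.

```php
<?php
// Hypothetical helper: strips ASCII control characters from a string
// while keeping tab (\x09), line feed (\x0A) and carriage return (\x0D).
function remove_control_chars(string $text): string
{
    $clean = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $text);

    // preg_replace() returns null on error; fall back to the input then.
    return $clean ?? $text;
}

$input = "a\x00b\x1Fc\td\n";
echo bin2hex(remove_control_chars($input)), "\n";
```

Bytes above 0x7F are left untouched, so UTF-8 encoded text passes through unchanged.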
I'm having problems inserting commas (,) in my text fields in html. When I submit it to mysql, it deletes the data. How do I work with this?
I've tried mysql_real_escape_string() but that still doesn't work. I have lots of data, and I don't want to use str_replace either. Is there another alternative?
escape your message before you send it to the server, so it's stored escaped, then unescape it when you print it in your html page.
so... msgTosend = escape(whateverText);
and then when you're printing
msgToPrint = unescape(getFromDatabase(myText))
however, as the comment points out, you're obviously doing something dreadfully wrong altogether.
If you're storing strings (as it sounds) you need to be wrapping them in quotes before you store them. Once you do that no amount of commas can ruin anything. If you're not storing strings, but some other data type, then you should be breaking those out into individual variables on the server before storing anything. The potential for malice or just plain breakage is basically 100% with what it sounds like you're doing.
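As a rough illustration of why commas are harmless once the value is passed as a proper string, here is a sketch using a prepared statement. It assumes PDO with the SQLite driver and an in-memory database purely so the example runs without a server; the table and column names are made up. With MySQL only the connection string would differ.

```php
<?php
// In-memory SQLite database, used here only as a stand-in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE notes (body TEXT)');

// Text containing commas and an apostrophe.
$text = "Milk, eggs, and O'Brien's bread";

// The value is bound as a parameter, never pasted into the SQL text,
// so commas and quotes cannot change the meaning of the statement.
$stmt = $pdo->prepare('INSERT INTO notes (body) VALUES (:body)');
$stmt->execute([':body' => $text]);

$stored = $pdo->query('SELECT body FROM notes')->fetchColumn();
echo $stored, "\n";
```

The value read back is identical to the value submitted, with no escaping or unescaping in application code.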
I am trying to figure out what is the best way to manage the data a user inputs concerning non desirable tags he might insert:
strip_tags() - the tags are removed and they are not inserted in the database
the tags are inserted in the database, but when reading that field and displaying it to the user we would use htmlspecialchars()
Which is better, and are there any disadvantages to either of these?
Regards
This depends on what your priority is:
if it's important to display special characters from user input (like on StackOverflow, for example), then you'll need to store this information in the database and sanitize it on display - in this case, you'll want to at least use htmlspecialchars() to display the output (if not something more sophisticated)
if you just want plain text comments, use strip_tags() before you stick it in the database - this way you'll reduce the amount of data that you need to store, and reduce processing time when displaying the data on the screen
the tags are inserted in the database, but when reading that field and displaying it to the user we would use htmlspecialchars()
This. You usually want people to be able to type less-than signs and ampersands and have them displayed as such on the page. htmlspecialchars on every text-to-HTML output step (whether that text came directly from user input, or from the database, or from somewhere else entirely) is the right way to achieve this. Messing about with the input is a not-at-all-appropriate tactic for dealing with an output-encoding issue.
Of course, you will need a different escape — or parameterisation — for putting text in an SQL string.
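To make the two options from the question concrete, here is a small comparison on a made-up input string. Neither variable name comes from the original posts.

```php
<?php
// Hypothetical user input containing a tag and an ampersand.
$input = '<b>Hello</b> & welcome';

// Option 1: strip_tags() removes the markup for good.
$stripped = strip_tags($input);

// Option 2: keep the original text and escape it when writing HTML.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n";  // Hello & welcome
echo $escaped, "\n";   // &lt;b&gt;Hello&lt;/b&gt; &amp; welcome
```

Note that the stripped version still contains a bare ampersand, so it would need htmlspecialchars on output anyway; stripping tags does not replace output escaping.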
The measures taken to secure user input depend entirely on the context in which the data is being used. For instance:
If you're inserting it into a SQL database, you should use parameterized statements. PHP's mysql_real_escape_string() works decently, as well.
If you're going to display it on an HTML page, then you need to strip or escape HTML tags.
In general, any time you're mixing user input with another form of mark-up or another language, that language's elements need to be escaped or stripped from the input before put into that context.
The last point above segues into the next point: Many feel that the original input should always be maintained. This makes a lot of sense when, later, you decide to use the data in a different way and, for instance, HTML tags aren't a big deal in the new context. Also, if your site is in some way compromised, you have a record of the exact input given.
Specifically related to HTML tags in user input intended for display on an HTML page: If there is any conceivable reason for a user to input HTML tags, then simply escape them. If not, strip them before display.
PLATFORM:
PHP & mySQL
For my experimentation purposes, I have tried out few of the XSS injections myself on my own website. Consider this situation where I have my form textarea input. As this is a textarea, I am able to enter text and all sorts of (English) characters. Here are my observations:
A). If I apply only strip_tags and mysql_real_escape_string and do not use htmlentities on my input just before inserting the data into the database, the query breaks and I am hit with an error that shows my table structure, due to the abnormal termination.
B). If I apply strip_tags, mysql_real_escape_string and htmlentities on my input just before inserting the data into the database, the query does NOT break and I am able to successfully insert data from the textarea into my database.
So I do understand that htmlentities must be used at all costs, but I am unsure when exactly it should be used. With the above in mind, I would like to know:
When exactly htmlentities should be used? Should it be used just before inserting the data into DB or somehow get the data into DB and then apply htmlentities when I am trying to show the data from the DB?
If I follow the method described in point B) above (which I believe is the most obvious and efficient solution in my case), do I still need to apply htmlentities when I am trying to show the data from the DB? If so, why? If not, why not? I ask this because it's really confusing for me after I have gone through the post at: http://shiflett.org/blog/2005/dec/google-xss-example
Then there is this one more PHP function called: html_entity_decode. Can I use that to show my data from DB (after following my procedure as indicated in point B) as htmlentities was applied on my input? Which one should I prefer from: html_entity_decode and htmlentities and when?
PREVIEW PAGE:
I thought it might help to add some more specific details of a specific situation here. Consider that there is a 'Preview' page. When I submit the input from a textarea, the Preview page receives the input and shows it as HTML and, at the same time, a hidden input collects this input. When the submit button on the Preview page is hit, the data from the hidden input is POSTed to a new page, and that page inserts the data contained in the hidden input into the DB. If I do not apply htmlentities when the form is initially submitted (but apply only strip_tags and mysql_real_escape_string) and there's malicious input in the textarea, the hidden input is broken and the last few characters of the hidden input are visibly shown as " /> on the page, which is undesirable. So keeping this in mind, I need to do something to preserve the integrity of the hidden input on the Preview page and yet collect the data in the hidden input so that it does not break it. How do I go about this? I apologize for the delay in posting this info.
Thank you in advance.
Here's the general rule of thumb.
Escape variables at the last possible moment.
You want your variables to be clean representations of the data. That is, if you are trying to store the last name of someone named "O'Brien", then you definitely don't want these:
O&#39;Brien
O\'Brien
.. because, well, that's not his name: there's no ampersands or slashes in it. When you take that variable and output it in a particular context (eg: insert into an SQL query, or print to a HTML page), that is when you modify it.
$name = "O'Brien";
$sql = "SELECT * FROM people "
. "WHERE lastname = '" . mysql_real_escape_string($name) . "'";
$html = "<div>Last Name: " . htmlentities($name, ENT_QUOTES) . "</div>";
You never want to have htmlentities-encoded strings stored in your database. What happens when you want to generate a CSV or PDF, or anything which isn't HTML?
Keep the data clean, and only escape for the specific context of the moment.
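The same rule covers the hidden input on the Preview page from the question: the attribute value is just another HTML context, so the raw text is escaped at the moment it is written into the attribute. The field name message and the sample text below are assumptions for illustration.

```php
<?php
// Hypothetical raw text received from the textarea.
$text = 'He said "hi" /> <script>alert(1)</script>';

// Escape only at the point where the text enters an HTML attribute.
$escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
$hidden  = '<input type="hidden" name="message" value="' . $escaped . '" />';

echo $hidden, "\n";

// The browser decodes the entities again before submitting the form,
// so the next page receives the original text. Decoding here by hand
// only demonstrates that nothing was lost.
$roundTrip = html_entity_decode($escaped, ENT_QUOTES, 'UTF-8');
```

Because quotes are encoded as well, the user's text can no longer close the value attribute early, which is what produced the stray " /> on the page.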
1. Only just before you print a value (no matter whether it comes from the DB or from $_GET/$_POST) into HTML. htmlentities has nothing to do with the database.
2. B is overkill. You should use mysql_real_escape_string before inserting into the DB, and htmlentities before printing to HTML. You don't need to strip tags: after htmlentities, a tag is sent as &lt;br /&gt; and so appears on screen as the literal text <br />.
Theoretically you may apply htmlentities before inserting into the DB, but this might make further data processing harder if you ever need the original text.
3. See above.
In essence, you should use mysql_real_escape_string prior to database insertion (to prevent SQL injection) and then htmlentities, etc. at the point of output.
You'll also want to apply sanity checking to all user input to ensure (for example) that numerical values are really numeric, etc. Functions such as is_int, is_float, etc. are useful at this point. (See the variable handling functions section of the PHP manual for more information on these functions and other similar ones.)
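One caveat when applying this to form data: values in $_POST and $_GET arrive as strings, so is_int and is_float return false for them even when they look numeric. A sketch of one way to check such values, using filter_var and is_numeric (the sample values are made up):

```php
<?php
// Form values always arrive as strings.
$rawAge = '42';

// is_int() checks the variable's type, not its content.
$typeCheck = is_int($rawAge);                        // false

// filter_var() validates the content and returns the converted
// value on success, or false on failure.
$age   = filter_var($rawAge, FILTER_VALIDATE_INT);   // int 42
$bad   = filter_var('42abc', FILTER_VALIDATE_INT);   // false
$price = filter_var('3.14', FILTER_VALIDATE_FLOAT);  // float 3.14

// is_numeric() accepts numeric strings as well as numbers.
$looksNumeric = is_numeric('3.14');                  // true

var_dump($typeCheck, $age, $bad, $price, $looksNumeric);
```

Compare the result of filter_var with === false, since a valid 0 is also falsy.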
I've been through this before and learned two important things:
If you're getting values from $_POST/$_GET/$_REQUEST and plan to add to DB, use mysql_real_escape_string function to sanitize the values. Do not encode them with htmlentities.
Why not just encode them with htmlentities and put them in the database? Well, here's the thing: the goal is to keep the data as meaningful and clean as possible, and when you encode the data with htmlentities (using ENT_QUOTES), Jeff's Dog becomes Jeff&#039;s Dog, which causes the data to lose its meaning outside HTML. And if you decide to implement REST services and you fetch that string from the DB and put it in JSON, it'll come out as Jeff&#039;s Dog, which isn't pretty. You'd have to add another function to decode it as well.
Suppose you want to search for "Jeff's Dog" using the SQL "select * from table where field='Jeff\'s Dog'": you won't find it, since "Jeff's Dog" does not match "Jeff&#039;s Dog". Bad, eh?
To output alphanumeric strings (from CHAR type) to a webpage, use htmlentities - ALWAYS!
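A short sketch of the JSON problem mentioned in the first point. The array key name is an assumption; the comparison is between a value stored clean and the same value stored after htmlentities with ENT_QUOTES.

```php
<?php
$clean   = "Jeff's Dog";
// What ends up in the database if htmlentities() is applied before storing.
$encoded = htmlentities($clean, ENT_QUOTES, 'UTF-8');

// The clean value serialises to JSON as expected.
$goodJson = json_encode(['name' => $clean]);

// The pre-encoded value carries its HTML entity into a non-HTML format.
$badJson = json_encode(['name' => $encoded]);

echo $goodJson, "\n";  // {"name":"Jeff's Dog"}
echo $badJson, "\n";   // {"name":"Jeff&#039;s Dog"}
```

Storing the clean value and calling htmlentities only when writing HTML keeps both outputs correct.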