Htmlspecialchars ENT_NOQUOTES not working? - php

I'm trying to output the name of a project i.e. "David's Project" in a form, if a user does not correctly input all data in the form, to save the user having to input the name again.
If I var_dump $name I see David's Project. But if I echo $name I see David&#39;s Project. I realise that ' (single quote) becomes &#039; but I have tried using ENT_NOQUOTES and ENT_COMPAT to avoid encoding the single quote, and neither works.
$name = trim(filter_input(INPUT_POST, 'name0', FILTER_SANITIZE_STRING));
<form method="post" class="form" />
Title: <input type="text" name="name0" value="<?php echo
htmlspecialchars($name, ENT_NOQUOTES); ?>">
Am I doing something wrong or should ENT_NOQUOTES work? I tried using str_replace to replace ' with \' but this didn't work either.
The only way round this I have found is to use this:
htmlspecialchars_decode(htmlspecialchars($name, ENT_NOQUOTES));
Is that acceptable?
Sorry I realise this is probably a really stupid question but I just can't get my head around it.
Thanks for any replies.

You can accept a simple answer if it solves your problem BUT you should really understand that what you have delved into is a much larger issue you or someone has created for you.
Databases should not contain HTML encoded characters unless they are specifically meant for storing HTML. I highly doubt this is the case as it very rarely is.
Someone is inserting HTML into your database (html encoding data on insert). This means if you ever want to use a mobile app that is not HTML based, or a command line, or anything at all that might use the data and isn't HTML based, you are going to run into a weird problem where the HTML encoded characters have to be removed on output. This is typically kind of the backwards way to do it and can often cause issues.
You rarely need to "sanitize" your inputs. If anything, you should reject input that is not allowed OR simply escape it in the proper way while inserting it into the database. Sanitizing is only a thing in very special circumstances, which you don't appear to have right now. You're simply inputting and outputting text.
You should pretty much never change a user's input.
My suggestion, if possible, is to fix your INSERT code first so it isn't html encoding data. This html encoding should happen when you output the data TO AN HTML FORMAT. You would use htmlspecialchars() to do this.
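As a minimal sketch of "escape on output, not on input": the raw text is kept as the user typed it and only passed through htmlspecialchars() at the moment it is written into HTML. The variable $name and the field name0 are taken from the question; the hard-coded sample value is an assumption standing in for the unmodified POST value.

```php
<?php
// Assumption: $name holds the raw, unmodified text (no FILTER_SANITIZE_STRING applied).
$name = "David's Project";

// Escape only when writing into HTML. ENT_QUOTES also escapes single quotes,
// which keeps the value safe inside either kind of attribute quoting.
$escaped = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

// The page source contains &#039; but the browser shows David's Project in the field.
echo '<input type="text" name="name0" value="' . $escaped . '">' . PHP_EOL;
```

The entity only exists in the HTML source. The browser decodes it, so the user sees and resubmits the plain apostrophe.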

Related

Html Entities in <a href=

My MySQL database has some fields that sometimes include an apostrophe, so I take care to encode html entities. For example "Cote d'Or" is stored in the database as "Cote d&#039;Or".
When a MySQL query populates an href I get something like this in my source code.
Text link
However when I click on the link I get a 403 "Forbidden" error. On checking, hovering on the link shows that it is reading &#039; as an apostrophe. That seems to be the cause of the page error, as putting an apostrophe in the database produces the same error and having nothing in there works correctly.
My question now is, how can I have the html entity in the database and still get the link to work correctly?
For URLs, you don't want to use htmlentities() as that's for displaying HTML.
Instead, you'll want to use urlencode():
$link = '/page.php?location=' . urlencode($location);
If your data is already HTML encoded, you'll need to decode it before passing it through urlencode(). A good function for this is html_entity_decode():
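// ENT_QUOTES so that &#039; is decoded too (it is not part of the default flags before PHP 8.1)
$location = html_entity_decode($row['location'], ENT_QUOTES);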
$link = '/page.php?location=' . urlencode($location);
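A short worked example of the decode-then-urlencode step, using the "Cote d'Or" value from the question. The stored string and the page.php path are illustrative assumptions. The final htmlspecialchars() call is an addition of this sketch, since the finished URL is itself being written into an HTML attribute.

```php
<?php
// Assumption: this is the value as it sits in the database, already HTML-encoded.
$stored = 'Cote d&#039;Or';

// Step 1: undo the HTML encoding to get the real text back.
$location = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

// Step 2: encode the text for use inside a URL query string.
$link = '/page.php?location=' . urlencode($location);

// Step 3: escape the whole URL for the HTML attribute it is printed into.
echo '<a href="' . htmlspecialchars($link, ENT_QUOTES, 'UTF-8') . '">Text link</a>' . PHP_EOL;
```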
You might be better using filter_var($var, FILTER_SANITIZE_URL)
http://php.net/manual/en/filter.filters.sanitize.php
While not the best approach, since you are using htmlentities() you can use html_entity_decode()
From: http://php.net/html_entity_decode
html_entity_decode() is the opposite of htmlentities() in that it converts all HTML entities in the string to their applicable characters.
Well, it turns out the real problem wasn't the html entities. It was some of the security stuff written into Apache. A quiet word with my hosting company and they did something that made everything work (presumably removed a mod_security rule).
Moral of the story: if you get a 403 error, suspect Apache security first.
Thanks to the guys who provided answers.

Replace '&' with 'and' on the fly in PHP

Is there a way to replace the character & with and in a PHP web form as the user types it rather than after submitting the form?
When & is inserted into our database our search engine doesn't interpret the & correctly, replacing it with &amp; and returning an incorrect search result (i.e. not the result that included &).
Here is the field we would like to run this on:
<input type="text" name="project_title" id="project_title" value="<?php echo $project_title; ?>" size="60" class="btn_input2"/>
Is there a way to replace the character & with and in a PHP web form as the user types it rather than after submitting the form?
PHP is on the server, it has no control over anything taking place under any circumstances what-so-ever on the client-side. It sends raw text from the web server, a 100megaton thermonuclear device explodes, and PHP never exists anymore after the content is sent. Just the document received on your client side remains. To work with effects on your client side, you need to work with JavaScript.
To do that, you would pick your favorite JavaScript library and add an event listener for "keyup" events. Replace ampersands with "and", and drop the replacement text back in the box. mugur has posted an answer that shows you how to do this.
This is a horrible solution in practice because your users will be screaming for bloody justice to deliver them from such an awful user experience. What you've ended up doing is replacing the input text with something they didn't want. Other search tools handle this, why can't yours? You hit backspace, then what? When the text is replaced, you probably lose your cursor position.
Not only that, you're treating a symptom rather than the cause. Look at why you're doing this:
The reason is when & is inserted into our database our search engine flips out and replaces it with &amp; which then returns an incorrect result (i.e. not the result that included &).
No, your database and search engine do no such thing as "flipping out". You're not aware of what's going on and try to treat symptoms rather than learn the cause and fix it. Your symptom cure will create MORE issues down the road. Don't do it.
&amp; is an HTML entity code. Every "special" character has one. This means your database also stores > as &gt;, as well as entity codes for accented characters (such as in French, German, or Spanish text). You get "wrong" results for all of these.
You didn't show any code so you don't get any code. But here's what your problem is.
Your code is converting raw text into HTML entity codes where appropriate, but you're searching with a non-encoded string.
Option 1: Fix the cause
Encode your search text with HTML entities so that it matches for all these cases. Match accented characters with their non-accented cousins so searching for "francais" might return "français".
Option 2: Fix one symptom
Do a string replace for ampersands on either the client or server side. Your search still breaks for all other encodings: you never find texts such as "Bob > Sally", and you never find "français".
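Option 1 could look roughly like the sketch below. It assumes the stored titles were encoded with htmlentities() in UTF-8. If they were encoded with htmlspecialchars() instead, accented characters would not be stored as entities and the same function should be used here. The helper name encodeSearchTerm is hypothetical.

```php
<?php
// Hypothetical helper: encode the search term the same way the stored data was encoded,
// so that both sides of the comparison use the same representation.
function encodeSearchTerm(string $term): string
{
    return htmlentities($term, ENT_QUOTES, 'UTF-8');
}

$examples = ['R&D projects', 'Bob > Sally', "fran\u{00E7}ais"];
foreach ($examples as $term) {
    // The encoded form is what would be passed to the search query.
    echo $term . ' => ' . encodeSearchTerm($term) . PHP_EOL;
}
```

The encoded term would still need to be bound to the query as a parameter in the usual way. Matching "francais" to "français" is a separate problem that this sketch does not address.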
Before submitting the form you'd need to use JavaScript to change as the user types it in. Not ideal since JS can be turned off.
You'd be much better to "clean" the ampersands after submitting but before inserting into the database.
A simple str_replace should work:
str_replace(' & ',' and ', $_POST['value']);
But as others have pointed out, this isn't a good solution. The best solution would be to encode the ampersands as they go into the database (which seems to be happening just now), then modify your search script to allow for this.
You can do that as they complete the form with jquery like this:
$('#input').change(function() { // edited conforming Icognito suggestion
    // A regex with the g flag replaces every ampersand, not just the first one
    var some_val = $('#input').val().replace(/&/g, 'and');
    $('#input').val( some_val );
});
EDIT: working example (http://jsfiddle.net/4gXZW/13/)
JS:
$('.target').change(function() {
    // A regex with the g flag replaces every ampersand, not just the first one
    $('.target').val($('.target').val().replace(/&/g, 'and'));
});
HTML:
<input class="target" type="text" value="Field 1" />
Otherwise you can do that in PHP before the insert sql.
$to_insert = str_replace("&", "and", $_POST['your_variable']);

How to sanitize HTML POST values of NicEdit?

I recently started to use NicEdit on my "Article Entry" page. However, I have some questions about security and preventing abuse.
First question:
I currently sanitize every input with "mysql_real_escape_string()" in my database class. In addition, I sanitize HTML values with "htmlspecialchars(htmlentities(strip_tags($var)))".
How would you sanitize your "HTML inputs" while adding them to the database, or does the way I'm doing it work fine?
Second question:
While I was writing this question, there was a question with a similar title, so I read it. Someone was talking about "abused HTML inputs" being used to mess with his valid template. (e.g. just input)
It may occur on my current system too. How should it be dealt with in PHP?
Ps. I want to keep using NicEdit, so using BBCode system should be the last advice.
Thank you.
mysql_real_escape_string is not sanitization, it escapes text values to keep the syntax of the SQL query valid/unambiguous/injection safe.
strip_tags is sanitizing your string.
Doing both htmlentities and htmlspecialchars in order is overkill and may just garble your data. Since you're also stripping tags right before that, it's double overkill.
The rule is to make sure your data doesn't break your SQL syntax, therefore you mysql_real_escape_string once before putting the data into the query. You also do the same thing, protecting your HTML syntax, by HTML escaping text before outputting it into HTML, using either htmlspecialchars (recommended) or htmlentities, not both.
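To show why chaining both functions garbles data, here is a small sketch comparing the chained call from the question with a single htmlspecialchars() call. The sample string is an assumption chosen for illustration.

```php
<?php
// Assumption: a plain-text value containing an ampersand.
$var = 'Tom & Jerry';

// The chain from the question: the & is encoded by htmlentities(),
// then the resulting &amp; is encoded again by htmlspecialchars().
$chained = htmlspecialchars(htmlentities(strip_tags($var)));

// Escaping once, at output time, is enough to protect the HTML syntax.
$single = htmlspecialchars($var, ENT_QUOTES, 'UTF-8');

echo $chained . PHP_EOL; // a browser would display: Tom &amp; Jerry
echo $single . PHP_EOL;  // a browser would display: Tom & Jerry
```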
For a much more in-depth excursion into all this read The Great Escapism (Or: What You Need To Know To Work With Text Within Text).
I don't know NicEdit, but I assume it allows your users to style text using HTML behind the scenes. Why are you stripping the HTML from the data then? There's no point in using a WYSIWYG editor then.
This is a function I am using in one of my NICEDIT applications and it seems to do well with the code that comes out of nicedit.
function cleanFromEditor($text) {
    // Try to decode HTML before we clean it, then we submit to the database
    $text = stripslashes(html_entity_decode($text));
    // Strip out tags that we don't want in the text
    // (<i> is allowed here so that the <i> to <em> conversion below can match)
    $text = strip_tags($text,'<p><div><strong><em><ul><ol><li><u><blockquote><br><sub><img><a><h1><h2><h3><span><b><i>');
    // Conversion elements
    $conversion = array(
        '<br>'=>'<br />',
        '<b>'=>'<strong>',
        '</b>'=>'</strong>',
        '<i>'=>'<em>',
        '</i>'=>'</em>'
    );
    // Replace the old HTML with the new
    foreach ($conversion as $old => $new) {
        $text = str_replace($old, $new, $text);
    }
    // Note: mysql_real_escape_string() needs the old mysql extension (removed in PHP 7)
    return htmlentities(mysql_real_escape_string($text));
}

MYSQL Characters like ( ', ", &) etc. appear different

I'm keeping a database that is filled automatically by my users. But when there is an input like My Father's Will, it gets into the database like: My Father&#039;s Will.
This is not what I want. Can someone tell me how to enable these kinds of special characters or possibly a work around to not show these ugly characters to my users.
I'm using PHP, a MySQL server and PHPMyAdmin as DB Management tool.
It looks like the ' is escaped like an HTML character. I guess you're doing the wrong escaping, like using htmlentities instead of mysql_real_escape_string. If this info doesn't help, please post your code. Without it we can only guess.
When you pull the values out of your database, use htmlspecialchars_decode(). This will convert all html special characters back into regular text.
$str = 'My Father&#039;s Will';
// ENT_QUOTES so that the single-quote entity is decoded as well
echo htmlspecialchars_decode($str, ENT_QUOTES);
will output:
My Father's Will
I can't really figure what you are asking, since "My Father's Will" and "My Father's Will" is exactly the same?
But it seems like a problem related to either string escaping in PHP or conflicting encoding in the MySQL database. Try to have a look into both and feel free to specify your question a bit more.
It sounds like you might be escaping (such as php's htmlentities()) your input on its way to the database. The correct thing to do would be to instead escape it only on output back to the screen.
Most likely you have a call to htmlspecialchars(..., ENT_QUOTES) in your code somewhere, which would encode ' and " into character entities. If they're in the database in encoded form, and the end-user sees the character entities, then you're doing a double-encoding and your script's output is something like &amp;#039;.
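The double-encoding described above can be reproduced in a few lines. This is only a sketch with an assumed sample value. The fourth argument used at the end is the documented double_encode parameter of htmlspecialchars(), which can act as a stopgap while already-encoded rows are being cleaned up.

```php
<?php
// Assumption: the raw text the user typed.
$raw = "My Father's Will";

// Encoded once, e.g. on the way into the database.
$once = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// Encoded again on output: the & of the entity is itself escaped,
// so the visitor sees the entity text instead of an apostrophe.
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');

// With double_encode set to false, existing entities are left alone.
$guarded = htmlspecialchars($once, ENT_QUOTES, 'UTF-8', false);

echo $once . PHP_EOL;
echo $twice . PHP_EOL;
echo $guarded . PHP_EOL;
```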

PHP: simple form encoding/decoding

Probably, this question has been asked before, though, I'll ask it again.
Currently, I'm facing a problem with form encoding. When posting my form, all spaces are replaced by the "+" character. I would like to replace this "+" character by a real space.
Does someone have a PHP solution for this?
Thanks in advance.
Cheers, Lennart
Can't reproduce
<form>
<input type=text name="a" value="text with spaces">
<input type=submit>
</form>
<?php if (isset($_GET['a'])) echo $_GET['a'] ?>
No pluses at all. What am I doing wrong?
This shouldn't happen if the browser behaves correctly. My assumption would be that a javascript is messing with your data. Replacing spaces with pluses is done when encoding urls, maybe that will help.
You can use firebug to check for any js interference.
I'm using AJAX (x = in this case JSON) for handling the form posts etc
Then let's see the code.
Possibly you're doing something like trying to form-encode your data manually before another component also form-encodes it. Replacing a space with + is quite standard and expected for form-encoding, but if you accidentally do it twice then you're going to be left with an encoded + at the end of it.
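The "encoded twice" scenario can be sketched on the PHP side with urlencode() and urldecode(). The sample value is an assumption, and the single urldecode() call stands in for the one round of decoding PHP applies automatically when it fills $_GET and $_POST.

```php
<?php
// Assumption: the value the user typed into the form field.
$value = 'text with spaces';

// Encoded once, as a browser would do for a normal form submit.
$once = urlencode($value);

// Encoded a second time, e.g. by manual encoding followed by an AJAX library.
$twice = urlencode($once);

// The server decodes only once, so the pluses survive as literal characters.
$received = urldecode($twice);

echo $once . PHP_EOL;
echo $twice . PHP_EOL;
echo $received . PHP_EOL;
```

If the received value looks like the last line, the fix belongs on the client side (encode once), not in a str_replace on the server.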
If you are using the JavaScript escape function: don't. (When you need to URL-encode a form value for inclusion in a parameter, the proper method is encodeURIComponent. escape is a fruity non-standard encoding of its own which you should almost never have any need to use.)
