Why doesn't PHP accept UTF-8 form data? - php

I'm using Ajax. I can track the POST request and see that the data is there in the correct state. However, despite having
header("Content-Type: text/html;charset=UTF-8");
mb_internal_encoding("UTF-8");
at the beginning of the script, I still get gibberish symbols instead of a valid UTF-8 string. What could be the issue?
Here's a part of the html file:
<meta charset="UTF-8">
...
<div id="form-container" role="form" data-toggle="validator" accept-charset="UTF-8" onsubmit="return false">
Here is what my ajax post looks like:

Have you tried mb_detect_encoding(); instead of trying to force it to UTF-8?
So see if mb_internal_encoding(mb_detect_encoding($_POST['value'])); gives you any luck? Or just echo mb_detect_encoding($_POST['value']); to see what encoding it seems to think it is? Just a poke in the dark really.
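To make the "gibberish symbols" easier to recognise, here is a minimal sketch of what happens when UTF-8 bytes are read as a single-byte encoding. It is only an illustration, not a diagnosis of the code in the question; the sample string and the helper name decodeAsLatin1 are assumptions made for this example.

```javascript
// Hypothetical helper: decode bytes as ISO-8859-1, where every byte
// maps to the code point with the same numeric value.
function decodeAsLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

const original = "d\u00e9j\u00e0 vu"; // "déjà vu", written with escapes
const utf8Bytes = new TextEncoder().encode(original); // TextEncoder always emits UTF-8

const misread = decodeAsLatin1(utf8Bytes); // wrong charset: each accented letter becomes two characters
const reread = new TextDecoder("utf-8").decode(utf8Bytes); // right charset: text survives

console.log(utf8Bytes.length);     // 9 (two of the seven characters take two bytes each)
console.log(misread === original); // false
console.log(reread === original);  // true
```

If the broken output shows two characters wherever one accented letter was expected, the bytes are probably valid UTF-8 that some later step is interpreting as a single-byte encoding.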

Related

html form action to php gives error 500

I am just starting out with PHP, and want to get my HTML/javascript form to email via PHP.
So, the bits of the code that are relevant (both in same HTML document):-
(web address changed to protect the innocent, but they are both the same (cut/paste) in real life.)
<form id=emailform method="post" action="http://www.qqqqqqqq.co.uk/PHP/TestPHP.php" enctype="multipart/form-data" accept-charset="UTF-8">
This gives:- Error 500 - Internal server error
Test PHP
This runs the PHP code.
The PHP, which in this case is a simple echo, as the error occurs with any PHP I have, is:-
<html>
<body>
<?php
/* phpinfo(); */
echo "This is a Test PHP echo test"; ?>
</body>
</html>
Have I done something really daft? I thought that the action (if the form is valid) would behave similarly to the link, hence the second test.
One thing that might help is putting quotes around your form's ID:
<form id="emailform" method="post" action="http://www.qqqqqqqq.co.uk/PHP/TestPHP.php" enctype="multipart/form-data" accept-charset="UTF-8">
A 500 error means "something has gone wrong on the web site's server but the server could not be more specific on what the exact problem is" - about.com
Try putting the quotes around the ID, and let us know what happens. For some reason that is a mistake I make a lot, and I get the same issue.
http://pcsupport.about.com/od/findbyerrormessage/a/500servererror.htm

Submit form to a directory (example.com/product/), form action=""?

I've tried <form action="/product/" method="get">, but it doesn't work.
Usually I would have a PHP file such as search.php in the same directory such that <form action="search.php", but I'm implementing a different kind of search which needs to always send the request to the same place.
What I'm getting: (e.g. if I'm on page example.com/product/foo)
example.com/product/foo?id={query};
What I want: example.com/product/?id={query};
Update: Upon inspecting the elements, it seems like it's my action=" product ". Something's up with the slashes. I checked the source code, and it seems fine.
Got it to work after changing double quotes to single quotes... <form action='/product/' method='get'>.
Use the full URL in the action, e.g.
<form action="http://example.com/product/" method="get">
...
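As a rough illustration of why the leading slash and the full URL matter, the sketch below resolves several action values against the page URL from the question, using the standard URL constructor. The helper name resolveAction is an assumption made for this example, and it only models ordinary URL resolution, not the quoting problem the asker ran into.

```javascript
// Hypothetical helper: resolve a form's action attribute against the
// URL of the page that contains the form.
function resolveAction(action, pageUrl) {
  return new URL(action, pageUrl).href;
}

const page = "http://example.com/product/foo";

console.log(resolveAction("/product/", page));  // http://example.com/product/
console.log(resolveAction("search.php", page)); // http://example.com/product/search.php
console.log(resolveAction("product/", page));   // http://example.com/product/product/
console.log(resolveAction("http://example.com/product/", page)); // http://example.com/product/
```

A root-relative action ("/product/") and a full URL both end up at the same place regardless of which page the form is on, while a relative one depends on the current path.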

Posting from IE8 to PHP gives blank $_POST

I have a simple HTML form, sending a post request to a php script. In IE8, the form only works intermittently - most of the time the PHP script sees an empty $_POST variable.
Here's my code:
<html>
<head>
<title>Post test</title>
</head>
<body style="text-align: center;">
<?php
echo "<pre>".print_r($_POST, TRUE)."</pre>";
?>
<form action="<?php echo $_SERVER['PHP_SELF'] ?>" method="post">
<input type="text" name="name">
<input type="hidden" name="hidden" value="moo" >
<input type="submit" value="Search" >
</form>
</body>
</html>
Sometimes the print_r gives the response you'd expect (i.e. it's populated with the data from the form), most of the time it's empty.
Not being able to use POST is a bit of a problem for web applications - anyone got any ideas what's going on, and how to fix it?
Thanks everyone for wading in on this one.
It turns out the problem lay in an Apache module I had enabled.
It's a module that allows Apache to use Windows authentication to identify a user via their Windows user ID: mod_auth_sspi.
The effect is caused by a known bug in the module, but it can be worked around with a simple extra directive until a fix is added in the next update, as described here:
http://sourceforge.net/projects/mod-auth-sspi/forums/forum/550583/topic/3392037
That sounds very very bizarre. Does it happen in other versions of IE as well?
I can't tell you what the problem is, but here are my suggestions on how to diagnose it:
Print $_REQUEST rather than just $_POST, to see if the data is coming in via another method.
Use a tool like Fiddler or Wireshark to track exactly what is actually being sent by the browser.
Fiddler in particular has been very helpful for me a few times (mainly when debugging Ajax code), and will tell you exactly what was posted by the browser. If your web server is localhost, you can also use Fiddler to track what is received before PHP gets its hands on it. If not, you can use wireshark on the server if you have permissions for installing that sort of thing.
In addition to Fiddler, I would have suggested a browser-based tool like Firebug, but I don't know of one for IE that is good enough (The IE dev toolbar doesn't give you details of request and response data, as far as I know).
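When reading a capture in Fiddler or Wireshark, it helps to know what a healthy request body for this form should look like. The sketch below builds it with URLSearchParams, assuming the form's default enctype (application/x-www-form-urlencoded) and a UTF-8 page; the field names come from the form in the question, while the typed value is made up.

```javascript
// Field names are taken from the form in the question; the value typed
// into the text box is an assumption for this example.
const fields = new URLSearchParams();
fields.append("name", "d\u00e9j\u00e0 vu"); // "déjà vu"
fields.append("hidden", "moo");
// The submit button has no name attribute, so it contributes nothing.

const body = fields.toString();
console.log(body); // name=d%C3%A9j%C3%A0+vu&hidden=moo
```

If the captured request is a POST but its body is empty, the data is being lost before PHP ever sees it, which points at the browser or the web server rather than the script.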
I'm suspicious that when the script is telling you that $_POST is empty, you did not actually POST the form. You can check by adding print($_SERVER['REQUEST_METHOD']); after your print_r($_POST);
If you are posting a file some of the time (i.e. with a file input) then make sure you set enctype="multipart/form-data" in your <form> element.
Have you checked the generated html? Is it possible that echo $_SERVER['PHP_SELF'] isn't producing the output you're after, which messes up the form html, which messes up the POST?

Escaping output safely for both html and input fields

In my web app, users can input text data. This data can be shown to other users, and the original author can also go back and edit their data. I'm looking for the correct way to safely escape this data.
I'm only sql sanitizing on the way in, so everything is stored as it reads. Let's say I have "déjà vu" in the database. Or, to be more extreme, a <script> tag. It is possible that this may be valid, and not even maliciously intended, input.
I'm using htmlentities() on the way out to make sure everything is escaped. The problem is that html and input fields treat things differently. I want to make sure it's safe in HTML, but that the author when editing the text, sees exactly what they typed in the input fields. I'm also using jQuery to fill form fields with the data dynamically.
If I do this:
<p><?=htmlentities("déjà vu");?></p>
<input type=text value="<?=htmlentities("déjà vu");?>">
The page source puts d&eacute;j&agrave; vu in both places (I had to backtick that or you would see "déjà vu"!). The problem is that the output in the <p> is correct, but the input just shows the escaped text. If the user resubmits their form, they double escape and ruin their input.
I know I still have to sanitize text that goes into the field, otherwise you can end the value quote and do bad things. The only solution I found is this. Again, I'm using jQuery.
var temp = $("<div></div>").html("<?=htmlentities("déjà vu");?>");
$("input").val(temp.html());
This works, as it causes the div to read the escaped text as encoded characters, and then the jquery copies those encoded characters to the input tag, properly preserved.
So my question: is this still safe, or is there a security hole somewhere? And more importantly, is this the only / correct way to do this? Am I missing something about how html and character encoding works that make this a trivial issue to solve?
EDIT
This is actually wrong, I oversimplified my example to the point of it not working. The problem is actually because I'm using jQuery's val() to insert the text into the field.
<input>
<script>$("input").val("<?=htmlentities("déjà vu");?>");</script>
The reason for this is that the form is dynamic - the user can add or remove fields at will and so they are generated after page load.
So it seems that jQuery is escaping the data to go into the input, but it's not quite good enough - if I don't do anything myself, a user can still put in a </script> tag, killing my code and inserting malicious code. But there's another argument to be made here. Since only the original author can see the text in an input box anyway, should I even bother? Basically the only people they could execute an XSS attack against is themselves.
I'm sorry, but I cannot reproduce the behaviour you describe. I've always used htmlspecialchars() (which does essentially the same task as htmlentities()) and it has never led to any sort of double-encoding. The page source shows d&eacute;j&agrave; vu in both places (of course! that's the point!) but the rendered page shows the appropriate values, and that's what is sent back to the server.
Can you post a full self-contained code snippet that exhibits such behaviour?
Update: some testing code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><title></title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
<?php
$default_value = 'déjà vu <script> ¿foo?';
if( !isset($_GET['foo']) ){
$_GET['foo'] = $default_value;
}
?>
<form action="" method="get">
<p><?php echo htmlentities($_GET['foo']); ?></p>
<input type="text" name="foo" value="<?php echo htmlentities($_GET['foo']); ?>">
<input type="submit" value="Submit">
</form>
</body>
</html>
Answer to updated question
The htmlentities() function, as its name suggests, is used when generating HTML output. That's why it's of little use in your second example: JavaScript is not HTML. It's a language of its own with its own syntax.
Now, the problem you want to fix is how to generate output that follows these two rules:
It's a valid string in JavaScript.
It can be embedded safely in an HTML document.
The closest PHP function for #1 I'm aware of is json_encode(). Since JSON syntax is a subset of JavaScript, if you feed it with a PHP string it will output a JavaScript string.
As for #2, once the browser enters a JavaScript block it expects a </script> tag to leave it. The json_encode() function takes care of this and escapes it properly (<\/script>).
My revised test code:
<?php
$default_value = 'déjà vu </script> ¿foo?';
if( !isset($_GET['foo']) ){
$_GET['foo'] = $default_value;
}
?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><title></title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
<script type="text/javascript"><!--
$(function(){
$("input[type=text]").val(<?php echo json_encode(utf8_encode($_GET['foo'])); ?>);
});
//--></script>
</head>
<body>
<form action="" method="get">
<p><?php echo htmlentities($_GET['foo']); ?></p>
<input type="text" name="foo" value="(to be replaced)">
<input type="submit" value="Submit">
</form>
</body>
</html>
Note: utf8_encode() converts from ISO-8859-1 to UTF-8 and it isn't required if your data is already in UTF-8 (recommended).
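The same two rules (a valid JavaScript string that is also safe inside an HTML document) can be sketched on the JavaScript side, for instance if the page were generated by JavaScript rather than PHP. This is an illustration under that assumption, not part of the answer's code: the helper name toInlineScriptLiteral is hypothetical, and because JSON.stringify does not escape "/" the way json_encode() does by default, the sketch escapes "<" instead.

```javascript
// Hypothetical helper: produce a string literal that is valid JavaScript
// and cannot close the surrounding <script> element.
function toInlineScriptLiteral(value) {
  // JSON.stringify handles quotes, backslashes and control characters;
  // replacing "<" with a \u escape keeps "</script>" out of the output.
  return JSON.stringify(value).replace(/</g, "\\u003c");
}

const input = "d\u00e9j\u00e0 vu </script> \u00bffoo?";
const literal = toInlineScriptLiteral(input);

console.log(literal.includes("</script"));  // false
console.log(JSON.parse(literal) === input); // true
```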
If you just need to reverse the encode then you can use html_entity_decode - http://www.php.net/manual/en/function.html-entity-decode.php.
Another possibility is to only run htmlentities at the time the content will be displayed as part of a web page. Otherwise, keep the unencoded text, as submitted or loaded from your datastore.
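A small sketch of why escaping exactly once, at display time, matters. The escapeHtml function here is an assumption written for this example; it covers roughly the characters htmlspecialchars() handles and, unlike htmlentities(), leaves accented letters alone.

```javascript
// Hypothetical minimal escaper for HTML text and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first, or later entities get re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const stored = '<script>alert("hi")</script>'; // raw text as kept in the datastore
const once = escapeHtml(stored);  // safe to place in the page
const twice = escapeHtml(once);   // what happens if already-escaped text is escaped again

console.log(once);  // &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
console.log(twice); // &amp;lt;script&amp;gt;alert(&amp;quot;hi&amp;quot;)&amp;lt;/script&amp;gt;
```

Storing the raw text and escaping only on output keeps the second, doubly escaped form from ever reaching the user.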
I believe it is a problem with the way you are applying the value to the input. It is being displayed as encoded, which makes sense because it is JavaScript, not HTML. So, what I would propose is to write your encoded text as part of the markup so that it gets parsed naturally (as opposed to being injected with client script). Since your textboxes are not readily available when the server is responding, you can use a temporary hidden field...
<input type="hidden" id="hidEncoded" value="<?=htmlentities("déjà vu");?>" />
Then it will get parsed as good old HTML, and when you try to access the value with Javascript it should be decoded...
// Give your textbox an ID!
$("#txtInput").val($("#hidEncoded").val());

POSTing XML via HTML Forms

I am developing a website and want to let the user create some content by POSTing XML data. For that purpose there is a <textarea> where the user can write (copy/paste) XML and submit it. The problem is that I am losing data: characters such as <, >, and I think others too, get lost.
Maybe it is a framework problem, not sure, I am using Elgg and receiving the data with get_input().
UPDATE1: some code answering the comment:
<form method="POST" action="http://for.bar/slash" enctype="text/xml">
<input name="add" type="submit" value="Create" />
</form>
to receive the data I use elgg get_input()
$data = get_input('data');
If I were to make a wild guess, I'd say that there is some kind of auto-magical XSS protection being used by get_input(). You could try doing a print_r($_POST); or perhaps Elgg is "sanitizing" all of $_POST as well. In this case you may have to base64 encode the data with JavaScript before submitting the request.
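A minimal sketch of the base64 idea, assuming the XML is UTF-8 text and that the receiving PHP code would reverse it with base64_decode(). The function names are made up for this example, and whether get_input() leaves base64 text untouched is itself an assumption that needs checking.

```javascript
// Hypothetical helper: encode XML text as base64 so it contains no "<" or ">".
function encodeXmlForPost(xml) {
  const bytes = new TextEncoder().encode(xml); // UTF-8 bytes
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b); // btoa needs one char per byte
  return btoa(binary);
}

// Reverse step, shown only to check the round trip; the server would do this instead.
function decodeXmlFromPost(encoded) {
  const binary = atob(encoded);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

const xml = '<item name="d\u00e9j\u00e0">a &amp; b</item>';
const encoded = encodeXmlForPost(xml);

console.log(/[<>]/.test(encoded));               // false
console.log(decodeXmlFromPost(encoded) === xml); // true
```

The encoded string would be placed in the form field in place of the raw XML just before the form is submitted.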
According to MDN, the only standard values that should be used in form's enctype attribute are following:
application/x-www-form-urlencoded
multipart/form-data
text/plain
That being said, you can run into unpredictable behaviour if you set it to a non-standard value such as application/xml.
Source: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/form#attr-enctype
