In an HTML <input>:
Is it an obligation to set name attribute with English characters?
I want to use it later, in $_POST['some_utf8_characters_and_not_english_characters'].
Is it possible to cause a problem later?
According to RFC1866 chapter 3.2.4, an attribute's value can be anything except the value delimiter (single or double quote), and shouldn't contain HTML tag delimiters (< and >).
However, you'll have to test how JavaScript behaves on all browsers (remember your great friend MSIE...) when you try to access a DOM element using name as references. For example: document.anElementWithPersianName or document.forms['aFormWithAPersianName']. So if you use JS to validate, and/or ajax to submit a form, you'll need to be sure that JS is able to handle this character set properly.
In any case, you'll have to ensure that:
your PHP scripts use UTF-8-based functions when it's about string manipulation (I think some functions need to have the charset passed as an argument)
these scripts are themselves saved in UTF-8 files
you correctly set the character set in the HTML header and/or PHP's response header
Best thing to do: create a simple form, do some JS tricks on it, and have a PHP script parse the submitted results and print them.
This is successfully working in one of my websites. No problems.
<input name="UTF_word" />
$_POST['UTF_word']
Both does not give any problems in client side, including jquery (not checked in IE) or server side.
Related
I can't figure out why my php processing script stops when it encounters a special character in a tinymce textarea.
example if I type foo and submit, fine...no problems but if I type foo<<<, it stops after foo when I submit
the editor is creating the html entities and sending them through ajax
getting the content with
var c = tinyMCE.get('content').getContent();
and sending the content
ajax.send("action=edit_content&c="+c+"&id="+id);
and I can see in firebug that the string is being passed
action=edit_content&c=<p>foo <<<</p>&id=8
and the php is really nothing special at all, just set that post to a var
is it maybe because of the & in the < ? maybe it thinks that is actually another post parameter?
I am still getting my feet wet when it comes to ajax. If I am correct on my assumption, how do I fix that?
You have the right idea. The ampersand is breaking the URL string.
In order to fix breaking characters, you have to escape the string.
Try this:
ajax.send("action=edit_content&c="+escape(c)+"&id="+id);
You probably won't have to (because Apache will do it for you), but if necessary, you can also unescape the string on the PHP side using urldecode:
<?php echo urldecode($_GET['c']); ?>
I wrote a script that when you enter a textbox, it will open an invisible iframe to a .php file with $_GET of what they wrote into the textbox.
However, for example, if I type: '<3' in it, this is what happens.
PHP determins that the $_GET[s] is blank! Users cant put a simple <3 symbol without getting that error.
Another problem is quotes, if I write any quotes, it will end the entire SRC property.
What should I do? Should I do something with javascript, or even PHP? Please let me know!
Thanks!
Use urlencode to encode the inputted string into a valid one for URL use.
Also be very cautious when allowing user input into your PHP script through the URL. Make sure you do proper checks/sanitization, especially if database operations are involved.
It looks like your iframe is generated by JavaScript, so all those answers that include PHP functions are useless. The data isn't even reaching PHP, so how can any PHP function hope to help?
Instead, try using urlencode from PHPJS, since none of JS's functions really handle all cases well, and this makes it easy for you to use PHP's urldecode to retrieve the data.
You need to encode that character as <.
Regarding double quotes, you can use this trick.
attr='Your string can "contain double quotes"'
or
attr="Your string can 'contain double quotes'"
but while specifying variable=values in url, you don't need to user double quotes, you can directly assign the values.
like
url="test.php?var1=123&var2=345"
rest about sending the <3 characters, you can check for url encoding in javascript & PHP whichever applicable!
Is there a way to replace the character & with and in a PHP web form as the user types it rather than after submitting the form?
When & is inserted into our database our search engine doesn't interpret the & correctly replacing it with & returning an incorrect search result (i.e. not the result that included &).
Here is the field we would like to run this on:
<input type="text" name="project_title" id="project_title" value="<?php echo $project_title; ?>" size="60" class="btn_input2"/>
Is there a way to replace the character & with and in a PHP web form as the user types it rather than after submitting the form?
PHP is on the server, it has no control over anything taking place under any circumstances what-so-ever on the client-side. It sends raw text from the web server, a 100megaton thermonuclear device explodes, and PHP never exists anymore after the content is sent. Just the document received on your client side remains. To work with effects on your client side, you need to work with JavaScript.
To do that, you would pick your favorite JavaScript library and add an event listener for "keyup" events. Replace ampersands with "and", and drop the replacement text back in the box. mugur has posted an answer that shows you how to do this.
This is a horrible solution in practice because your users will be screaming for bloody justice to deliver them from such an awful user experience. What you've ended up doing is replacing the input text with something they didn't want. Other search tools do this, why can't yours? You hit backspace, then what? When you hit in the text, you probably lose your cursor position.
Not only that, you're treating a symptom rather than the cause. Look at why you're doing this:
The reason is when & is inserted into our database our search engine flips out and replaces it with & which then returns an incorrect result (i.e. not the result that included &).
No, your database and search engine do no such thing as "flipping out". You're not aware of what's going on and try to treat symptoms rather than learn the cause and fix it. Your symptom cure will create MORE issues down the road. Don't do it.
& is an HTML Entity Code. Every "special" charecter has one. This means your database also encodes > as > as well as characters with accents in them (such as French, German, or Spanish texts). You get "Wrong" results for all of these.
You didn't show any code so you don't get any code. But here's what your problem is.
Your code is converting raw text into HTML Entity codes where appropriate, you're searching against a non-encoded string.
Option 1: Fix the cause
Encode your search text with HTML entities so that it matches for all these cases. Match accent charecters with their non-accented cousins so searching for "francais" might return "français".
Option 2: Fix one symptom
Do a string replace for ampersands either on the client or server side, your search breaks for all other encodings. Never find texts such as "Bob > Sally". Never find "français".
Before submitting the form you'd need to use JavaScript to change as the user types it in. Not ideal since JS can be turned off.
You'd be much better to "clean" the ampersands after submitting but before inserting into the database.
A simple str_replace should work:
str_replace(' & ',' and ', $_POST['value']);
But as others have pointed out, this isn't a good solution. The best solution would be to encode the ampersands as they go into the database (which seems to be happening just now), then modify your search script to allow for this.
You can do that as they complete the form with jquery like this:
$('#input').change(function() { // edited conforming Icognito suggestion
var some_val = $('#input').val().replace('&', 'and');
$('#input').val( some_val );
});
EDIT: working example (http://jsfiddle.net/4gXZW/13/)
JS:
$('.target').change(function() {
$('.target').val($('.target').val().replace('&', 'and'));
});
HTML:
<input class="target" type="text" value="Field 1" />
Otherwise you can do that in PHP before the insert sql.
$to_insert = str_replace("&", "and", $_POST['your_variable']);
I have searched stackoverflow on this problem and did find a few topics, but I feel like there isn't really a solid answer for me on this.
I have a form that users submit and the field's value is stored in a XML file. The XML is set to be encoded with UTF-8.
Every now and then a user will copy/paste text from somewhere and that's when I get the "entity not defined error".
I realize XML only supports a select few entities and anything beyond that is not recognized - hence the parser error.
From what I gather, there's a few options I've seen:
I can find and replace all and swap them out with or an actual space.
I can place the code in question within a CDATA section.
I can include these entities within the XML file.
What I'm doing with the XML file is that the user can enter content into a form, it gets stored in a XML file, and that content then gets displayed as XHTML on a Web page (parsed with SimpleXML).
Of the three options, or any other option(s) I'm not aware of, what's really the best way to deal with these entities?
Thanks,
Ryan
UPDATE
I want to thank everyone for the great feedback. I actually determined what caused my entity errors. All the suggestions made me look into it more deeply!
Some textboxes where plain old textboxes, but my textareas were enhanced with TinyMCE. It turns out, while taking a closer look, that the PHP warnings always referenced data from the TinyMCE enhanced textareas. Later I noticed on a PC that all the characters were taken out (because it couldn't read them), but on a MAC you could see little square boxes referencing the unicode number of that character. The reason it showed up in squares on a MAC in the first place, is because I used utf8_encode to encode data that wasn't in UTF to prevent other parsing errors (which is somehow also related to TinyMCE).
The solution to all this was quite simple:
I added this line entity_encoding : "utf-8" in my tinyMCE.init. Now, all the characters show up the way they are supposed to.
I guess the only thing I don't understand is why the characters still show up when placed in textboxes, because nothing converts them to UTF, but with TinyMCE it was a problem.
I agree that it is purely an encoding issue. In PHP, this is how I solved this problem:
Before passing the html-fragment to SimpleXMLElement constructor I decoded it by using html_entity_decode.
Then further encoded it using utf8_encode().
$headerDoc = '<temp>' . utf8_encode(html_entity_decode($headerFragment)) . '</temp>';
$xmlHeader = new SimpleXMLElement($headerDoc);
Now the above code does not throw any undefined entity errors.
You could HTML-parse the text and have it re-escaped with the respective numeric entities only (like: → ). In any case — simply using un-sanitized user input is a bad idea.
All of the numeric entities are allowed in XML, only the named ones known from HTML do not work (with the exception of &, ", <, >, ').
Most of the time though, you can just write the actual character (ö → ö) to the XML file so there is no need to use an entity reference at all. If you are using a DOM API to manipulate your XML (and you should!) this is your safest bet.
Finally (this is the lazy developer solution) you could build a broken XML file (i.e. not well-formed, with entity errors) and just pass it through tidy for the necessary fix-ups. This may work or may fail depending on just how broken the whole thing is. In my experience, tidy is pretty smart, though, and lets you get away with a lot.
1. I can find and replace all [ ?] and swap them out with [ ?] or an actual space.
This is a robust method, but it requires you to have a table of all the HTML entities (I assume the pasted input is coming from HTML) and to parse the pasted text for entity references.
2. I can place the code in question within a CDATA section.
In other words disable parsing for the whole section? Then you would have to parse it some other way. Could work.
3. I can include these entities within the XML file.
You mean include the entity definitions? I think this is an easy and robust way, if you don't mind making the XML file quite a bit bigger. You could have an "included" file (find one on the web) which is an external entity, which you reference from the top of your main XML file.
One downside is that the XML parser you use has to be one that processes external entities (which not all parsers are required to do). And it must correctly resolve the (possibly relative) URL of the external entity to something accessible. This is not too bad but it may increase constraints on your processing tools.
4. You could forbid non-XML in the pasted content. Among other things, this would disallow entity references that are not predefined in XML (the 5 that Tomalak mentioned) or defined in the content itself. However this may violate the requirements of the application, if users need to be able to paste HTML in there.
5. You could parse the pasted content as HTML into a DOM tree by setting someDiv.innerHTML = thePastedContent;
In other words, create a div somewhere (probably display=none, except for debugging). Say you then have a javascript variable myDiv that holds this div element, and another variable myField that holds the element that is your input text field. Then in javascript you do
myDiv.innerHTML = myField.value;
which takes the unparsed text from myField, parses it into an HTML DOM tree, and sticks it into myDiv as HTML content.
Then you would use some browser-based method for serializing (= "de-parsing") the DOM tree back into XML. See for example this question. Then you send the result to the server as XML.
Whether you want to do this fix in the browser or on the server (as #Hannes suggested) will depend on the size of the data, how quick the response has to be, how beefy your server is, and whether you care about hackers sending not-well-formed XML on purpose.
Use "htmlentities()" with flag "ENT_XML1": htmlentities($value, ENT_XML1);
If you use "SimpleXMLElement" class:
$SimpleXMLElement->addChild($name, htmlentities($value, ENT_XML1));
If you want to convert all characters, this may help you (I wrote it a while back) :
http://www.lautr.com/convert-all-applicable-characters-to-numeric-entities-for-use-in-xml
function _convertAlphaEntitysToNumericEntitys($entity) {
return '&#'.ord(html_entity_decode($entity[0])).';';
}
$content = preg_replace_callback(
'/&([\w\d]+);/i',
'_convertAlphaEntitysToNumericEntitys',
$content);
function _convertAsciOver127toNumericEntitys($entity) {
if(($asciCode = ord($entity[0])) > 127)
return '&#'.$asciCode.';';
else
return $entity[0];
}
$content = preg_replace_callback(
'/[^\w\d ]/i',
'_convertAsciOver127toNumericEntitys', $content);
This question is a general problem for any language that parses XML or JSON (so, basically, every language).
The above answers are for PHP, but a Perl solution would be as easy as...
my $excluderegex =
'^\n\x20-\x20' . # Don't Encode Spaces
'\x30-\x39' . # Don't Encode Numbers
'\x41-\x5a' . # Don't Encode Capitalized Letters
'\x61-\x7a' ; # Don't Encode Lowercase Letters
# in case anything is already encoded
$value = HTML::Entities::decode_entities($value);
# encode properly to numeric
$value = HTML::Entities::encode_numeric($value, $excluderegex);
I have made one form in which there is rich text editor. and i m trying to store the data to database.
now i have mainly two problem..
1) As soon as the string which contents "#"(basically when i try to change the color of the font) character, then it does not store characters after "#". and it also not store "#" character also.
2) although i had tried....in javascript
html.replace("\"","'");
but it does not replace the double quotes to single quotes.
We'll need to see some code. My feeling is you're missing some essential escaping step somewhere. In particular:
As soon as the string which contents "#"(basically when i try to change the color of the font) character
Implies to me that you might be sticking strings together into a URL like this:
var url= '/something.php?content='+html;
Naturally if the html contains a # symbol, you've got problems, because in:
http://www.example.com/something.php?content=<div style="color:#123456">
the # begins a fragment identifier called #123456">, like when you put #section on the end of a URL to go to the anchor called section in the HTML file. Fragment identifiers are purely client-side and are not sent to the server, which would see:
http://www.example.com/something.php?content=<div style="color:
However this is far from the only problem with the above. Space, < and = are simly invalid in URLs, and other characters like & will also mess up parameter parsing. To encode an arbitrary string into a query parameter you must use encodeURIComponent:
var url= '/something.php?content='+encodeURIComponent(html);
which will replace # with %35 and similarly for the other out-of-band characters.
However if this is indeed what you're doing, you should in any case you should not be storing anything to the database in response to a GET request, nor relying on a GET to pass potentially-large content. Use a POST request instead.
It seems that you are doing something very strange with your database code. Can you show the actual code you use for storing the string to database?
# - character is a common way to create a comment. That is everything starting from # to end of line is discarded. However if your code to store to database is correct, that should not matter.
Javascript is not the correct place to handle quote character conversions. The right place for that is on server side.
As you have requested....
I try to replay you... I try to mention exact what I had done...
1) on the client side on the html form page I had written like this..
html = html.trim(); // in html, the data of the rich text editor will come.
document.RTEDemo.action = "submit.php?method='"+ html.replace("\"","'") + "'";
\\ i had done replace bcz i think that was some problem with double quotes.
now on submit.php , my browser url is like this...
http://localhost/nc/submit.php?method='This is very simple recipe.<br><strong style='background-color: #111111; color: #80ff00; font-size: 20px;">To make Bread Buttor you will need</strong><br><br><blockquote><ol><li>bread</li><li>buttor</li></ol></li></blockquote><span style="background-color: #00ff80;">GOOD.</span><br><br><br><blockquote><br></blockquote><br>'
2) on submit.php ........I just write simply this
echo "METHOD : ".$_GET['method'] . "<br><br>";
$method = $_GET['method'];
now my answer of upper part is like this...
METHOD : 'This is very simple recipe.
now i want to store the full detail of URL....but its only storing...
This is very simple recipe.