I'm passing user-generated HTML into a database and I'm trying to make sure that no malicious code is passed through. One of the steps I'm taking is to run passed code through pear's HTML_Safe class to remove vulnerable markup. However, one thing I've noticed is that the name attribute of submitted elements gets removed. Sure enough, when you look at the source code, name is one of the few attributes that's blacklisted by default:
http://pear.php.net/package/HTML_Safe/docs/latest/HTML_Safe/HTML_Safe.html#var$attributes
What's the danger in allowing users to pass values for name? How can values for name be used to nefarious ends? Any thoughts? If not, I'm tempted to modify the blacklist.
In HTML form elements, the name attribute is used as an identifier. Therefore, if you allow name then someone may be able to override your HTML name attributes (that you may have used) with one of their own. The first matching name found is often the one used by either Javascript or server side processing.
This would then allow someone to exploit any possible Javascript or server side form processing you may be using that references the first matching name attribute found.
It is not just form elements that can use name, but they would be the least safe ones.
Another override issue is if you are using Javascripts getElementsByName in any of your functions (as pointed out below), you could end up with a function that does not do what you expect.
Edit: Some corrections and a note about getElementsByName issue (as pointed out below).
Related
Can we somehow pass the type HTML input attribute value to the $_POST array or grab it anyhow else with PHP?
I am aware that I can create a hidden field and basically put the type of the real input into the value of the hidden field, but this seems a bit like "repeating" work to me.
I want to create a Form, where input values are submitted to the $_POST and I can detect the type of that input without the need to hardcode/map the single inputs to each a type.
In this way I could detect the field type and act upon without the need to create a "map" that maps my custom inputs (by name or ID) to a certain type, which I already declare in HTML form anyway.
It seems a real shortcoming that the type of an input is undetectable in a Form Submit - or perhaps (hopefully) I miss something?
Can we somehow pass the type HTML input attribute value to the $_POST array or grab it anyhow else with PHP?
Not per se.
I am aware that I can create a hidden field and basically put the type of the real input into the value of the hidden field
That is a way to do it.
It seems a real shortcoming that the type of an input is undetectable in a Form Submit
Usually you know what type of data you expect for a given field because you aren't processing them generically, so it would rarely be a useful feature.
perhaps (hopefully) I miss something?
No.
Well here is the breakdown;
GET accessed via $_GET in PHP tackling and POST accessed via $_POST in PHP are transport methods, so is PUT, and DELETE etc for a from it does not matter what method you use it only works on client side and only knows to map every thing in it into serialised query string or at least have it read for being serialised.
For example
<input type="text" id="firstname" name="fname">
it takes the name attribute and converts into this
?fname=ferret
See it didn't even bother with ID attribute. When we hit submit button form will only run through name attributes of each input and make LHS of the with value and add user input as RHS to the value. It will not do anything else at all.
On PHP side we ask $_GET tunnel are there any query strings in the request or $_POST tunnel. Each of these if there is any query string - emphasis on word string. explodes the string into array and gives it you. hence $POST['fname'].
Looks something like this
$_POST = [
fname => 'ferret',
someothingelse => 'someothervalue']
SO what you are trying to do is or at least asking to do is ...make browser change its BOM behaviour - which we cannot in real sense of the matter; to make form add some thing like this.
?fname=ferret,text
?fname=ferret-text
?fname=ferret/text
form by default will not do this, unless you run custom function updating each query before submit and that is pron to what we call escaping, 3/100 time you would miss it given the chance
Then on PHP side you want PHP to figure out on its own that after slash is type like so
$_POST = [
fname => 'ferret/text']
PHP would not do that on its own, unless you fork it make custom whatever like Facebook has done and then run it or at least make some kind of low level library but that too would be after the fact.
in case your not wondering, thats how XSS and injections happen.
SO query string standards are rigid to keep things a string with militaristic data and serialised.
So yes what you intended to do with hidden field is one tested way of achieving what you are want.
I am using jQuery and I am just wondering, does ID have to be always unique in the whole page? Class, I know, can be repeated as many times as you like, what about ID?
Yes, it must be unique.
HTML4:
https://www.w3.org/TR/html4/struct/global.html#h-7.5.2
Section 7.5.2:
id = name [CS]
This attribute assigns a name to an element. This name must be unique in a document.
HTML5:
https://www.w3.org/TR/html5/dom.html#element-attrdef-global-id
The id attribute specifies its element's unique identifier (ID). The
value must be unique amongst all the IDs in the element's home subtree
and must contain at least one character. The value must not contain
any space characters.
Does an ID have to be unique in the whole page?
No.
Because the HTML Living Standard of March 15, 2022, clearly states:
The class, id, and slot attributes may be specified on all HTML elements. …….
When specified on HTML elements, the id attribute value must be unique amongst all the IDs in the element’s tree and must contain at least one character. The value must not contain any ASCII whitespace.
and a page may have several DOM trees. It does, for example, when you’ve attached (Element.attachShadow()) a shadow DOM tree to an element.
TL; DR
Does an ID have to be unique in the whole page?
No.
Does an ID have to be unique in a DOM tree?
Yes.
from mdn
https://developer.mozilla.org/en/DOM/element.id
so i guess it better be...
Technically, by HTML5 standards ID must be unique on the page - https://developer.mozilla.org/en/DOM/element.id
But I've worked on extremely modular websites, where this is completely ignored and it works. And it makes sense - the most surprising part.
We call it "componentization"
For example, you might have a component on your page, which is some kind of widget. It has stuff inside with their own unique IDs eg "ok-button"
Once there are many of these widgets on the page, you technically have invalid HTML. But it makes perfect sense to componentize your widgets so that they each, internally, reference their own ok button eg if using jquery to search from it's own root it might be: $widgetRoot.find("#ok-button")
This works for us, though technically IDs shouldn't be used at all, once they're not unique.
As cited above, even YouTube does it, so it's not so renegade.
Jan 2018, here is Youtube HTML , you can see id="button" id="info" are duplicated.
That's basically the whole point of an ID. :) IDs are specific, can only be used once per page. Classes can be used as pleased.
Browsers used to be lenient on this (many years ago when css was young) and allow the ID to be used more than once. They have become more strict.
However, yes ID's are to be unique and only used once.
If you need to use css formatting more than once use CLASS.
With Javascript, you can only reference to one element using ID. document.getElementById and jQuery's $ selector will return only the first element matching. So it doesn't make sense using the same ID on multiple elements.
There are great answers for the same question at https://softwareengineering.stackexchange.com/questions/127178/two-html-elements-with-same-id-attribute-how-bad-is-it-really.
One tidbit not mentioned above is that if there are several identical ids one the same page (which happens, even though it violates the standard):
If you have to work around this (that's sad), you can use $("*#foo") which will convince jQuery to use getElementsByTagName and return a list of all matched elements.
Yes, IDs are unique. Class are not.
IDs always have to be unique.
Everybody has a unique identification number (ex. Social Security number), and there are lots of people in a social class
I'm adding to this question, because I feel it has not been answered adequately in any of the above,
As a point reference: I've implemented non-unique id's, and all works just fine (across all browsers). Importantly, when coding, I've not run into any css logic errors, which is where the rubber hits the road (IMO) on this question. Have also not run into any conflicts in js (as one can glean out id's in context with classes)
So, why do id's have to be unique? Obvious answer (as stated and re-stated above) is 'cause the 'standards' say so. The missing part for me is why?
i.e. what actually goes awry (or could theoretically go awry) if (heaven forbid) someone inadvertently used the same id twice?
The reference with all browsers these days?
Makes div possible in such terms of being used multiple times.
There is no rule that it must be unique. When all browsers understand:
<script>div#some {font-size: 20px}</script>
<div id="some"><p>understand</p></div>
<div id="some"><h1>me too</h1></div>
When you add new style CSS codes you have the possibility to use the addition of styles. Since that even is not supposed to be unique it describes the opposite use, so make more styles but do not make more objects? And as you can; assign several div objects, so why didn't they tell you that class must be unique? That's because the class does not need unique value. And that makes the ID in legal terms obsolete if not being unique.
<script>.some {font-size: 25px}</script>
<div class="some"><p>enter</p></div>
<div class="some"><h1>enter</p></div>
"When there is no rule when a rule is said. Then the rule is not fulfilled. It's not inherent to exist. By only in the illusion of all rules that it should have existed only to make life much harder."
Just because some people say div must be unique, this might be right, but at least through their professional perspective to say it, they have to say it. Unless they didn't check the modern browsers, which from nearly the beginning were able to understand the code of several different div objects with the same style.
ID must be unique - One reason for that is, that in the Browser-JavaScript-Context exists a methode: Document.getElementById()
This method can only return one element.
If a Document has not unique IDs, this function behaves in an undocumented and unforeseeable way.
I think, this is reason enough to only use one ID per Document.
Reference:
https://developer.mozilla.org/en-US/docs/Web/API/Document/getElementById
I am using jQuery and I am just wondering, does ID have to be always unique in the whole page? Class, I know, can be repeated as many times as you like, what about ID?
Yes, it must be unique.
HTML4:
https://www.w3.org/TR/html4/struct/global.html#h-7.5.2
Section 7.5.2:
id = name [CS]
This attribute assigns a name to an element. This name must be unique in a document.
HTML5:
https://www.w3.org/TR/html5/dom.html#element-attrdef-global-id
The id attribute specifies its element's unique identifier (ID). The
value must be unique amongst all the IDs in the element's home subtree
and must contain at least one character. The value must not contain
any space characters.
Does an ID have to be unique in the whole page?
No.
Because the HTML Living Standard of March 15, 2022, clearly states:
The class, id, and slot attributes may be specified on all HTML elements. …….
When specified on HTML elements, the id attribute value must be unique amongst all the IDs in the element’s tree and must contain at least one character. The value must not contain any ASCII whitespace.
and a page may have several DOM trees. It does, for example, when you’ve attached (Element.attachShadow()) a shadow DOM tree to an element.
TL; DR
Does an ID have to be unique in the whole page?
No.
Does an ID have to be unique in a DOM tree?
Yes.
from mdn
https://developer.mozilla.org/en/DOM/element.id
so i guess it better be...
Technically, by HTML5 standards ID must be unique on the page - https://developer.mozilla.org/en/DOM/element.id
But I've worked on extremely modular websites, where this is completely ignored and it works. And it makes sense - the most surprising part.
We call it "componentization"
For example, you might have a component on your page, which is some kind of widget. It has stuff inside with their own unique IDs eg "ok-button"
Once there are many of these widgets on the page, you technically have invalid HTML. But it makes perfect sense to componentize your widgets so that they each, internally, reference their own ok button eg if using jquery to search from it's own root it might be: $widgetRoot.find("#ok-button")
This works for us, though technically IDs shouldn't be used at all, once they're not unique.
As cited above, even YouTube does it, so it's not so renegade.
Jan 2018, here is Youtube HTML , you can see id="button" id="info" are duplicated.
That's basically the whole point of an ID. :) IDs are specific, can only be used once per page. Classes can be used as pleased.
Browsers used to be lenient on this (many years ago when css was young) and allow the ID to be used more than once. They have become more strict.
However, yes ID's are to be unique and only used once.
If you need to use css formatting more than once use CLASS.
With Javascript, you can only reference to one element using ID. document.getElementById and jQuery's $ selector will return only the first element matching. So it doesn't make sense using the same ID on multiple elements.
There are great answers for the same question at https://softwareengineering.stackexchange.com/questions/127178/two-html-elements-with-same-id-attribute-how-bad-is-it-really.
One tidbit not mentioned above is that if there are several identical ids one the same page (which happens, even though it violates the standard):
If you have to work around this (that's sad), you can use $("*#foo") which will convince jQuery to use getElementsByTagName and return a list of all matched elements.
Yes, IDs are unique. Class are not.
IDs always have to be unique.
Everybody has a unique identification number (ex. Social Security number), and there are lots of people in a social class
I'm adding to this question, because I feel it has not been answered adequately in any of the above,
As a point reference: I've implemented non-unique id's, and all works just fine (across all browsers). Importantly, when coding, I've not run into any css logic errors, which is where the rubber hits the road (IMO) on this question. Have also not run into any conflicts in js (as one can glean out id's in context with classes)
So, why do id's have to be unique? Obvious answer (as stated and re-stated above) is 'cause the 'standards' say so. The missing part for me is why?
i.e. what actually goes awry (or could theoretically go awry) if (heaven forbid) someone inadvertently used the same id twice?
The reference with all browsers these days?
Makes div possible in such terms of being used multiple times.
There is no rule that it must be unique. When all browsers understand:
<script>div#some {font-size: 20px}</script>
<div id="some"><p>understand</p></div>
<div id="some"><h1>me too</h1></div>
When you add new style CSS codes you have the possibility to use the addition of styles. Since that even is not supposed to be unique it describes the opposite use, so make more styles but do not make more objects? And as you can; assign several div objects, so why didn't they tell you that class must be unique? That's because the class does not need unique value. And that makes the ID in legal terms obsolete if not being unique.
<script>.some {font-size: 25px}</script>
<div class="some"><p>enter</p></div>
<div class="some"><h1>enter</p></div>
"When there is no rule when a rule is said. Then the rule is not fulfilled. It's not inherent to exist. By only in the illusion of all rules that it should have existed only to make life much harder."
Just because some people say div must be unique, this might be right, but at least through their professional perspective to say it, they have to say it. Unless they didn't check the modern browsers, which from nearly the beginning were able to understand the code of several different div objects with the same style.
ID must be unique - One reason for that is, that in the Browser-JavaScript-Context exists a methode: Document.getElementById()
This method can only return one element.
If a Document has not unique IDs, this function behaves in an undocumented and unforeseeable way.
I think, this is reason enough to only use one ID per Document.
Reference:
https://developer.mozilla.org/en-US/docs/Web/API/Document/getElementById
I have a page with three different forms. The second has to have access to the post vars submitted by the
first. The third has to have the cumulative post vars.
Even though an element such as a hidden field has the same id as another form element, it should
be valid if it exists under a different form element, right? I have done this in the past without problem
as far as submission processing, but the xhtml doctype syntax checker in my text editor (BBedit on Mac OSX) marks
the re occurrence of an element id as an error.
To be completely valid with respect to doctype I have to use xhtml transitional to allow name attributes (forms
won't submit with out them)
I don't want to have three different sets of hidden fields to transmit the same values for each different form
That requires huge amounts of redundant processing on the server side.
Thanks for reminding me that I can use the same name attribute and different ids. Sometimes I get wrapped up
in details and loose sight of the bigger picture
By the way, I posted a problem with using one form for the entire setup at:
https://stackoverflow.com/questions/21315920/browser-caching-post-vars
and I have not received any definitive answer there.
The id attribute MUST be unique per document. However, if you simply want various fields to accessible using the same key server-side, simply set the name attribute. name has no such requirement and can differ from the id.
In my project I have submissions and comments, each with an ID. Currently the ID's are just numeric and correspond to their database ID's. Everything is working fine but when I run it through the W3 validator I get the error:
value of attribute "id" invalid: "1" cannot start a name
I suppose instead that I could just precede all ids with some sort of string but then whenever I was using or manipulating the id in JQuery or PHP I have to do a id.replace('string', '') before using it. This seems rather cumbersome. Any advice?
Yes, using numbers as HTML element IDs is bad practice.
It violates W3C specification.
You noted this in your question, and it is true for every HTML specification except HTML5
It is detrimental to SEO.
Search-engine optimized HTML element IDs SHOULD reflect the content of the identified element. Check out How To Compose HTML ID and Class Names like a Rockstar by Meitar Moscovitz. It provides a good overview of this concept.
There can be server-side scripting issues.
Back when I first started programming in ASP classic, I had to access submitted form fields by a syntax like Request.Form("some_id"). However, if I did Request.Form(1) it would return the value of the second field in the form collection, instead of the element with an Id equal to 1. This is a pretty standard behavior for working with collections. Its also similar with javascript, and could make your client side scripting more complicated to maintain as well.
I suggest you to use prefixes "comment-ID" or "post-ID".
If you need the id in JavaScript, you just have to id.substring(8) (for "comment-")
The HTML 5 Specification lifts this restriction. If you're worried about validity you might simply consider changing the DTD to HTML5's.
http://www.w3.org/TR/html5/elements.html#the-id-attribute
If you're manipulating the element then you can just use $(this).jQueryOperation() - therefore you can have a prefix without having to replace anything!
The best way for your need is having the prefix for your class, I mean something like item-x and x is the number that you need.
But from my personal experience, it is better to use classes for your elements, and you know that you must use classes if the item is not unique in the page