I've been coding in PHP for a while, and this is the first time I came across this issue.
My goal is to pass a GET variable (a URL) without encoding or decoding it, which means that "%2F" must not turn into "/" or vice versa. The reason is that I'm passing this variable to a third-party website, and the variable must stay exactly as it is.
Right now, this URL (passed as a GET variable): http://example.com/something%2Felse turns into http://example.com/something/else.
How can I prevent php from encoding what's passed in GET?
Apache denies all URLs with %2F in the path part, for security reasons: scripts can't normally (i.e. without rewriting) tell the difference between %2F and / because the PATH_INFO environment variable is automatically URL-decoded (which is stupid, but it is a long-standing part of the CGI specification, so there's nothing that can be done about it).
You can turn this feature off using the AllowEncodedSlashes directive, but note that other web servers will still disallow it (with no option to turn that off), and that other characters may also be taboo (eg. %5C), and that %00 in particular will always be blocked by both Apache and IIS. So if your application relied on being able to have %2F or other characters in a path part you'd be limiting your compatibility/deployment options.
I am using urlencode() while preparing the search URL
You should use rawurlencode(), not urlencode(), for escaping path parts. urlencode() is misnamed: it is actually for application/x-www-form-urlencoded data, such as the query string or the body of a POST request, and not for other parts of the URL.
The difference is that + doesn't mean space in path parts. rawurlencode() will correctly produce %20 instead, which will work both in form-encoded data and other parts of the URL.
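As a rough illustration of the difference described above, this sketch runs the same strings through both functions. The sample values and the buildPath() helper are made up for the example and are not from the original question.

```php
<?php
// Hypothetical helper: joins path segments, escaping each one with rawurlencode().
function buildPath(array $segments): string
{
    return implode('/', array_map('rawurlencode', $segments));
}

echo urlencode('red shoes'), "\n";     // red+shoes   (form encoding: space becomes +)
echo rawurlencode('red shoes'), "\n";  // red%20shoes (works in a path and in a query)
echo rawurlencode('a/b'), "\n";        // a%2Fb       (the slash stays inside one segment)
echo buildPath(['search', 'red shoes', 'size 10+']), "\n";
// search/red%20shoes/size%2010%2B
```

Whether a %2F inside a path segment is then accepted still depends on the server, as noted above for Apache.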
Hex (percent) encoding is part of the HTTP protocol. You can't prevent it; otherwise it would break the actual HTTP request to the server.
Use:
urlencode() to encode
urldecode() to decode
Please show an actual example of how you are sending the URL to the third party, as it should read http%3A%2F%2Fexample.com%2Fsomething%2Felse, not just contain the odd %2F like in your example.
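One way to read this advice: if the whole inner URL goes through urlencode(), its existing %2F becomes %252F on the wire, and the single round of decoding that PHP applies to GET parameters returns the original string with %2F intact. A minimal sketch, assuming a made-up receiving script at site.example/forward.php and using parse_str() to stand in for the decoding PHP does when it fills $_GET:

```php
<?php
// The inner URL that must reach the other side unchanged, %2F included.
$inner = 'http://example.com/something%2Felse';

// Encode the whole value once; the "%" of "%2F" itself becomes "%25".
$query = 'url=' . urlencode($inner);
$outer = 'http://site.example/forward.php?' . $query; // hypothetical endpoint

echo $outer, "\n";
// http://site.example/forward.php?url=http%3A%2F%2Fexample.com%2Fsomething%252Felse

// parse_str() decodes once, much like PHP does when populating $_GET.
parse_str($query, $params);
var_dump($params['url'] === $inner); // bool(true)
```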
Related
I received an error report from my system because of a request that looked like this:
https://www.example.com./
Note the added period before the third forward-slash.
I would not expect this to be valid, though the server reports $_SERVER['HTTP_HOST'] as "www.example.net." (with the trailing period).
Is this technically valid?
Should I be using trim with odd characters to redirect to the actual host name URLs?
Are there other odd ways that an $_SERVER['HTTP_HOST'] could be requested that I should try to have my system compensate for?
Yes, it's valid! Check out https://stackoverflow.com./.
Technically I believe the URIs are identical, so I don't know that there's a strong reason to redirect from one to the other. If it works, I don't think I would touch this. Note that Stack Overflow, for example, does not redirect.
The HTTP Host header is controlled by the client and could be any string. So if you're doing anything with that header, such as adding it to your HTML or a SQL string, you need to treat it like user input and escape it. You should assume this for every header. It's always possible to make a request with curl and change any of them.
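A minimal sketch of treating the Host header as untrusted input, assuming you only want to compare host names and echo them into HTML. The function names normalizeHost() and escapeForHtml() are invented for this example; SQL would need prepared statements instead, which are not shown here.

```php
<?php
// Hypothetical helper: lower-cases the host and drops trailing dots,
// so "www.example.com." and "www.example.com" compare as equal.
function normalizeHost(string $rawHost): string
{
    return strtolower(rtrim($rawHost, '.'));
}

// Hypothetical helper: escapes a client-supplied value before it goes into HTML.
function escapeForHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// In a real request this would come from $_SERVER['HTTP_HOST'].
$rawHost = 'WWW.Example.com.';

echo normalizeHost($rawHost), "\n";                     // www.example.com
echo escapeForHtml('<script>alert(1)</script>'), "\n";  // &lt;script&gt;alert(1)&lt;/script&gt;
```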
I have to send a GET request to my Apache server. Whenever the parameter values are single words, things work smoothly. Whenever there are spaces, I change them to %20 and that does the trick.
However, whenever I have slashes in my parameter values, things do not work.
For example, the URL I want to send to my server is:
https://randomness.com?path=/var/images/sub%20images/&name=image%2001.jpg
How can I work around this?
Many characters are specifically interpreted by the web host in URLs and the / character is one of them.
You can translate your / characters to %2F, just as you translate spaces to %20.
PHP's urlencode function can also handle these translations for you automatically.
A handy reference for these encodings can be found here, should you wish to handle it manually.
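For the URL in the question, the encoding could look roughly like this. This sketch uses rawurlencode() so that spaces come out as %20, matching what the question already does by hand; the variable names are only for illustration.

```php
<?php
// Raw parameter values from the question, before any encoding.
$path = '/var/images/sub images/';
$name = 'image 01.jpg';

// Encode each value separately, then join the pairs with "&".
$query = 'path=' . rawurlencode($path) . '&name=' . rawurlencode($name);
$url   = 'https://randomness.com?' . $query;

echo $url, "\n";
// https://randomness.com?path=%2Fvar%2Fimages%2Fsub%20images%2F&name=image%2001.jpg

// On the receiving side PHP decodes the values again; parse_str() mimics that.
parse_str($query, $params);
echo $params['path'], "\n"; // /var/images/sub images/
echo $params['name'], "\n"; // image 01.jpg
```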
I'm sanitizing USER_AGENT for logging in PHP and need to know whether to use substr() or mb_strcut().
Seeing how USER_AGENT is directly derived from the HTTP request header User-Agent, I'm going to assume you're interested in HTTP headers.
Is it possible that HTTP headers will contain bytes outside the 7-bit ASCII range? Yes.
Is it likely that you'll actually see this in practice and need to handle it properly? I'd say no.
Therefore I suggest a third option: first strip all non-ASCII characters from the string, then use regular multibyte-unsafe functions to your heart's content.
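A sketch of that third option might look like the following. The function name sanitizeUserAgent() and the 255-byte limit are assumptions for illustration, and the pattern also drops ASCII control characters, which goes slightly beyond what the answer asks for.

```php
<?php
// Hypothetical helper: keep only printable 7-bit ASCII, then truncate by bytes.
function sanitizeUserAgent(string $userAgent, int $maxLength = 255): string
{
    // Without the /u modifier the pattern works byte by byte, so every byte
    // outside the range 0x20-0x7E (including all UTF-8 lead/continuation bytes) is removed.
    $ascii = preg_replace('/[^\x20-\x7E]/', '', $userAgent);

    // preg_replace() returns null on failure; fall back to an empty string.
    // After stripping, plain substr() can no longer split a multibyte character.
    return substr($ascii ?? '', 0, $maxLength);
}

// In a real request the input would be $_SERVER['HTTP_USER_AGENT'] ?? ''.
echo sanitizeUserAgent("Mozilla/5.0 caf\xC3\xA9\x00", 20), "\n"; // Mozilla/5.0 caf
```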
Suppose I do
http://site.com/something?url=http://lol.com/lol
are there any advantages (eg security etc) of doing
'http://site.com/something?url=' . urlencode('http://lol.com/lol');
instead of just passing in an unencoded version of the URL? Why should I urlencode something passed via GET instead of just passing in an unencoded version? (Of course, if the url param has & or ? or = in it, then I should definitely encode it. But suppose it doesn't: why should I encode it?)
There are characters which have special meaning in URLs. Those characters will have special meaning in the outer URL (instead of the inner URL where they belong) if you pass them without encoding.
For example, if you want to pass
http://example.com/foo?1=2&3=4
And you don't encode it, then you will get:
http://example.com/?url=http://example.com/foo?1=2&3=4
with
url is http://example.com/foo?1=2
3 is 4
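The split shown above can be reproduced with parse_str(), which decodes a query string much like PHP does when filling $_GET. This is only a sketch of the effect; the variable names are made up.

```php
<?php
$inner = 'http://example.com/foo?1=2&3=4';

// Unencoded: the "&" inside the inner URL is read as a separator of the outer query.
parse_str('url=' . $inner, $unencoded);
print_r($unencoded);
// Two parameters: url => http://example.com/foo?1=2 and 3 => 4

// Encoded: the inner URL survives as a single value.
parse_str('url=' . urlencode($inner), $encoded);
print_r($encoded);
// One parameter: url => http://example.com/foo?1=2&3=4
```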
suppose they don't, why should I encode them
Because then you have to look at every URL you pass to decide if it needs encoding or not.
Always encoding is much simpler and less error-prone than deciding on a case-by-case basis.
As far as I know, the second sample is a good choice if there is any chance of Unicode characters in your URL.
From what I have been able to understand, hash marks (#) aren't sent to the server, so it doesn't seem likely that I will be able to use raw PHP to parse data like in the URL below:
index.php?name=Ben&address=101 S 10th St Suite #301
I'm looking to pre-populate form fields with this $_GET data. How would I do this with Javascript (or jQuery), and is there a fallback that wouldn't break my form for people not using Javascript? Currently if there is a hash (usually in the address field), everything after that is not parsed in or stored in $_GET.
You should encode the hash, using urlencode() (in PHP) or encodeURIComponent() (in JavaScript).
The "hash" is not part of the request, which is why your script never sees it.
Like webdestroya said, you'll need to send a request with the URL
index.php?name=Ben&address=101%20S%2010th%20St%20Suite%20%23301
If you're using HTML forms, then the string value will be auto-urlencoded when you submit the form.
the user will be clicking a link from an email and will want to see the hash mark rendered in the email
You need to encode the link to what Ben quoted before you stick it in the e-mail. What you currently have is not a URL at all.
You can optionally encode a space to + instead of %20 in the context of query parameters, but you absolutely cannot include a raw space, because it is a defining characteristic of URLs that they don't have spaces in them. If you type a space in a URL in a web browser it will quietly fix up the mistake, but an e-mail client can't pick out a URL from plain text if it's full of spaces.
There is sometimes an alternative function which encodes spaces to + instead of %20. Normally this is best avoided as + isn't valid in all circumstances, but if you prefer:
index.php?name=Ben&address=101+S+10th+St+Suite+%23301
then you'd use PHP's urlencode function instead of the more standard rawurlencode.
Either way, you must encode the hash to %23, because otherwise a hash in an HTTP URL means the fragment identifier (the part of the page to scroll the browser to). This is not part of the address of the page itself; it is not even passed from the browser to the server, so you certainly cannot retrieve it—from $_GET or any other interface.
There are many other characters in a component like an address that must be %-encoded before being inserted into a URL string, or they'll leave you with an invalid or otherwise non-functional URL. If all that %23 business looks funny in a URL... well, you'll have to live with it. That's what URLs have always looked like.
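Both forms of the link discussed above can be produced with http_build_query(), which encodes every value for you. A minimal sketch, assuming the field values from the question; PHP_QUERY_RFC3986 gives the %20 style (as rawurlencode() does) and PHP_QUERY_RFC1738 gives the + style (as urlencode() does).

```php
<?php
// Field values from the question, including the problematic "#".
$fields = [
    'name'    => 'Ben',
    'address' => '101 S 10th St Suite #301',
];

// Spaces as %20.
$withPercent20 = 'index.php?' . http_build_query($fields, '', '&', PHP_QUERY_RFC3986);
echo $withPercent20, "\n";
// index.php?name=Ben&address=101%20S%2010th%20St%20Suite%20%23301

// Spaces as +.
$withPlus = 'index.php?' . http_build_query($fields, '', '&', PHP_QUERY_RFC1738);
echo $withPlus, "\n";
// index.php?name=Ben&address=101+S+10th+St+Suite+%23301
```

In both cases the hash arrives as %23, so the server sees the full address in $_GET['address'].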
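I usually store the hash in a cookie on unload, e.g.:
window.onunload = function () {
  // Save the fragment (e.g. "#301") in a cookie so the server can read it on the next request.
  if (document.location.hash) document.cookie = 'myhash=' + encodeURIComponent(document.location.hash);
};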