I have a site that lets users create a page whose URL is based on their input, e.g. example.com/My Page.
The problem arises when they create a URL like example.com/H & E Photos or example.com/#1 Fan Club.
Once PHP decodes the URL, it treats those characters as the start of a hash (or of a query string, in the case of ?).
In my .htaccess I am capturing the page name with ([^/]+?).
What is the typical way of handling a situation like this? Ideally without switching to an ID system (example.com/131234121). Poor planning on my part :(
EDIT: I'm talking about PHP here. The URL is encoded when it hits the server, but PHP decodes it before the regex and URL parsing happen.
If you are using PHP to create or store entries for user-entered URLs, then use htmlentities on the string before trying to handle it.
https://www.php.net/manual/en/function.htmlentities.php
https://www.w3schools.com/php/func_string_htmlentities.asp
Apparently, what I was looking for was a rewrite flag.
http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriteflags
B: Escape non-alphanumeric characters before applying the transformation.
This keeps special characters in the captured path escaped when they are substituted into the target URL, instead of passing them on decoded.
So it was actually an Apache issue and not a PHP one. Sorry for the misleading question.
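Here is a rough simulation, in plain PHP, of why the B flag helps. It assumes a hypothetical rule along the lines of RewriteRule ^([^/]+?)/?$ index.php?page=$1. It is not Apache's code. rawurlencode() only approximates what [B] does, because Apache's exact escaping, for example of spaces, may differ.

```php
<?php
// Rough simulation of the problem (NOT Apache's code). Assumed rule:
//   RewriteRule ^([^/]+?)/?$ index.php?page=$1
// mod_rewrite matches against the decoded path, so $1 holds a literal '&'.
$requestPath = 'H%20%26%20E%20Photos';   // what the browser sends
$captured = rawurldecode($requestPath);   // what $1 contains: "H & E Photos"

// Without [B]: $1 is pasted into the query string unescaped,
// so the '&' starts a new parameter and the page name is cut short.
parse_str('page=' . $captured, $withoutB);

// With [B]: $1 is escaped again before substitution. rawurlencode() is
// only an approximation; Apache's exact escaping (e.g. of spaces) may differ.
parse_str('page=' . rawurlencode($captured), $withB);

var_dump($withoutB['page']);
var_dump($withB['page']);
```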
Related
I recently started looking at putting untrusted usernames into prettified URLs, e.g.:
mysite.com/
mysite.com/user/sarah
mysite.com/user/sarah/article/my-home-in-brugge
mysite.com/user/sarah/settings
etc.
Note the username 'sarah' and the article name 'my-home-in-brugge'.
What I would like to achieve is that someone could just copy-paste the following URLs somewhere:
(1)
mysite.com/user/Björk Guðmundsdóttir/articles
mysite.com/user/毛泽东/posts
...and it would be very clear, before clicking the link, what to expect. Here are the exact same two URLs, with the usernames encoded using PHP's rawurlencode() (considered the proper way of doing this):
(2)
mysite.com/user/Bj%C3%B6rk%20Gu%C3%B0mundsd%C3%B3ttir/articles
mysite.com/user/%E6%AF%9B%E6%B3%BD%E4%B8%9C/posts
...are a lot less clear.
There are three ways to securely (with some level of guarantee) pass an untrusted name containing readable UTF-8 characters into a URL path as a directory:
A. You reparse the string into allowable characters, while keeping it uniquely associated with that user in your database, e.g.:
(3)
mysite.com/user/bjork-guomundsdottir/articles
mysite.com/user/mao-ze-dong12/posts
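Option A could be sketched like this. makeSlug() is a hypothetical helper, and the "-2", "-3" suffix scheme is only one assumed way of keeping slugs unique. Transliteration to ASCII uses the intl extension's Transliterator class if it is installed. Without it, non-ASCII letters are simply dropped.

```php
<?php
// Hypothetical helper: turn an untrusted display name into an ASCII slug
// that is unique among the slugs already stored for other users.
function makeSlug(string $name, array $existingSlugs): string
{
    // Transliterate to ASCII if intl is available (assumption: it may not be).
    if (class_exists('Transliterator')) {
        $t = Transliterator::create('Any-Latin; Latin-ASCII');
        if ($t !== null) {
            $ascii = $t->transliterate($name);
            if ($ascii !== false) {
                $name = $ascii;
            }
        }
    }
    // Lowercase, then collapse every run of non [a-z0-9] characters into one '-'.
    $slug = trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($name)), '-');
    if ($slug === '') {
        $slug = 'user'; // nothing usable survived; fall back to a fixed base
    }
    // Append -2, -3, ... until the candidate is not already taken.
    $candidate = $slug;
    for ($i = 2; in_array($candidate, $existingSlugs, true); $i++) {
        $candidate = $slug . '-' . $i;
    }
    return $candidate;
}
```

You would still store the slug next to the user's ID and look users up by slug, so the readable name itself never has to survive the round trip through the URL.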
B. You limit the user's input at creation time to characters that are acceptable in a URL (e.g. you ask for alphanumeric characters only):
(4)
mysite.com/user/bjorkguomundsdottir/articles
mysite.com/user/maozedong12/posts
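using, e.g., a regex check (for simplicity's sake):
if (!preg_match('/^[\p{L}\p{N}]+$/u', trim($sUserInput))) {
    // Reject: the name contains something other than Unicode letters or digits
    // (\p{P} punctuation would let '?', '#', '/' and '\' through).
}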
C. You escape them in full using PHP's rawurlencode(), and get the ugly output shown in (2).
Question:
I want to focus on B and push it as far as possible within KNOWN errors/concerns, until we get the nice-looking URLs of (1). I found that passing many Unicode characters in URLs works in modern browsers: they automatically convert Unicode characters, and other characters that can't appear raw in a URL, into percent-encoded form. This lets the user, e.g., copy-paste the nice-looking Unicode URLs of (1), and the browser will still request the correct final URL.
For some characters, the browser will not get it right without encoding: e.g. ?, #, / or \ will clearly break the URL.
So: which non-alphanumeric characters, in the ASCII range and across the rest of the Unicode spectrum, can we allow at creation time to be injected into a URL without escaping? Or better: which groups of Unicode characters can we allow? Which characters must always be blacklisted? There will be special cases: spaces look fine, except at the end of the string, where they could be mis-selected. Is there a reference that shows which browsers interpret which Unicode character ranges correctly?
PS: I am well aware that using improperly encoded strings in URLs will almost never provide a security guarantee. This is certainly not recommended practice, but I don't see how asking this question differs from the very common practice of copying a URL from a website and pasting it into the browser without considering whether it was correctly encoded (a novice user wouldn't). Has someone looked at this before, and what was their solution in code (regex, conditions, if-statements...)?
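One way to approach option B is to combine a denylist check at creation time with encoding the href anyway. This is a minimal sketch. isPathSafeName() and userUrl() are hypothetical names, and the denylist is an assumption. It covers the characters named in the question ('?', '#', '/', '\'), plus '%' and Unicode control/format characters. It is not an authoritative list of what every browser handles.

```php
<?php
// Hypothetical check for option B. The denylist below is an assumption drawn
// from the characters named above plus '%', not an authoritative standard list.
function isPathSafeName(string $name): bool
{
    // Reject empty names and leading/trailing whitespace (easy to mis-select).
    // Note: trim() only strips ASCII whitespace.
    if ($name === '' || $name !== trim($name)) {
        return false;
    }
    // '/', '?', '#' and '\' change how the path is split; '%' would be read
    // as the start of an escape; \p{C} covers control and format characters.
    $hit = preg_match('/[\/?#\\\\%\p{C}]/u', $name);
    // preg_match() returns false when $name is not valid UTF-8.
    return $hit === 0;
}

// Even for accepted names, the href itself can still be percent-encoded,
// while the readable name is shown as the link text.
function userUrl(string $name): string
{
    return '/user/' . rawurlencode($name) . '/articles';
}
```

Encoding the href costs nothing for names that pass the check, and it keeps the link correct if the denylist turns out to be incomplete.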
Suppose I do
http://site.com/something?url=http://lol.com/lol
are there any advantages (eg security etc) of doing
'http://site.com/something?url=' . urlencode('http://lol.com/lol');
instead of just passing in an unencoded version of the URL? Why should I urlencode something passed via GET instead of passing it unencoded? (Of course, if the url parameter contains &, ? or =, I should definitely encode it... but suppose it doesn't, why should I encode it?)
There are characters which have special meaning in URLs. Those characters will have special meaning in the outer URL (instead of the inner URL where they belong) if you pass them without encoding.
For example, if you want to pass
http://example.com/foo?1=2&3=4
If you don't encode it, you will get:
http://example.com/?url=http://example.com/foo?1=2&3=4
which is parsed as two parameters:
url = http://example.com/foo?1=2
3 = 4
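You can see this split directly with PHP's parse_str(), which parses a query string the same way PHP builds $_GET. This is a small demonstration using the example URL from this answer.

```php
<?php
$inner = 'http://example.com/foo?1=2&3=4';

// Unencoded: the inner '&' and '=' are treated as part of the outer query.
parse_str('url=' . $inner, $broken);

// Encoded: urlencode() turns ':', '/', '?', '=' and '&' into %XX escapes.
parse_str('url=' . urlencode($inner), $ok);

var_dump($broken); // two parameters: url and 3
var_dump($ok);     // one parameter: url, holding the whole inner URL
```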
suppose they don't, why should I encode them
Because then you have to look at every URL you pass to decide if it needs encoding or not.
Always encoding is much simpler and less error-prone than deciding on a case-by-case basis.
As far as I know, the second sample is the better choice if there is any chance of Unicode characters appearing in your URL.
It's a pleasure to be posting my first question here :-)
I'm running a URL shortening/redirecting service written in PHP.
I aim to store and handle only valid URL data, as far as possible.
I noticed that invalid URL data is sometimes handed over to the database, containing invalid characters (like spaces at the beginning or end of the URL).
I decided to make my URL-check mechanism apply trim(), stripslashes() and strip_tags() to the values before storing them.
As far as I can tell, these functions will not remove any valid characters a URL may contain.
Kindly correct me or advise me if I'm going in the wrong direction.
Regards.
If you're already trimming the incoming variable, as well as filtering it with the other built in PHP methods, and STILL running into issues, try changing the collation of your table to UTF-8 and see if that helps you get rid of the special characters you mention. (Could you paste a few examples to let us know?)
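A different approach from the stripping functions above is to trim the input and then validate it with PHP's built-in filter_var(..., FILTER_VALIDATE_URL), restricting the scheme to http/https. This is a sketch, and normalizeUrl() is a hypothetical name. According to the PHP manual, FILTER_VALIDATE_URL only considers ASCII URLs valid, so internationalized domain names would be rejected.

```php
<?php
// Hypothetical helper: normalize and validate a submitted URL before storing it.
function normalizeUrl(string $input): ?string
{
    // Strip leading/trailing whitespace (and the NUL/vertical-tab bytes trim() handles).
    $url = trim($input);
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    // Only accept http(s); FILTER_VALIDATE_URL alone also accepts other schemes.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return null;
    }
    return $url;
}
```

Rejecting bad input outright, rather than silently altering it, avoids storing a URL that differs from the one the user actually submitted.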
If I try to access this URL: http://localhost/common/news/33/+%E0%B0%95%E0%B1%87%E0%B0%B8.html, it shows "An Error Was Encountered: The URI you submitted has disallowed characters." I set $config['permitted_uri_chars'] = 'a-z 0-9~%.:??_=+-?'; What should I do?
Yeah, if you want to allow non-ASCII bytes you would have to add them to permitted_uri_chars. This feature operates on URL-decoded strings (normally, unless there is something unusual about the environment), so you have to put the verbatim bytes you want in the string and not merely % and the hex digits. (Yes, I said bytes: _filter_uri doesn't use Unicode regex, so you can't use a Unicode range.)
Trying to filter incoming values (instead of encoding outgoing ones) is a ludicrously basic error that it is depressing to find in a popular framework. You can turn this misguided feature off by setting permitted_uri_chars to an empty string, or maybe you would like a range of all bytes except for control codes ("\x20-\xFF"). Unfortunately the _filter_uri function still does crazy, crazy, broken things with some input, HTML-encoding some punctuation on the way in for some unknown bizarre reason. And you don't get to turn this off.
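To see why the Telugu URL fails, consider a hypothetical re-creation of a byte-oriented permitted-characters check. This is not CodeIgniter's _filter_uri code. As the answer describes, it decodes the segment first and then matches without Unicode regex, so each UTF-8 byte has to appear in the character class.

```php
<?php
// Hypothetical re-creation (NOT CodeIgniter's code) of a permitted_uri_chars check:
// the segment is URL-decoded first, then every BYTE must be in the class.
function segmentAllowed(string $rawSegment, string $permittedChars): bool
{
    $decoded = rawurldecode($rawSegment);
    // No 'u' modifier: the character class is matched byte by byte.
    return preg_match('|^[' . $permittedChars . ']+$|i', $decoded) === 1;
}

$telugu = '%E0%B0%95%E0%B1%87%E0%B0%B8.html';
var_dump(segmentAllowed($telugu, 'a-z 0-9~%.:_\-'));  // multibyte UTF-8 bytes are rejected
var_dump(segmentAllowed($telugu, '\x20-\xFF'));       // every byte from space upward is allowed
```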
This, along with the broken “anti-XSS” mangler, makes me believe the CodeIgniter team have quite a poor understanding of how string escaping and security issues actually work. I would not trust anything they say on security ever.
What to do?
Stop using Unicode characters in a URL, for the same reasons you shouldn't name files on a filesystem with Unicode characters.
But, if you really need it, I'll copy/paste some lines from the config:
Leave blank to allow all characters -- but only if you are insane.
I would NOT suggest trying to decode them or using any other tricks. Instead, I would suggest using the urlencode() and urldecode() functions.
Since I don't have a copy of your code, I can't add examples; if you provide some, I can show you how to do it.
However, they're pretty straightforward to use, and they're built into PHP 4 and PHP 5.
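Here is a small illustration of those functions and of their RFC 3986 siblings, rawurlencode()/rawurldecode(). The raw variants are usually the better fit for path segments like the one in the question. They produce %20 for a space rather than '+'. The /search link is purely hypothetical.

```php
<?php
// Telugu text from the question's URL, written as code points (U+0C15 U+0C47 U+0C38).
$segment = "\u{0C15}\u{0C47}\u{0C38}";
$query = 'a b+c&d';

// Path segment: rawurlencode() follows RFC 3986 (a space becomes %20).
$path = '/common/news/33/' . rawurlencode($segment) . '.html';

// Query value: urlencode() uses form encoding (a space becomes '+').
$link = '/search?q=' . urlencode($query);

echo $path, "\n"; // /common/news/33/%E0%B0%95%E0%B1%87%E0%B0%B8.html
echo $link, "\n"; // /search?q=a+b%2Bc%26d

// Decode with the matching function. PHP already decodes $_GET values,
// so calling urldecode() on them again would decode twice.
$roundTrip = rawurldecode(rawurlencode($segment));
```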
I had a similar problem and wanted to share the solution. It was a password reset, and I had to send the username and time, as the URL is only active for an hour. CodeIgniter will not accept certain characters in URLs for security reasons, and I did not want to change that. So here is what I did:
concatenate the username, '__' and time() into a variable $str
encrypt $str using MCRYPT_BLOWFISH; the result may contain '/' or '+'
re-encode the result as hex using str2hex (got it from here)
put the encoded string as the 3rd segment of the link sent by email, like:
http://xyz.com/users/resetpassword/3123213213ABCDEF238746238469898
- You can see that the URL contains only 0-9 and A-F.
When the link in the email is clicked, get the 3rd URI segment, use hex2str() to turn it back into the Blowfish-encrypted string, and then apply Blowfish decryption to get the original string.
split on '__' to get the username and time
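The steps above can be sketched with current PHP built-ins. The substitutions are assumptions, not the author's code. mcrypt was removed in PHP 7.2, so OpenSSL AES-256-CBC stands in for Blowfish, and bin2hex()/hex2bin() stand in for str2hex()/hex2str(). makeResetToken() and parseResetToken() are hypothetical names. Like the original scheme, this encrypts but does not authenticate the token.

```php
<?php
// Sketch of the reset-link steps above using current PHP built-ins.
// Assumption: OpenSSL AES-256-CBC replaces Blowfish (mcrypt is gone since 7.2),
// and bin2hex()/hex2bin() replace str2hex()/hex2str().
// This token is encrypted but not authenticated; real code should add a MAC.
const RESET_CIPHER = 'aes-256-cbc';

function makeResetToken(string $username, int $time, string $key): string
{
    $plain = $username . '__' . $time;                  // "user__time"
    $iv = random_bytes(openssl_cipher_iv_length(RESET_CIPHER));
    // Raw binary output may contain any byte, including '/' and '+'.
    $cipher = openssl_encrypt($plain, RESET_CIPHER, $key, OPENSSL_RAW_DATA, $iv);
    return bin2hex($iv . $cipher);                       // only 0-9 and a-f
}

function parseResetToken(string $token, string $key): ?array
{
    // Accept only an even-length hex string.
    if (!preg_match('/^(?:[0-9a-f]{2})+$/i', $token)) {
        return null;
    }
    $bin = hex2bin($token);
    $ivLen = openssl_cipher_iv_length(RESET_CIPHER);
    if (strlen($bin) <= $ivLen) {
        return null; // too short to hold an IV plus ciphertext
    }
    $plain = openssl_decrypt(substr($bin, $ivLen), RESET_CIPHER, $key,
                             OPENSSL_RAW_DATA, substr($bin, 0, $ivLen));
    if ($plain === false) {
        return null;
    }
    // Split on the LAST '__' so a username that contains '__' still works.
    $pos = strrpos($plain, '__');
    if ($pos === false) {
        return null;
    }
    return [substr($plain, 0, $pos), (int) substr($plain, $pos + 2)];
}
```

On click, the controller would pass the 3rd URI segment to parseResetToken() and compare the returned time with time() to enforce the one-hour limit.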
I know it's been almost a year since this question was asked, but I hope someone who lands here via Google will find this solution helpful.
How can I post a full URL in PHP?
For example:
I have a form allowing individuals to submit a long url. The resultant page is /index.php?url=http://www.example.com/
This is fine for short URLs, but for very long and complex URLs (like those from Google Maps) I need to know how to keep all of the data associated with variable url.
You need to percent-encode the string; otherwise, characters that have special meaning in URIs will keep that special meaning instead of being treated as data.
http://php.net/urlencode
If users submit this data via a form, then it will be automatically encoded.
If you plan to include the URI in a link in an HTML document, then don't forget to convert special characters to HTML entities.
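Both steps together might look like this. http_build_query() percent-encodes each value, and htmlspecialchars() escapes the result for the HTML attribute. The extra page parameter and the Google Maps-like URL are made-up examples.

```php
<?php
$submitted = 'http://www.example.com/maps?q=a b&z=14';

// http_build_query() percent-encodes every value, so the inner '?', '&'
// and '=' stay part of the "url" parameter.
$query = http_build_query(['page' => 2, 'url' => $submitted]);

// When writing the link into HTML, escape it as well ('&' becomes '&amp;').
$href = htmlspecialchars('/index.php?' . $query, ENT_QUOTES, 'UTF-8');

echo '<a href="' . $href . '">link</a>', "\n";
```

On the receiving side, $_GET['url'] already holds the decoded original URL, so no further decoding is needed.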
You sort of answer your own question:
How can I post a full URL in PHP?
If very long URLs are getting truncated by the users' browsers, your only option is to re-work your system to POST the URL to your script, as opposed to passing it in the query string.
If there is some condition that frustrates the use of a POST request, you should update your question with more detail about what your system does.