I'm working on an Android app that sends some hashes to a server. Of course, the hashes often contain special characters like +, =, and /. I found out that my PHP script is automatically replacing the + symbol with a blank space, which breaks my security mechanism.
I could simply have replaced the blank space with a + sign using the str_replace() function, but I'm worried there may be more cases like this where PHP swaps one special character for another. Besides, that isn't the proper way to fix it.
1) Is it only the + symbol, or can there be other occurrences too?
2) What is the correct way to get the raw (unaltered) string?
As mentioned in my comment, this is an encoding issue and should be fixed in the request (the Android app) rather than on the server. Check this answer for a Java example.
To your specific questions:
There are a number of reserved characters that NEED to be encoded because they have a special meaning in a URL: + ? & (among others).
Strictly speaking, you can still get at the original string in PHP: the parameters are decoded (as URL/HTTP semantics require) when PHP populates $_GET, but the raw, still-encoded query string survives in $_SERVER['QUERY_STRING']. Parsing that by hand is a workaround, though, not a fix.
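To make this concrete, a minimal sketch of what the server sees (the URL and hash value are made up):

    <?php
    // Hypothetical request: check.php?hash=abc+def%2Fxyz%3D
    // PHP decodes the query string when it populates $_GET, so a
    // literal '+' has already become a space here:
    echo $_GET['hash'];                 // "abc def/xyz="

    // The raw, still-encoded query string is available if you need it:
    echo $_SERVER['QUERY_STRING'];      // "hash=abc+def%2Fxyz%3D"

    // The real fix is on the client: percent-encode the hash before
    // putting it in the URL, so '+' travels as '%2B':
    echo rawurlencode('abc+def/xyz=');  // "abc%2Bdef%2Fxyz%3D"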
So AWS converts a space into + in the bucket/file URL, but a filename that already has + in it is encoded as %2B. I am confused about how to handle this case.
When the input URL for an application is :
https://s3-us-west-2.amazonaws.com/mybucket/Pul0419_32_a+b.zip
how do I decide whether the file that actually exists is Pul0419_32_a+b.zip or Pul0419_32_a b.zip?
AWS enthusiast that I am, I have to concede that the original architects of S3 made an extremely unfortunate error when they decided that + in the path of a URL should be interpreted as if it were equivalent to ASCII 0x20 ("space").
The + character only carries this meaning when part of the query string. In the path, it should have been interpreted literally.
In the path of a correctly encoded and interpreted URL, + is equivalent to %2B.
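To make the distinction concrete, here is how PHP's two encoders treat the two candidate filenames (a minimal illustration, not S3-specific code):

    <?php
    // Both encoders turn a literal '+' into %2B ...
    echo rawurlencode('Pul0419_32_a+b.zip'); // Pul0419_32_a%2Bb.zip
    echo urlencode('Pul0419_32_a+b.zip');    // Pul0419_32_a%2Bb.zip

    // ... but they differ on spaces, which is exactly the ambiguity here:
    echo rawurlencode('Pul0419_32_a b.zip'); // Pul0419_32_a%20b.zip (RFC 3986, paths)
    echo urlencode('Pul0419_32_a b.zip');    // Pul0419_32_a+b.zip   (query-string style)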
There is, then, no dependable answer to the question, because of the fundamental flaw that causes S3 to handle correct URLs incorrectly.
Given that S3 would treat those characters as spaces if the example URL were used by a browser, your interests are probably best served by not transforming the URL to use %2B, but rather using it as-is in the interaction with S3. The exception is if practical experience suggests that the original source of these URLs has actually interacted with S3 and did indeed transform them to %2B, without storing them for subsequent use with consistent encoding. In that case the argument could be made that they are being provided to you wrong, but you may have to transform them anyway, for reasons that may be more political than technical.
But, as it appears you already suspect, the answer is less than straightforward.
This is a little out of the blue and it's mostly curiosity. I hope it's not a waste of time and space.
I was writing a little script to validate accounts with a link, so I decided to send an email with a link to the PHP script, and in the link I would put two variables to read from the $_GET array: a key and the email. Then I would just search the database for that email and key and change its activated status to true... No problem. Easy enough, even though it may not be very elegant.
I used a script for generating the key that I had used elsewhere in the site for generating a new password (to reset it, for instance), but sometimes it didn't work. After a lot of tries I noticed (and I felt stupid then) that the character set my password-generation function drew from was this:
'0123456789_!##$%&*()-=+abcdfghjkmnpqrstvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
So naturally I deleted the & character, which is used for separating variables in the URL. Then in another try I noticed that the link in the email was not recognized whole and stopped after the '#' character, which I then remembered is used for fragment references in HTML, so I deleted that as well. In the end I decided to leave only alphanumeric characters to be safe, but I am curious: are there any more characters that are not 'valid' for URLs read via $_GET, and is there any way to use those characters anyway (maybe URL encoding or something)?
There are plenty of characters that are invalid in a URL. Use urlencode() to convert them to URL-safe encodings. (Always run that function over any data you insert into a URL.)
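A minimal sketch of that (the key and email values are made up); http_build_query() will do the same encoding for a whole array of parameters:

    <?php
    $key   = 'x+y&z#1';                 // hypothetical generated key
    $email = 'user@example.com';

    $link = 'https://example.com/activate.php'
          . '?key=' . urlencode($key)
          . '&email=' . urlencode($email);
    // https://example.com/activate.php?key=x%2By%26z%231&email=user%40example.com

    // On the receiving end, $_GET['key'] is already decoded back to "x+y&z#1".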
You have to use urlencode() on the values before putting them in the URL that $_GET will read.
You could use urlencode() and urldecode(), but I would still stay away from & # ?, since these are structural URL characters.
Also, when it comes to passwords: don't stress about an algorithm. Use sha1, crypt(), or something along those lines with a salt. These algorithms will be much stronger than homemade ones.
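For what it's worth, a sketch in the spirit of that advice. Since this answer was written, password_hash() (PHP 5.5+) has become the standard way to get a salted hash without hand-picking an algorithm:

    <?php
    // Store this, not the plain password; a random salt is generated
    // and embedded in the hash for you.
    $hash = password_hash($password, PASSWORD_DEFAULT);

    // Later, on login:
    if (password_verify($password, $hash)) {
        // password is correct
    }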
I have a web app where I first store JSON data in a cookie, then save to the database every x seconds. It just opens a connection to the server, and the server reads the cookie. It doesn't actually send anything via POST or GET.
When I save to the cookie, my data is formatted fine. However, when I work with it in PHP and then setcookie() a new json_encoded array, it replaces spaces with + symbols, and these then show up in my web app. I can't find any way to disable encoding of strings in json_encode(), nor a JS way of parsing those plus symbols out (I'm using jQuery.parseJSON; JSON.parse didn't work either)... Does anyone have any idea :S?
From the fine manual:
Note that the value portion of the cookie will automatically be urlencoded when you send the cookie, and when it is received, it is automatically decoded and assigned to a variable by the same name as the cookie name. If you don't want this, you can use setrawcookie() instead if you are using PHP 5.
But I think you still want the cookie URL-encoded; you just want %20 for spaces instead of +. However, urlencode():
[...] for historical reasons, spaces are encoded as plus (+) signs
You could try using rawurlencode to encode it yourself:
Returns a string in which all non-alphanumeric characters except -_.~ have been replaced with a percent (%) sign followed by two hex digits. This is the encoding described in RFC 3986 [...]
And then setrawcookie() to set the cookie. Unfortunately, none of decodeURI, decodeURIComponent, or even the deprecated unescape JavaScript functions will convert a + back to a space; so if anything is still encoded with + signs, you're stuck replacing them yourself.
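Putting the two pieces together, a minimal sketch (the cookie name and data are made up):

    <?php
    $json = json_encode(array('name' => 'some value with spaces'));

    // rawurlencode() turns spaces into %20, and setrawcookie() sends
    // the value without PHP's automatic urlencode() pass:
    setrawcookie('appdata', rawurlencode($json), time() + 3600, '/');

    // In the browser, decodeURIComponent() now round-trips cleanly:
    //   JSON.parse(decodeURIComponent(cookieValue))
    // (cookieValue stands in for however you read the cookie in JS.)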
If I try to access this URL, http://localhost/common/news/33/+%E0%B0%95%E0%B1%87%E0%B0%B8.html , it shows "An Error Was Encountered: The URI you submitted has disallowed characters." I have set $config['permitted_uri_chars'] = 'a-z 0-9~%.:??_=+-?'; What do I do?
Yeah, if you want to allow non-ASCII bytes you would have to add them to permitted_uri_chars. This feature operates on URL-decoded strings (normally, unless there is something unusual about the environment), so you have to put the verbatim bytes you want in the string and not merely % and the hex digits. (Yes, I said bytes: _filter_uri doesn't use Unicode regex, so you can't use a Unicode range.)
Trying to filter incoming values (instead of encoding outgoing ones) is a ludicrously basic error that it is depressing to find in a popular framework. You can turn this misguided feature off by setting permitted_uri_chars to an empty string, or maybe you would like a range of all bytes except for control codes ("\x20-\xFF"). Unfortunately the _filter_uri function still does crazy, crazy, broken things with some input, HTML-encoding some punctuation on the way in for some unknown bizarre reason. And you don't get to turn this off.
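In config/config.php that looks like this (a sketch; the exact behaviour depends on the CodeIgniter version):

    // Either allow everything -- CI skips the check entirely when the
    // setting is an empty string:
    $config['permitted_uri_chars'] = '';

    // ...or allow every byte except control codes. The double-quoted
    // "\x20-\xFF" puts the verbatim bytes into the string, as required:
    $config['permitted_uri_chars'] = "\x20-\xFF";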
This, along with the broken “anti-XSS” mangler, makes me believe the CodeIgniter team have quite a poor understanding of how string escaping and security issues actually work. I would not trust anything they say on security ever.
What to do?
Stop using Unicode characters in a URL, for the same reasons you shouldn't name files on a filesystem with Unicode characters.
But, if you really need it, I'll copy/paste some lines from the config:
Leave blank to allow all characters -- but only if you are insane.
I would NOT suggest trying to decode them or using any other tricks; instead I would suggest using the urlencode() and urldecode() functions.
Since I don't have a copy of your code, I can't tailor an example to it; if you can provide some, I can show you exactly where to apply them.
They are pretty straightforward to use, though, and built into both PHP 4 and PHP 5.
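In the abstract, though, the usage looks something like this (the segment value and variable names are made up, since the original code isn't shown):

    <?php
    // Building a link: encode anything that isn't plain ASCII, and any
    // reserved character, before it goes into the URL.
    $segment = 'some news title';
    $url = 'http://localhost/common/news/33/' . urlencode($segment) . '.html';

    // Reading it back, once you have the raw segment in hand:
    $original = urldecode($rawSegmentFromUrl);  // $rawSegmentFromUrl is hypothetical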
I had a similar problem and wanted to share the solution. It was a reset-password flow, and I had to send the username and the time, as the URL would be active for an hour only. CodeIgniter will not accept certain characters in a URL for security reasons, and I did not want to change that. So here is what I did (a sketch follows the list):
- Concatenate the user name, '__', and time() into a variable $str.
- Encrypt $str using MCRYPT_BLOWFISH; the result may contain '/' and '+'.
- Re-encode the encrypted string using str2hex (got it from here).
- Put the encoded string as the 3rd segment in the link sent by email, like http://xyz.com/users/resetpassword/3123213213ABCDEF238746238469898 -- you can see that the URL contains only 0-9 and A-F.
- When the link from the email is clicked, take the 3rd URI segment, use hex2str() to get back the Blowfish-encrypted string, and then apply Blowfish decryption to recover the original string.
- Split on '__' to get the user name and time.
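A rough sketch of those steps in code. Note that the mcrypt extension has since been removed from PHP, so this substitutes OpenSSL's Blowfish support for MCRYPT_BLOWFISH; the key, the 'bf-cbc' mode, and the variable names are all assumptions, and 'bf-cbc' may not be enabled in newer OpenSSL builds:

    <?php
    $key = 'application-secret';            // hypothetical secret key

    // 1. Concatenate user name, '__' and time()
    $str = 'someuser' . '__' . time();

    // 2. Encrypt -- the raw ciphertext may contain '/', '+', etc.
    //    ('bf-cbc' stands in for the original MCRYPT_BLOWFISH)
    $ivlen     = openssl_cipher_iv_length('bf-cbc');
    $iv        = openssl_random_pseudo_bytes($ivlen);
    $encrypted = openssl_encrypt($str, 'bf-cbc', $key, OPENSSL_RAW_DATA, $iv);

    // 3. Re-encode as hex so the URL contains only 0-9 and A-F
    $token = strtoupper(bin2hex($iv . $encrypted));
    $link  = 'http://xyz.com/users/resetpassword/' . $token;

    // When the link is clicked: reverse the steps (hex2bin() needs PHP 5.4+).
    $raw       = hex2bin($token);
    $iv        = substr($raw, 0, $ivlen);
    $decrypted = openssl_decrypt(substr($raw, $ivlen), 'bf-cbc', $key,
                                 OPENSSL_RAW_DATA, $iv);

    // 4. Split on '__' to recover the user name and timestamp
    list($username, $timestamp) = explode('__', $decrypted, 2);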
I know it has been almost a year since this question was asked, but I hope someone will find this solution helpful after getting here from Google.
I am working on a Flex app that has a MySQL database. Data is retrieved from the DB using PHP, and then I use AMFPHP to pass the data on to Flex.
The problem I am having is that the data is copied from Word documents, which sometimes results in some of the more unusual characters not displaying properly. For example, Word uses different characters for opening and closing double quotes instead of just " (the standard double quote). Another example is the long dash instead of -.
All of these characters end up displayed as one or more accented capital A characters instead. Not only that, but each time the document is saved, the characters are replaced again, resulting in an ever-increasing number of these accented A's.
Doing a search-and-replace for each troublesome character to swap it for its plain equivalent seems to work, but obviously this requires compiling a list of all the characters that may appear, and the problem will keep recurring as new characters are used for the first time. It also seems like a brute-force way around the problem rather than a proper solution.
Does anyone know what causes this and have any good workarounds or fixes? I have had similar problems when using UTF-8 characters in HTML documents that aren't set to use UTF-8. Is this the same thing, and if so, how do I get Flex to use UTF-8?
Many thanks
Adam
It is the same thing, and the smart quotes aren't special as such: you will in fact be failing for every non-ASCII character, so a trivial ad-hoc replace for the smart-quote characters is pointless.
At some point, someone is mis-decoding a sequence of bytes as ISO-8859-1 or Windows code page 1252 when it should have been UTF-8. It's difficult to say where without details or code.
What is “the document”? What format is it? Does that format support UTF-8 content? If it does not, you will need to convert the output you put into it, at the document-creation phase, to the encoding the consumer of that document expects, e.g. using iconv.
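As a trivial sketch of that last step (the byte values are the Windows-1252 encodings of Word's smart quotes and en dash):

    <?php
    // Text that arrived as Windows-1252 bytes (e.g. pasted from Word):
    $cp1252 = "\x93hello\x94 \x96 world";

    // Re-encode to UTF-8 before storing or serving it:
    $utf8 = iconv('Windows-1252', 'UTF-8', $cp1252);
    // $utf8 now contains proper curly quotes and an en dash in UTF-8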