We have a URL like http://site.s3.amazonaws.com/images/some image #name.jpg inside $string.
What I'm trying to do (yes, there is whitespace around the URL):
$string = urlencode(trim($string));
$string_data = file_get_contents($string);
What I get (# is also replaced):
file_get_contents(http%3A%2F%2Fsite.s3.amazonaws.com%2Fimages%2Fsome+image+#name.jpg)[function.file-get-contents]: failed to open stream: No such file or directory
If you copy/paste http://site.s3.amazonaws.com/images/some image #name.jpg into the browser address bar, the image opens.
What's wrong, and how do I fix it?
Using urlencode() on the entire URL generates an invalid URL. Leaving the URL as it is isn't correct either, because, in contrast to browsers, the file_get_contents() function doesn't perform URL normalization. In your example, you need to replace the spaces with %20:
$string = str_replace(' ', '%20', $string);
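The answer above handles the spaces; a fuller sketch for this specific URL also encodes the '#', which would otherwise start a fragment and never be sent to the server:

```php
<?php
// Minimal sketch: percent-encode the two characters that make this URL
// invalid. '#' must become %23, otherwise everything after it is treated
// as a fragment identifier by the http wrapper.
$string = trim(' http://site.s3.amazonaws.com/images/some image #name.jpg ');
$string = str_replace([' ', '#'], ['%20', '%23'], $string);
echo $string;
// http://site.s3.amazonaws.com/images/some%20image%20%23name.jpg
// $string_data = file_get_contents($string); // then fetch as before
```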
The URL you have specified is invalid. file_get_contents() expects a valid http URI (more precisely, the underlying http wrapper does). Since your URI is not valid, file_get_contents() fails.
You can fix this by turning the invalid URI into a valid one. Information on how to write a valid URI is available in RFC 3986. You need to take care that all special characters are represented correctly, e.g. spaces become %20 and the number sign (#) becomes %23, otherwise everything after the # is treated as a fragment. Superfluous whitespace at the beginning and end also needs to be removed.
When that's done, the webserver may still tell you that access is forbidden. You might then need to add request headers via the HTTP context options for the HTTP file wrapper to solve that. You can find the details in the PHP manual: http:// -- https:// — Accessing HTTP(s) URLs
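A hedged sketch of those HTTP context options; the header name and value here are illustrative, not something the answer prescribes:

```php
<?php
// Build a stream context carrying extra request headers for the http
// wrapper. file_get_contents() accepts it as its third argument.
$context = stream_context_create([
    'http' => [
        'method' => 'GET',
        'header' => "User-Agent: ExampleFetcher/1.0\r\n",
    ],
]);
// $string_data = file_get_contents($string, false, $context);
var_dump(is_resource($context)); // bool(true)
```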
Related
I can't tell you how many hours of my life I've wasted on these kinds of idiotic errors.
I'm basically constructing a URL such as: 'https://example.com/?test=' . urlencode('meow+foo#gmail.com');
Then, I display it from the URL, like this: echo urldecode($_GET['test']);
And then it shows: meow foo#gmail.com.
Ugh.
If I instead do this: echo $_GET['test'];
I get: meow+foo#gmail.com.
(Naturally, echoing a GET variable like that is insanity, so I would of course do htmlspecialchars around it in reality. But that's not the point I'm making here.)
So, since browsers (or something) are clearly making this "translation" or "decoding" automatically, doing it again messes it up by removing certain characters, in this case the "+" (plus). Which leads me to believe that I'm not supposed to use urldecode/rawurldecode at all.
But then why do they exist?
So when would one ever want to use them?
I recently had a case where we added triggers to an S3 bucket which were being picked up by a Lambda function and sent via a HTTP request to an API endpoint.
If the path of the file on S3 was multiword, it would replace the space with a +, at which point it would break our code because technically the path is incorrect.
Once you run it through urldecode it becomes a valid path because as per the docs:
Decodes any %## encoding in the given string. Plus symbols ('+') are decoded to a space character.
That would be a valid use case for this function as no browser is involved. Just background processes/requests.
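The S3 case described above can be sketched like this; the object key is made up for illustration:

```php
<?php
// S3 event notifications deliver the object key form-encoded, so a space
// in the original file name arrives as '+'. urldecode() restores it.
$rawKey  = 'uploads/my+holiday+photo.jpg';
$realKey = urldecode($rawKey);
echo $realKey; // uploads/my holiday photo.jpg
```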
I have two links.
When I use both of the links in another page, the first link is decoded automatically by the GET method and the second isn't.
The problem is that if there is a space in any attribute, $_GET doesn't automatically decode the URL, and if there are no spaces, $_GET automatically decodes the URL, which is the correct behaviour.
Tip: the only encoded attribute is BodyStr, encoded via PHP's urlencode() function.
Another tip: the difference between the two is the space in the SubjectStr attribute.
I want to know why spaces in the URL prevent the $_GET global variable from automatically decoding all the attributes.
$message=urlencode($message);
http://localhost/test4.php?me=ahmed&y=1&clientid=55&default=1&Subjectstr=**Email From Contactuspage**&BodyStr=$message
http://localhost/test4.php?me=ahmed&y=1&clientid=55&default=1&Subjectstr=**EmailFromContactuspage**&BodyStr=$message
Space isn't allowed in URL query strings. If you put an unencoded space in SubjectStr, the URL ends at that point, so the server never sees the BodyStr parameter.
You need to URL-encode SubjectStr. Replace the spaces with + or %20.
$message=urlencode($message);
$url = "http://localhost/test4.php?me=ahmed&y=1&clientid=55&default=1&Subjectstr=Email+From+Contactuspage&BodyStr=$message";
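As an alternative sketch, http_build_query() encodes every value for you, so no parameter needs hand-encoding (parameter names taken from the question's URL; the message body is a stand-in):

```php
<?php
$message = 'Hello there'; // stand-in for the real body
// http_build_query() url-encodes each value and joins them with '&'.
$url = 'http://localhost/test4.php?' . http_build_query([
    'me'         => 'ahmed',
    'y'          => 1,
    'clientid'   => 55,
    'default'    => 1,
    'Subjectstr' => 'Email From Contactuspage',
    'BodyStr'    => $message,
]);
echo $url;
// http://localhost/test4.php?me=ahmed&y=1&clientid=55&default=1&Subjectstr=Email+From+Contactuspage&BodyStr=Hello+there
```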
The reason it stops at the space is the HTTP protocol. The client sends:
GET <url> HTTP/1.1
This request line is parsed by looking for the space between the URL and the HTTP version token. If there's a space in the URL, that will be treated as the end of the URL.
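A toy illustration of that parsing: splitting the request line on spaces, as a server does, shows how an unencoded space truncates the URL the server sees.

```php
<?php
// The request line is split on spaces: method, URL, protocol version.
// An unencoded space inside the URL ends it early.
$requestLine = 'GET /test4.php?Subjectstr=Email From Contactuspage HTTP/1.1';
[$method, $url] = explode(' ', $requestLine);
echo $url; // /test4.php?Subjectstr=Email
```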
For some odd reason my if statement to check the urls using FILTER_VALIDATE_URL is returning unexpected results.
Simple stuff like https://www.google.nl/ is being blocked, but www.google.nl/ isn't? It's not like it blocks every single URL with http or https in front of it, either. Some are allowed and others are not. I know there are a bunch of topics on this, but most of them use regex to filter URLs. Is that better than using FILTER_VALIDATE_URL? Or am I doing something wrong?
The code I use to check the URLS is this
if (!filter_var($linkinput, FILTER_VALIDATE_URL) === FALSE) {
//error code
}
You should filter it like this first (just for good measure):
$url = filter_var($url, FILTER_SANITIZE_URL);
FILTER_VALIDATE_URL only accepts ASCII URLs (i.e., they need to be encoded). If the above function does not work, see PHP's urlencode() to encode the URL.
If THAT doesn't work, then you should manually strip the http:// from the beginning, like this:
$url = strpos($url, 'http://') === 0 ? substr($url, 7) : $url;
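Separately, the condition in the question has an operator-precedence pitfall: ! binds tighter than ===, so `!filter_var(...) === FALSE` evaluates as `(!filter_var(...)) === FALSE` and is true for valid URLs, which would run the error branch for exactly the URLs that pass validation. A sketch of the difference:

```php
<?php
$linkinput = 'https://www.google.nl/';
// As written in the question: (!filter_var(...)) === FALSE.
// A valid URL makes filter_var() return the URL string; '!' turns that
// into false, and false === FALSE is true, so the "error" branch fires.
var_dump(!filter_var($linkinput, FILTER_VALIDATE_URL) === false); // bool(true)
// Comparing explicitly avoids the trap:
var_dump(filter_var($linkinput, FILTER_VALIDATE_URL) === false);  // bool(false)
```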
Here are some flags that might help. If all of your URLs will have http://, you can use FILTER_FLAG_SCHEME_REQUIRED.
The FILTER_VALIDATE_URL filter validates a URL.
Possible flags:
FILTER_FLAG_SCHEME_REQUIRED - URL must be RFC compliant (like http://example)
FILTER_FLAG_HOST_REQUIRED - URL must include host name (like http://www.example.com)
FILTER_FLAG_PATH_REQUIRED - URL must have a path after the domain name (like www.example.com/example1/)
FILTER_FLAG_QUERY_REQUIRED - URL must have a query string (like "example.php?name=Peter&age=37")
The default behavior of FILTER_VALIDATE_URL:
Validates value as URL (according to » http://www.faqs.org/rfcs/rfc2396), optionally with required components. Beware a valid URL may not specify the HTTP protocol http://, so further validation may be required to determine whether the URL uses an expected protocol, e.g. ssh:// or mailto:. Note that the function will only find ASCII URLs to be valid; internationalized domain names (containing non-ASCII characters) will fail.
My Worketc account URL is akhilesh.worketc.com.
But PHP's FILTER_VALIDATE_URL filter reports it as an invalid URL.
So is there any alternate way to solve this problem?
Add a protocol to it, i.e. prepend http/https/ftp etc. to your URL before testing:
var_dump(filter_var('http://akhilesh.worketc.com', FILTER_VALIDATE_URL));
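A sketch of prepending a scheme only when the input lacks one, assuming http is an acceptable default for this check:

```php
<?php
// Bare hostnames fail FILTER_VALIDATE_URL, so prepend a scheme if none
// is present (the regex matches any RFC 3986 scheme prefix).
$url = 'akhilesh.worketc.com';
if (!preg_match('#^[a-z][a-z0-9+.\-]*://#i', $url)) {
    $url = 'http://' . $url;
}
var_dump(filter_var($url, FILTER_VALIDATE_URL) !== false); // bool(true)
```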
FILTER_VALIDATE_URL
Validates value as URL (according to » http://www.faqs.org/rfcs/rfc2396), optionally with required components. Beware a valid URL may not specify the HTTP protocol http:// so further validation may be required to determine the URL uses an expected protocol, e.g. ssh:// or mailto:. Note that the function will only find ASCII URLs to be valid; internationalized domain names (containing non-ASCII characters) will fail.
Source
http://localhost/foo/profile/%26lt%3Bi%26gt%3Bmarco%26lt%3B%2Fi%26gt%3B
The URL above gives me a 404 error. The code that produces it is urlencode(htmlspecialchars($foo));, where $foo is <i>badhtml</i>.
The URL works fine when there's nothing to encode, e.g. marco.
Thanks. =D
Update: I'm supposed to capture the segment in the encoded part of the URI, so a 404 isn't supposed to appear.
There isn't any document there; marco is simply the string I need to fetch that person's info from the db. If the user doesn't exist, it won't throw that ugly error anyway.
Slight idea of what's wrong: I found out that if I use <i>badhtml<i> it works just fine, but <i>badhtml</i> won't. What do I do to keep the / in the closing </i>?
It probably thinks of the request as http://localhost/foo/profile/<i>badhtml<**/**i>
Since there is a slash / in the parameter, this is getting interpreted as a path name separator.
The solution, therefore, is to replace all occurrences of a slash with something that doesn't get interpreted as a separator. \u2044 or something. And when reading the parameter back in, change all \u2044s back to normal slashes.
(I chose \u2044 because this character looks remarkably like a normal slash, but you can use anything that would never occur in the parameter, of course.)
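The workaround above can be sketched like this, using U+2044 (FRACTION SLASH) as the stand-in character:

```php
<?php
// Swap '/' for U+2044 before encoding the URL segment, and swap it back
// when reading the parameter in again.
$foo = '<i>badhtml</i>';
$segment = rawurlencode(str_replace('/', "\u{2044}", htmlspecialchars($foo)));
// ...later, when decoding the segment back:
$restored = str_replace("\u{2044}", '/', rawurldecode($segment));
echo $restored; // &lt;i&gt;badhtml&lt;/i&gt;
```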
It is most likely that the regex responsible for handling the URL rewrite does not like some of the characters in the URL-encoded string. This is most likely an httpd/Apache question rather than a PHP one. Your best bet is to start by looking at the .htaccess file (which contains the URL rewrite rules). Note also that Apache rejects an encoded slash (%2F) in the path by default; see the AllowEncodedSlashes directive.
This answer assumes that you are trying to pass an argument through the URL, rather than access a file named <i>badhtml</i>.
Mr. Lister, you rocked.
"The solution, therefore, is to replace all occurrences of a slash with something that doesn't get interpreted as a separator. \u2044 or something. And when reading the parameter back in, change all \u2044s back to normal slashes."