I'm using cURL to find out the status code of a website. This happens when a user types a URL into a form; I just want to check that the URL is valid, so I thought the best way would be to allow only certain codes that are likely to be OK. But this isn't working as well as expected. For example, Tesco returns 503 and Marks & Sparks returns 405. So it seems there could be many more status codes that are in fact OK, even though they don't look OK to me.
So my question is: which HTTP status codes should I trust? Or should I do this the other way round and pass everything except some particular status codes?
For completeness and in case it helps anyone, here's how I'm getting the status code:
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_NOBODY, true); // send a HEAD request, so no body is downloaded
$result = curl_exec($curl);
$statusCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
// Allow-list of codes treated as "URL is fine"
if ($statusCode == 200 || $statusCode == 300 || $statusCode == 301 || $statusCode == 302 || $statusCode == 303 || $statusCode == 307) {
    $ret = true;
}
Check the existence of a URL through cURL
Refer to: http://www.php.net/manual/en/function.file-exists.php#74469
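<?php
function url_exists($url) {
    if (!$fp = curl_init($url)) return false;
    // curl_init() only creates the handle; the request has to run before we learn anything
    curl_setopt($fp, CURLOPT_NOBODY, true);
    curl_setopt($fp, CURLOPT_RETURNTRANSFER, true);
    $ok = curl_exec($fp) !== false; // true if the server answered at all, whatever the status
    curl_close($fp);
    return $ok;
}
?>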
Some status codes and references relevant to your question
200 OK
The request has succeeded. The information returned with the response is dependent on the method used in the request, for example:
GET an entity corresponding to the requested resource is sent in the response;
HEAD the entity-header fields corresponding to the requested resource are sent in the response without any message-body;
POST an entity describing or containing the result of the action;
TRACE an entity containing the request message as received by the end server.
201 Created
The request has been fulfilled and resulted in a new resource being created. The newly created resource can be referenced by the URI(s) returned in the entity of the response, with the most specific URI for the resource given by a Location header field.
202 Accepted
The request has been accepted for processing, but the processing has not been completed. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place. There is no facility for re-sending a status code from an asynchronous operation such as this.
203 Non-Authoritative Information
The returned metainformation in the entity-header is not the definitive set as available from the origin server, but is gathered from a local or a third-party copy. The set presented MAY be a subset or superset of the original version. For example, including local annotation information about the resource might result in a superset of the metainformation known by the origin server. Use of this response code is not required and is only appropriate when the response would otherwise be 200 (OK).
204 No Content
The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation. The response MAY include new or updated metainformation in the form of entity-headers, which if present SHOULD be associated with the requested variant.
205 Reset Content
The server has fulfilled the request and the user agent SHOULD reset the document view which caused the request to be sent. This response is primarily intended to allow input for actions to take place via user input, followed by a clearing of the form in which the input is given so that the user can easily initiate another input action. The response MUST NOT include an entity.
Read This
http://www.seocentro.com/articles/apache/http-status-codes.html
http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
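One way to approach the "allow-list or deny-list" question is to group codes by class instead of enumerating them. The following is only a minimal sketch under assumptions: that any HTTP response at all shows a server answered at that URL, and that only 404 and 410 positively say the resource is missing. The function name and the three categories are illustrative, not an established rule.

```php
<?php
// Hypothetical helper: classify a status code for a "does this URL look valid?" check.
// Returns 'ok', 'missing', or 'uncertain'.
function classify_url_status(int $code): string
{
    if ($code === 0) {
        return 'missing';   // cURL reports 0 when no HTTP response was received
    }
    if ($code === 404 || $code === 410) {
        return 'missing';   // the server says the resource is not there
    }
    if ($code >= 200 && $code < 400) {
        return 'ok';        // success or redirect
    }
    // Anything else (e.g. 405 for a rejected HEAD request, 503 from a busy site)
    // still means a server answered at this URL.
    return 'uncertain';
}
```

For the 'uncertain' case, retrying with a normal GET (without CURLOPT_NOBODY) is often worthwhile, since some servers reject HEAD requests.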
I'm writing some code in my localhost.
index.php:
$task = null;
$method = $_SERVER['REQUEST_METHOD'];
var_dump($method);
//initialize data
HttpProtocol::init();
if (empty($method))
    exitApp("unknown method");
if ($method == HttpProtocol::get())
    $task = new WebhookVerifyTask();
else if ($method == HttpProtocol::post())
    $task = new ProcessFacebookEventTask();
if (is_null($task))
    exitApp("unknown method");
$task->start();
http_response_code(200);
It doesn't matter whether I send a GET or a POST request; $method is always GET.
When I try PUT or DELETE, it changes as expected.
What could cause $method to always be GET even when the request is a POST?
UPDATE
Apparently the behaviour above occurs when I send the request to localhost/path. If I send it to localhost/path/, the POST works perfectly.
Your update also answers your question. If it's /path, but there's no such file, the web server automatically redirects you to /path/ instead. – Janno
When it does this redirection, does it not carry over all the request data and the method?
It cannot. The web server tells the client to try another request at a different URL: it responds with a redirect status code (such as 301 Moved Permanently or 302 Found) and a Location: http://localhost/path/ header. This causes the client to make another HTTP request to that new location, and after a 301 or 302 most clients send that new request as a GET without the original body. A POST is only reliably preserved by a 307 Temporary Redirect or 308 Permanent Redirect, which the server would have to be configured to send.
You need to make your request to the canonical URL directly so as to not cause a redirect.
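The method change described above can be modelled in a few lines. This is only a sketch of the behaviour commonly seen in browsers and cURL, not a statement about any particular client; the function name is hypothetical.

```php
<?php
// Hypothetical helper: which method a typical client uses for the follow-up request.
function method_after_redirect(string $method, int $status): string
{
    $method = strtoupper($method);
    if ($status === 307 || $status === 308) {
        return $method;   // method and body are preserved
    }
    if ($status === 303 && $method !== 'HEAD') {
        return 'GET';     // 303 switches to GET (HEAD stays HEAD)
    }
    if (($status === 301 || $status === 302) && $method === 'POST') {
        return 'GET';     // long-standing client behaviour for POST
    }
    return $method;       // e.g. PUT or DELETE after a 301/302
}
```

This matches the observation in the question: a POST to localhost/path arrives as a GET after the trailing-slash redirect, while PUT and DELETE keep their method.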
I'm making a request to retrieve a JSON file from a server at a particular secure DocuSign URI. However, unless I supply the authorization information (which I do have), the file is not returned.
<?php
$json = file_get_contents("https://example.docusign.com/sensitiveIDs/moreID");
echo $json;
?>
Where would I put in authorization information for the specific server/username/password/other info needed to access the particular DocuSign server using a method like this in PHP? Is there a better method to use for this scenario in PHP?
It depends on how the authorization is implemented. If it's basic or digest HTTP authentication then specify it in the URL:
file_get_contents("https://$USER:$PASSWORD@example.docusign.com/sensitiveIDs/moreID");
Cookie-based authentication is a lot more difficult (it is probably easier to use cURL or even a more complex system like Guzzle). If it's OAuth2, then you probably want an OAuth2 library.
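For the basic-authentication case, an alternative to embedding credentials in the URL is to send the Authorization header through a stream context. A minimal sketch, assuming the server accepts HTTP Basic credentials; the helper name and the placeholder credentials are hypothetical.

```php
<?php
// Hypothetical helper: build a stream context that sends HTTP Basic credentials.
function basic_auth_context(string $user, string $password)
{
    $token = base64_encode($user . ':' . $password);
    return stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => "Authorization: Basic $token\r\n",
        ],
    ]);
}

// Usage (URL from the question; credentials are placeholders):
// $json = file_get_contents(
//     'https://example.docusign.com/sensitiveIDs/moreID',
//     false,
//     basic_auth_context('user', 'secret')
// );
```

Keeping the credentials out of the URL avoids problems with special characters in the password and keeps them out of anything that logs the URL.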
Your call needs to include authentication to make the GET call to retrieve the file.
If your app is initiated by a human, use OAuth to retrieve access and refresh tokens. Then include the access token with the GET request.
If your app is a "system app" that wants to autonomously retrieve the file, then you should authenticate by using X-DocuSign-Authentication -- include the following header in your HTTPS request. Since the request is HTTPS, the content is encrypted on the wire:
X-DocuSign-Authentication: <DocuSignCredentials><Username>{name}</Username><Password>{password}</Password><IntegratorKey>{integrator_key}</IntegratorKey></DocuSignCredentials>
Replace {name} with your email address (no braces), etc.
The bottom line is that you can't use the plain file_get_contents PHP call shown in the question. Instead, you'd do something like the following:
Use https://github.com/rmccue/Requests or a similar library to help with the https request. (http is not allowed due to security issues.)
(untested code)
$url = $base_url . $the_url_section_for_this_call;
$headers = array('X-DocuSign-Authentication' =>
    '<DocuSignCredentials><Username>your_name</Username><Password>your_password</Password><IntegratorKey>your_integrator_key</IntegratorKey></DocuSignCredentials>');
$request = Requests::get($url, $headers);
# Check that the call succeeded (either 200 or 201 depending on the method)
$status_code = $request->status_code;
if ($status_code != 200 && $status_code != 201) {
    throw new Exception('Problem while calling DocuSign');
}
$json = $request->body;
Okay, I haven't been able to find a solution to this as of yet, and I need to start asking questions on SO so I can get my reputation up and hopefully help out others.
I am making a WordPress plugin that retrieves a JSON list of items from a remote site. Recently, the site added a redirecting check for a cookie.
Upon first request without the cookie, 302 headers are provided, pointing to a second page which also returns a 302 redirect pointing to the homepage. On this second page, however, the set-cookie headers are also provided, which prevents the homepage from redirecting yet again.
When I make a cURL request to a url on the site, however, it fails in a redirect loop.
Now, obviously the easiest solution would be to fix this on the remote server. It should not be implementing that redirect for api routes. But that at the moment is not an option for me.
I have found how to retrieve the Set-Cookie header value from a 2xx response, but I cannot figure out how to access that value when 302 headers are returned and cURL gives back nothing but an error.
Is there a way to access the headers even when it reaches the maximum (20) redirects?
Is it possible to stop the execution after a set number of redirects?
How can I get this cookie's value so I can provide it in a final request?
If you use the cURL option CURLOPT_HEADER the data you get back from curl_exec will include the headers from each response, including the 302.
If you enable cookie handling in cURL, it should pick up the cookie set by the 302 response just fine unless you prefer to handle it manually.
I often do something like this when there could be multiple redirects:
$ch = curl_init($some_url_that_302_redirects);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects so redirect_count is populated
curl_setopt($ch, CURLOPT_COOKIEFILE, ''); // enable curl cookie handling
$result = curl_exec($ch);
// $result contains the headers from each response, plus the body of the last response
$info = curl_getinfo($ch); // info will tell us how many redirects were followed
for ($i = 0; $i < intval($info['redirect_count']); ++$i) {
    // get headers from each response; the remainder stays in $result
    list($headers, $result) = explode("\r\n\r\n", $result, 2);
    // DO SOMETHING WITH $headers HERE
    // If there was a redirect, headers will be all headers from that response,
    // including Set-Cookie headers
}
list($headers, $body) = explode("\r\n\r\n", $result, 2);
// Now $headers are the headers from the final response
// $body is the content from the final response
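If you prefer to handle the cookie manually, each header block split off above can be scanned for Set-Cookie lines. A minimal sketch; the function name is hypothetical and cookie attributes such as Path or Expires are deliberately ignored.

```php
<?php
// Hypothetical helper: collect name => value pairs from the Set-Cookie lines of one header block.
function extract_set_cookies(string $rawHeaders): array
{
    $cookies = [];
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        // Header names are case-insensitive; everything after the first ';' is ignored here
        if (preg_match('/^Set-Cookie:\s*([^=;\s]+)=([^;]*)/i', $line, $m)) {
            $cookies[$m[1]] = trim($m[2]);
        }
    }
    return $cookies;
}
```

The resulting pairs can then be sent on a later request, for example through CURLOPT_COOKIE as a "name=value; name2=value2" string.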
You already had problems before you started trying to add cookies into the mix. Doing a single redirect is bad for performance. Using a 302 response as a means of dissociating data presentation from data retrieval under HTTP/1.1 or later is bad (it works, but is a violation of the protocol - you should be using a 303 if you really must redirect).
Trying to set a cookie in a 3xx response will not work consistently across browsers. Setting a cookie in an Ajax response will not work consistently across browsers.
It should not be implementing that redirect for api routes
Maybe the people at the remote site are trying to prevent you leeching their content?
Fetch the homepage first in an iframe to populate the cookie and record a flag in your domain on the browser.
I actually found another SO question, of course after I posted, that led me in the right direction to make this possible, HERE
I used the WebGet class to make the curl request. It has not been maintained for three years, but it still works fine.
It has a function that makes the curl request without following through on the redirect loop.
There are a lot of curl options set in that function, and curl is not returning an error in it, so I'm sure the exact solution could be simpler. HERE is a list of curl options for anyone who would like to delve deeper.
Here is how I handle each of the responses to get the final response:
$w = new WebGet();
$cookie_file = 'cookie.txt';
if (!file_exists($cookie_file)) {
    $cookie_file_inter = fopen($cookie_file, "w");
    fclose($cookie_file_inter);
}
$w->cookieFile = $cookie_file; // must exist and be writable
$w->requestContent($url);
$headers = $w->responseHeaders;
// First redirect: this response carries the Set-Cookie header
if ($w->responseStatusCode == 302 && isset($headers['LOCATION'])) {
    $w->requestContent($headers['LOCATION']);
    $headers = $w->responseHeaders; // refresh so the next check sees the second response
}
// Second redirect: back to the originally requested page, now with the cookie
if ($w->responseStatusCode == 302 && isset($headers['LOCATION'])) {
    $w->requestContent($headers['LOCATION']);
}
$response = $w->cachedContent;
Of course, this is all extremely bad practice, and has severe performance implications, but there may be some rare use cases that find themselves needing to do this.
I am writing a small PHP script that can distinguish between two different kinds of responses from a third-party website.
For the human visitor, recognizing the difference is fairly easy: Response #1 is a bare-bones 404 error page, whereas response #2 redirects to the main page.
For my script, this turns out to be somewhat more difficult. Both types return a '404' status code, file_get_contents() returns empty for both, and the "redirect" doesn't really register as a redirect (like I said, there's a '404' status code, not a '30X'). get_headers() shows no distinction either (no "Location:" or anything of that sort).
Any way I can get this done?
There are many ways to do a redirect:
the HTTP response codes for redirect (usually 301 and 302) accompanied by the Location: header that contains the URL
HTTP/1.1 302 Found
Location: http://www.example.org
the HTTP header Refresh:; it contains a number of seconds to wait and the new URL:
Refresh: 0; url=http://www.example.org
the HTML meta element that emulates the Refresh HTTP header:
<meta http-equiv="Refresh" content="0; url=http://www.example.org">
Javascript:
<script>document.location = 'http://www.example.org';</script>
Note that there are countless possibilities to redirect using Javascript. What is common to all of them is the usage of the location property of document or window. location is an object of type Location that can be assigned directly using a string or can be changed using its href property or its methods assign() and replace().
If the requests to your URLs do not return any content, and both return status code 404 and no Location: header, then check for the presence of the Refresh: header in the response.
You'd better use cURL to make the requests instead of file_get_contents(). cURL provides better control over the headers sent and received.
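The header-based and HTML-based mechanisms listed above can be checked mechanically once you have the response headers and body. A minimal sketch, assuming the headers have already been parsed into an array keyed by lower-cased header name; the function name is hypothetical, and JavaScript redirects are not detected because they would require executing or pattern-matching script code.

```php
<?php
// Hypothetical helper: report how a response redirects, or null if no redirect is detected.
// $headers maps lower-cased header names to values; $body is the response body.
function detect_redirect(array $headers, string $body): ?array
{
    if (isset($headers['location'])) {
        return ['type' => 'location', 'url' => trim($headers['location'])];
    }
    if (isset($headers['refresh']) && preg_match('/url\s*=\s*(\S+)/i', $headers['refresh'], $m)) {
        return ['type' => 'refresh-header', 'url' => $m[1]];
    }
    // Meta refresh: look for the url=... part inside a <meta http-equiv="refresh"> tag
    if (preg_match('/<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*url\s*=\s*([^"\'>\s]+)/i', $body, $m)) {
        return ['type' => 'meta-refresh', 'url' => $m[1]];
    }
    return null;
}
```

Comparing the result for the two kinds of 404 responses should show whether one of them carries a Refresh header or a meta refresh that the other lacks.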
I would suggest making a cURL request to the site and inspecting the response.
As described in the PHP manual,
curl_getinfo ($handle, CURLINFO_HTTP_CODE);
returns the status code of the last response as an integer. (Called without the second argument, curl_getinfo() returns an associative array in which the http_code key holds the status code.)
If the redirect case gives you a non-30X status code, you can try to fetch the redirect URL like this:
curl_getinfo ($handle, CURLINFO_REDIRECT_URL);
I'm currently in the process of creating a small API. I have some error conditions, the 3 in question in this case are:
The user making a request with any method other than POST
The user not being authenticated
An entity not being found; resulting in no action being able to be made.
In that order. I had originally decided that I could assign a status code to each of these errors, (i.e. 400, 403, and 404, in that order) but then realised that I can't set multiple HTTP status codes.
How does one deal with this issue? Should I use HTTP status codes?
In my view it should check each of these conditions in the order you specified and return immediately with the corresponding error code if one of the conditions fails.
So only 1 error code will be returned.
It would be OK to use HTTP status codes, but it depends on who is consuming your API. Sometimes it is better to just return 200 OK and then include Error information in the body.
With Status Codes
If you go with status codes, just return the first error encountered; there is no use in handling the request further anyway. In pseudocode:
if (request is not POST) return 405; // abort here
// we know the request is POST here
if (request not authorized) return 401; // abort here
// we know the request is POST and authorized
if (request requests a non-existing entity) return [404, 422, ..., 5xx] either will do; // abort here
// we now know the request is POST, authorized and requests valid information
processRequest();
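The same first-failure logic could look like this in PHP. A minimal sketch: the function name and its boolean parameters are hypothetical, and 404 is picked from the options in the pseudocode purely for illustration.

```php
<?php
// Hypothetical helper: return the status code for the first failed check, or 200 if all pass.
function resolve_status(string $method, bool $authenticated, bool $entityExists): int
{
    if (strtoupper($method) !== 'POST') {
        return 405; // Method Not Allowed
    }
    if (!$authenticated) {
        return 401; // Unauthorized
    }
    if (!$entityExists) {
        return 404; // Not Found
    }
    return 200;
}

// Usage in a request handler (the two booleans come from your own checks):
// http_response_code(resolve_status($_SERVER['REQUEST_METHOD'], $isLoggedIn, $entity !== null));
```

Because the checks run in a fixed order, a request that fails several of them still produces exactly one status code.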
Without Status Codes
As an alternative, since you tagged ajax, I assume you are returning JSON, so just return 200 OK and include the fields success : [true|false] and errorMessage : ["Not POST"|"Bad Auth"|"Bad Request or Unknown resource"|"OK"] in your JSON answer.
You could also combine both ways, but depending on the AJAX client, not all status codes will work well. With those two fields in the response, all the client needs to do is check whether success === true and handle the error otherwise.
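The JSON body described above could be built with a small helper. This is only a sketch; the function name is hypothetical, and the field names are the ones suggested in this answer.

```php
<?php
// Hypothetical helper: build the JSON body with the success and errorMessage fields.
function api_result(bool $success, string $errorMessage = 'OK'): string
{
    return json_encode(['success' => $success, 'errorMessage' => $errorMessage]);
}

// Usage:
// header('Content-Type: application/json');
// echo api_result(false, 'Bad Auth');
```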