I'd like to download a remote page only when it differs from a version I have already. There's no "Last-Modified" or "Expires" (the server sends Cache-Control: max-age=0, private, must-revalidate) but there's the ETag: field.
So I can send an If-None-Match: header with the last ETag value and, on any error (including 304 Not Modified), retry after a delay.
Currently I'm using simplexml_load_file() to grab the URL, and I wonder whether I can call it in some way that adds the extra header, or whether I need to bring in heavier-weight solutions (cURL, file_get_contents(), etc.)?
You can use cURL to add the custom header, then pass the content returned from the cURL request to simplexml_load_string() to get a SimpleXMLElement object.
curl_setopt($ch, CURLOPT_HTTPHEADER, array('If-None-Match: XXX'));
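A minimal sketch of the whole flow (the URL and saved ETag value are placeholders):

// Hypothetical URL and previously saved ETag, for illustration only.
$url  = 'http://example.com/feed.xml';
$etag = '"xyzzy"'; // the ETag header value from the previous response

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('If-None-Match: ' . $etag));
$body = curl_exec($ch);
$code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($code === 200) {
    // The page changed: parse the fresh copy.
    $xml = simplexml_load_string($body);
} elseif ($code === 304) {
    // Not modified: keep the cached version and retry after a delay.
}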
I have a Symfony2 application where I am using the Guzzle HTTP client to send a GET request to a server in order to retrieve the contents of a JSON file. The Guzzle response gets transformed into a Symfony2 response to the browser.
The Guzzle response comes back with the following headers:
Content-Encoding: gzip
Content-Length: 2255
Content-Type: application/json
When outputting the data to the UI/browser I notice that it gets cut off because the Content-Length is incorrect. The size of the file is closer to 4905 bytes, not 2255; 2255 is the exact length of the data up to the cut-off point. I suspect that 2255 is the size of the gzipped data and that it gets uncompressed at some point without the Content-Length being updated.
I did verify that I get all of the data back; however, the Content-Length header is honored, which is why the data gets cut off when I forward it to the browser. Interestingly, hitting the URL of the JSON file directly yields the full contents even though the Content-Length is 2255, which means Chrome ignores it when hitting the file directly. The same happens if I use the Postman REST client to make the GET request: the full contents get displayed.
By default, Guzzle has a request option decode_content = true that controls how response bodies are handled. I set it to false when submitting the request, but that didn't seem to resolve the issue.
Before converting the Guzzle response to a Symfony response I removed the Content-Length header, and that seems to solve the problem. However, I am not sure that's the best approach, since the RFC states that a Content-Length header should be present unless a Transfer-Encoding header is present, which it isn't: https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html
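For reference, a minimal sketch of that workaround, assuming a PSR-7 Guzzle response and Symfony's HttpFoundation Response (variable names are illustrative):

use Symfony\Component\HttpFoundation\Response;

// $guzzleResponse is assumed to be the PSR-7 response from the Guzzle GET.
$headers = $guzzleResponse->getHeaders();
// Drop the stale length so the client reads until the body actually ends.
unset($headers['Content-Length']);

return new Response(
    (string) $guzzleResponse->getBody(),
    $guzzleResponse->getStatusCode(),
    $headers
);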
Another alternative, since this is a streamed response, is to get the size of the stream and correct the Content-Length; however, the Guzzle implementation uses strlen() for this, which has the undesirable effect of reading the whole stream.
What possible issues might I run into if I choose to omit the Content-Length header? Alternatively, is there a way to get the true length of the contents without reading the whole stream, so I can simply update the Content-Length header with the correct amount?
I'm using Laravel 3 and it's not obvious how to set headers in any way other than through Response::make().
I am doing a redirect like this:
return Redirect::to('admin/check');
I'd like to set an additional no-cache header for the redirect like so:
"Cache-Control: no-store, no-cache, must-revalidate"
I realize I could just do this directly in PHP, but is there any way to set response headers via Laravel?
When you call Redirect::to(), Laravel instantiates a Response object with a 302 status and a Location header. That Response object is then returned by the controller and rendered as a proper HTTP response, so at controller time you can still change its headers.
To be even more precise, class Redirect extends Response, as you can see in the Laravel source.
You can achieve that by simply using:
return Redirect::to('admin/check')
->header('Cache-Control', 'no-store, no-cache, must-revalidate');
I'm afraid the accepted answer is wrong and misleading!
It's impossible to redirect to a page with custom headers set, no matter what language or framework you use. In other words, there's no way to trigger an HTTP redirect and cause the client (browser) to send a custom header with its follow-up request.
You might be thinking that this code should work just fine:
return Redirect::to('admin/check')
->header('Cache-Control', 'no-store, no-cache, must-revalidate');
But it won't. You're setting the custom headers on the response that instructs the browser to redirect, not on the request the browser makes when it follows the redirect.
The only way for a site to instruct a browser to issue an HTTP request with a custom header is to use JavaScript and the XMLHttpRequest object. And it needs CORS implemented on the target server to allow such Ajax requests.
Please note that a page cannot set HTTP request headers unless it's making an asynchronous request using XMLHttpRequest. This means you can't do such a redirection with a custom header on the client side either.
That's not how the web works.
How can I check whether a page has been updated without downloading the entire webpage in PHP? Do I need to look at the headers?
One possibility is to check the Last-Modified header. You can download just the headers by issuing a HEAD request. The server responds with the HTTP headers only, and you can inspect the Last-Modified and/or Content-Length headers to detect changes.
Last-Modified: Mon, 03 Jan 2011 13:02:54 GMT
One thing to note is that an HTTP server is not required to send this header, so this will not work in all cases. The PHP function get_headers() will fetch the headers for you.
// By default get_headers() uses a GET request to fetch the headers. If you
// want to send a HEAD request instead, you can do so using a stream context:
stream_context_set_default(
    array(
        'http' => array(
            'method' => 'HEAD'
        )
    )
);
$headers = get_headers('http://example.com');
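From there you could, for example, pull out the Last-Modified value and compare it with one you saved earlier (a sketch; get_headers() returns the raw header lines):

$lastModified = null;
foreach ($headers as $header) {
    if (stripos($header, 'Last-Modified:') === 0) {
        $lastModified = trim(substr($header, strlen('Last-Modified:')));
    }
}
// Compare $lastModified against the value stored from the previous fetch.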
You can add an If-Modified-Since: <datetime> header to your request, and the server should return 304 Not Modified if the page hasn't changed since then. But if the document is generated dynamically (PHP, Perl, etc.), the generator could be too lazy to check this header and always return the full document.
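A minimal sketch with PHP streams, assuming the server honors the header (the URL and date are placeholders):

$context = stream_context_create(array(
    'http' => array(
        'method'        => 'GET',
        'header'        => "If-Modified-Since: Mon, 03 Jan 2011 13:02:54 GMT\r\n",
        'ignore_errors' => true // keep going on 304 so we can read the status
    )
));
$body = file_get_contents('http://example.com', false, $context);
// file_get_contents() populates $http_response_header with the raw headers.
if (strpos($http_response_header[0], '304') !== false) {
    // Not modified: reuse the cached copy.
}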
I'm using curl to fetch a webpage, I need to detect if the response is gzip or not.
This works perfectly fine if Content-Encoding is specified in the response headers, but some servers instead return Transfer-Encoding: chunked and no Content-Encoding header.
Is there any way to detect gzip or get the raw (encoded) server response?
I tried looking at curl_getinfo(), but its content_encoding field isn't set either.
Thanks.
You can check whether the response starts with the gzip magic number, the two bytes 1f 8b.
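A minimal sketch of that check (gzdecode() is available from PHP 5.4):

// Test the first two bytes of the raw body for the gzip magic number 1f 8b.
function is_gzipped($data) {
    return strncmp($data, "\x1f\x8b", 2) === 0;
}

// Hypothetical usage: $raw is an undecoded body returned by curl_exec().
if (is_gzipped($raw)) {
    $raw = gzdecode($raw);
}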
Is there any way to detect gzip
Yes. You can use cURL's header functions. For example, you can define a function that handles the response headers: use curl_setopt() with the CURLOPT_HEADERFUNCTION option. Or write the headers to a file (which you have created with fopen()) with the CURLOPT_WRITEHEADER option.
There may be more options you could use; look at the possibilities in the curl_setopt() manual. The header you are looking for is named Content-Encoding.
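A minimal sketch with CURLOPT_HEADERFUNCTION (the URL is a placeholder):

$contentEncoding = null;
$ch = curl_init('http://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, $line) use (&$contentEncoding) {
    if (stripos($line, 'Content-Encoding:') === 0) {
        $contentEncoding = trim(substr($line, strlen('Content-Encoding:')));
    }
    return strlen($line); // cURL requires the number of bytes handled
});
$body = curl_exec($ch);
curl_close($ch);
// $contentEncoding is now e.g. "gzip" if the server declared it, else null.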
If you have the output in a file, you could also use PHP's finfo with some of its predefined constants, or mime_content_type() (deprecated!) if finfo is not available to you.
[...] or get the raw (encoded) server response?
Yes. You can specify the Accept-Encoding request header. The value you are looking for is identity.
So you can send:
Accept-Encoding: identity
You may want to have a look at the HTTP/1.1 RFC.
To get unencoded/uncompressed output (for example, to write it directly into a file), use CURLOPT_ENCODING for this purpose. You can also set it with curl_setopt().
You can either issue a separate HEAD request:
CURLOPT_HEADER => true
CURLOPT_NOBODY => true
Or request the header to be prefixed to your original request:
CURLOPT_HEADER => true
But, if you just want to get the (decoded) HTML, you can use:
CURLOPT_ENCODING => ''
And cURL will automatically negotiate with the server and decode the response for you.
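A minimal sketch of that last variant (the URL is a placeholder):

$ch = curl_init('http://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// An empty string makes cURL offer every encoding it supports and
// transparently decompress whatever the server sends back.
curl_setopt($ch, CURLOPT_ENCODING, '');
$html = curl_exec($ch); // already-decoded HTML
curl_close($ch);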
I am attempting to submit a form with cURL, both via PHP and from the command line. The response from the server consists of null content (headers posted below).
When the same URL is submitted via a browser, the response is a proper webpage.
I have tried submitting the request parameters via POST and GET, using each of the following command-line curl flags: -d, -F, and -G.
If the query string parameters are posted with the -d flag, the resulting header is:
HTTP/1.1 302 Moved Temporarily
Date: Thu, 02 Jun 2011 21:41:54 GMT
Server: Apache
Set-Cookie: JSESSIONID=DC5F435A96A353289F58593D54B89570; Path=/XXXXXXX
P3P: CP="CAO PSA OUR"
Location: http://www.XXXXXXXX.com/
Content-Length: 0
Connection: close
Content-Type: text/html;charset=UTF-8
Set-Cookie: XXXXXXXXXXXXXXXX=1318103232.20480.0000; path=/
If the query string parameters are posted with the -F flag, the resulting header is:
HTTP/1.1 100 Continue
HTTP/1.1 500 Internal Server Error
Date: Thu, 02 Jun 2011 21:52:54 GMT
Server: Apache
Content-Length: 1677
Connection: close
Content-Type: text/html;charset=utf-8
Set-Cookie: XXXXXXXXXXXXXX=1318103232.20480.0000; path=/
Vary: Accept-Encoding
<html><head><title>Apache Tomcat/5.5.26 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 500 - </h1><HR size="1" noshade="noshade"><p><b>type</b> Exception report</p><p><b>message</b> <u></u></p><p><b>description</b> <u>The server encountered an internal error () that prevented it from fulfilling this request.</u></p><p><b>exception</b> <pre>javax.servlet.ServletException: Servlet execution threw an exception<br>
</pre></p><p><b>root cause</b> <pre>java.lang.NoClassDefFoundError: com/oreilly/servlet/multipart/MultipartParser<br>
com.corsis.tuesday.servlet.mp.MPRequest.<init>(MPRequest.java:27)<br>
com.corsis.tuesday.servlet.mp.MPRequest.<init>(MPRequest.java:21)<br>
com.corsis.tuesday.servlet.TuesdayServlet.doPost(TuesdayServlet.java:494)<br>
javax.servlet.http.HttpServlet.service(HttpServlet.java:710)<br>
javax.servlet.http.HttpServlet.service(HttpServlet.java:803)<br>
</pre></p><p><b>note</b> <u>The full stack trace of the root cause is available in the Apache Tomcat/5.5.26 logs.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/5.5.26</h3></body></html>
Questions:
What might cause a server to respond differently depending on the nature of the cURL request?
How can I successfully submit the request via cURL?
HTTP/1.1 100 Continue
I had problems associated with this header before; some servers simply do not understand it. Try this option to override the Expect header:
curl_setopt( $curl_handle, CURLOPT_HTTPHEADER, array( 'Expect:' ) );
To add to what Richard said, I have seen cases where servers check the User-Agent string and behave differently based on its value.
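If you suspect that, a sketch like this mimics a browser (the UA string is just an example, and $ch is an existing cURL handle):

$ch = curl_init('http://www.example.com/'); // hypothetical target
// Some servers branch on the User-Agent, so present a browser-like one.
curl_setopt($ch, CURLOPT_USERAGENT,
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36');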
I have just had an experience with this, and what fixed it was surprising. In my situation I was logging into a server so I could upload a file, have the server do work on it, and then download the new file. I did this in Chrome first and used the dev tools to capture over 100 HTTP requests in this simple transaction. Most were simply grabbing resources I wouldn't need when doing all of this from the command line, so I filtered the list down to the ones I knew, at a minimum, I should need.
Initially this boiled down to a GET to set the cookie and log in with a username and password, a POST to upload the file, a POST to execute the work on the file, and a GET to retrieve the new file. I could not get the first POST to actually work, though. The response from that POST is supposed to contain information such as the upload ID and time uploaded, but instead I was getting empty JSON lists even though the status was 200 OK.
I used cURL to spoof the requests from the browser exactly (copying the User-Agent, overriding Expect, etc.) and was still getting nothing. Then I started arbitrarily adding in some of the requests that I had captured from Chrome between the first GET and POST, and lo and behold, after adding in a GET request for the JSON history before the POST, the POST actually returned what it was supposed to.
TL;DR: Some websites require more requests after the initial log-in before you can POST. I would try to capture a successful exchange between the server and browser and look at all of the requests. Some requests might not be as superfluous as they seem.