I am using curl to retrieve content from a website, but curl reports HTTP code 200 and the content is empty. When I open the same URL in Firefox, I see a 302 redirect. I have already added this line:
curl_setopt($http, CURLOPT_FOLLOWLOCATION, true);
When I used the command line, I get the same result:
curl -I -L http://www.caudalie.fr
In Firefox, the final location will be http://fr.caudalie.com/ but curl never gets this.
Do you have any idea?
I tried some different request headers, starting from the headers Firefox sent. A minimal request doesn't work:
bf#desktop-bf:~$ telnet www.caudalie.fr 80
Trying 178.16.174.50...
Connected to www.caudalie.com.
Escape character is '^]'.
GET / HTTP/1.1
Host: www.caudalie.fr
Connection: keep-alive
HTTP/1.1 200 OK
Server: nginx
Date: Fri, 26 Apr 2013 07:23:36 GMT
Content-Type: text/html
Connection: keep-alive
Expires: Sat, 26 Jul 1997 05:00:00 GMT
Vary: Accept-Encoding
Content-Length: 0
I got a redirect if I give a language:
bf#desktop-bf:~$ telnet www.caudalie.fr 80
Trying 178.16.174.50...
Connected to www.caudalie.com.
Escape character is '^]'.
GET / HTTP/1.1
Host: www.caudalie.fr
Accept-Language: nl,en;q=0.7,en-us;q=0.3
HTTP/1.1 302 Found
Server: nginx
Date: Fri, 26 Apr 2013 07:23:55 GMT
Content-Type: text/html
Connection: keep-alive
Expires: Sat, 26 Jul 1997 05:00:00 GMT
Location: http://fr.caudalie.com/
Vary: Accept-Encoding
Content-Length: 0
So, add an Accept-Language header and you should be OK. In PHP, that would be:
curl_setopt($http, CURLOPT_HTTPHEADER, array('Accept-Language: nl,en;q=0.7,en-us;q=0.3'));
See also here: How to send a header using a HTTP request through a curl call?
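A minimal sketch of the fix above, assuming the only goal is to send an Accept-Language header and follow redirects. The helper names accept_language_header and fetch_final_url are made up for illustration; the cURL options are the standard PHP ones. Note that the languages in the header value are separated by commas, and only the q parameter is attached with a semicolon.

```php
<?php
// Build an Accept-Language header line from language => quality pairs.
// Hypothetical helper for illustration; it is not part of cURL.
function accept_language_header(array $prefs): string
{
    $parts = [];
    foreach ($prefs as $lang => $q) {
        // A quality of 1.0 is the default and can be left out.
        $parts[] = ($q >= 1.0) ? $lang : $lang . ';q=' . $q;
    }
    return 'Accept-Language: ' . implode(',', $parts);
}

// Follow redirects with the header set and return the last URL cURL reached,
// or null if the transfer failed. Hypothetical helper; needs the cURL extension.
function fetch_final_url(string $url, array $prefs): ?string
{
    $http = curl_init($url);
    curl_setopt($http, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($http, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($http, CURLOPT_HTTPHEADER, [accept_language_header($prefs)]);
    $body = curl_exec($http);
    return ($body === false) ? null : curl_getinfo($http, CURLINFO_EFFECTIVE_URL);
}
```

Calling fetch_final_url('http://www.caudalie.fr', ['nl' => 1.0, 'en' => 0.7, 'en-us' => 0.3]) should then report the redirect target, provided the server still behaves as in the telnet session above.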
I am trying to push a page to the browser while it is being generated from a PHP script. I have no access to my hosting provider's nginx configuration but they have told me that they use nginx 1.8.1. In my phpinfo() output I can see
output_buffering 0 0
and the same script works as expected on my local PC.
This is my starting script:
<pre>
<?php
for ($i = 0; $i < 100; ++$i) {
print('<b>.</b>');
flush();
usleep(100000); // 0.1 second
}
?>
</pre>
I start getting output immediately on my local PC but I have to wait the full 10 seconds before I see anything when the page is accessed from my hosting.
These are the default response headers:
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 12 Apr 2016 12:32:05 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Vary: Accept-Encoding
X-Powered-By: PHP/5.4.45
Content-Encoding: gzip
If I add
<?php
header('X-Accel-Buffering: no');
I get
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 12 Apr 2016 12:35:10 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Vary: Accept-Encoding
X-Powered-By: PHP/5.4.45
Content-Encoding: gzip
If I add
<?php
header('X-Accel-Buffering: no');
header('Content-Encoding: identity');
I get
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 12 Apr 2016 12:37:11 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 812
Connection: keep-alive
X-Powered-By: PHP/5.4.45
Content-Encoding: identity
Obviously, if the server knows the length of the content, it has waited for the script to finish before starting to send it to the browser.
These are the headers on my local machine:
HTTP/1.1 200 OK
Date: Tue, 12 Apr 2016 12:52:31 GMT
Server: Apache/2.4.7 (Win32) PHP/5.4.45 OpenSSL/1.0.1e
X-Powered-By: PHP/5.4.45
X-Accel-Buffering: no
Content-Encoding: identity
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8
The X-Accel-Buffering header gets passed through because I am not running nginx locally.
Are there any other headers I can pass through from PHP to stop nginx from buffering the content? So far I've only found config options, which I don't have access to.
I need to get the last set of HTTP headers. My string is:
HTTP/1.1 302 Moved Temporarily
Date: Sat, 30 Apr 2016 09:48:56 GMT
Server: Apache
X-Powered-By: PHP/5.5.34
Location: 2.php
Content-Length: 0
Content-Type: text/html
HTTP/1.1 302 Moved Temporarily
Date: Sat, 30 Apr 2016 09:48:57 GMT
Server: Apache
X-Powered-By: PHP/5.5.34
Location: 3.php
Content-Length: 0
Content-Type: text/html
HTTP/1.1 200 OK
Date: Sat, 30 Apr 2016 09:48:57 GMT
Server: Apache
X-Powered-By: PHP/5.5.34
Transfer-Encoding: chunked
Content-Type: text/html
But I only need the last set of headers. I tried to explode this string on \n\n but I couldn't get the result. Is it possible to do it with preg_match?
Got it!
HTTP header lines end with \r\n, so I need to explode it with this code:
explode("\r\n\r\n", $header);
A solution using the preg_split and array_pop functions:
// $headers is your initial string; trim() drops a trailing blank line
// so that the last element is not empty
$headers_splitted = preg_split("/\R{2}/", trim($headers));
print_r(array_pop($headers_splitted));
The output:
HTTP/1.1 200 OK
Date: Sat, 30 Apr 2016 09:48:57 GMT
Server: Apache
X-Powered-By: PHP/5.5.34
Transfer-Encoding: chunked
Content-Type: text/html
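Both answers above can be combined into a small sketch that also turns the last block into an array. The function names last_header_block and parse_header_block are hypothetical, and the sketch assumes the string contains only header blocks (no response body).

```php
<?php
// Return the last header block from a string holding several
// consecutive HTTP response header blocks (one per redirect hop).
function last_header_block(string $raw): string
{
    // \R matches \r\n, \n or \r; trim() drops the trailing blank line.
    $blocks = preg_split('/\R{2,}/', trim($raw));
    return end($blocks);
}

// Split one header block into its status line and a name => value map.
// Header names are lower-cased because they are case-insensitive.
function parse_header_block(string $block): array
{
    $lines = preg_split('/\R/', $block);
    $status = array_shift($lines);
    $headers = [];
    foreach ($lines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip anything that is not "Name: value"
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return [$status, $headers];
}
```

If a header such as Set-Cookie appears more than once in a block, this map keeps only the last value; collecting values in a list per name would be the obvious extension.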
I am using PHP and cURL to send an HTTP request to a REST API. However, I need to validate some data going in/out.
How can I get a copy of the entire request logged into a file?
I have added this line to my request
curl_setopt($ch, CURLOPT_VERBOSE, true);
I am hoping to get a copy that looks like this saved to a log file.
--- START REQUEST
POST /icws/3048186002/interactions/3002853105 HTTP/1.1
Host: servername:8018
Accept: */*
Cookie: icws_3048186002=64425290-29e5-432a-9ee1-c303cdee1f79
ININ-ICWS-CSRF-Token: WAp3cmF0Y2xpZmZlWBJJQ1dTLUFQSS1jb25uZWN0b3JYJGQwYjhlMWRhLTZlOTktNGIwMC05NGNlLTY5MDdkZjUwOWI5Y1gJMTAuMC40LjE4
ININ-ICWS-Session-ID: 3048186002
Content-Type: application/json
Content-Length: 35
{"attributes":{"logged":1}}
--- END REQUEST
--- START RESPOND
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Content-Type: application/vnd.inin.icws+JSON; charset=utf-8
Date: Wed, 02 Sep 2015 20:39:26 GMT
Server: HttpPluginHost
Content-Length: 0
--- END RESPOND
How can I capture this data without having to use WireShark? I would think cURL and PHP are advanced enough to allow you to log such a thing.
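One possible approach, sketched under assumptions: CURLINFO_HEADER_OUT makes cURL keep the request headers it sent, and CURLOPT_HEADER puts the response headers in front of the returned body, so both can be written to a file afterwards. The function names format_exchange and exec_and_log are hypothetical. cURL does not hand back the request body, so the caller passes it in.

```php
<?php
// Format one request/response pair in the layout shown above.
function format_exchange(string $reqHeaders, string $reqBody,
                         string $resHeaders, string $resBody): string
{
    return "--- START REQUEST\n" . rtrim($reqHeaders) . "\n\n" . $reqBody . "\n"
         . "--- END REQUEST\n"
         . "--- START RESPOND\n" . rtrim($resHeaders) . "\n\n" . $resBody . "\n"
         . "--- END RESPOND\n";
}

// Execute an already configured cURL handle and append the exchange to $logFile.
// Returns the response body, or false if the transfer failed.
function exec_and_log($ch, string $reqBody, string $logFile)
{
    curl_setopt($ch, CURLINFO_HEADER_OUT, true);    // remember the request headers
    curl_setopt($ch, CURLOPT_HEADER, true);         // prepend response headers to the result
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the result instead of printing it
    $raw = curl_exec($ch);
    if ($raw === false) {
        return false;
    }
    // Split the raw result into response headers and body.
    $size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    $resHeaders = substr($raw, 0, $size);
    $resBody = substr($raw, $size);
    $reqHeaders = (string) curl_getinfo($ch, CURLINFO_HEADER_OUT);
    file_put_contents(
        $logFile,
        format_exchange($reqHeaders, $reqBody, $resHeaders, $resBody),
        FILE_APPEND
    );
    return $resBody;
}
```

The alternative is to keep CURLOPT_VERBOSE and point CURLOPT_STDERR at a file handle opened with fopen, which logs cURL's own trace rather than this custom layout.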
I'm working on crawling information from a website: http://www.fatwallet.com
There are many redirected URLs. For instance: http://www.fatwallet.com/ticket/store/A4C?s=storepage
is redirected to http://www.a4c.com/?siteID=.7WaaTN6umc-s1Ih0x_Q67n6r7gInoh6Ug
I would like to use PHP to find out the redirected URL.
I have used "curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true)". I know it will automatically redirect 5 times.
However, the problem is that the page I get is not the final page; instead, it is a page in between.
curl_exec returns:
HTTP/1.1 302 Moved Temporarily
Server: Apache
Location: www.fatwallet.com/interstitial/signin
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 20
Content-Type: text/html
Date: Mon, 13 Apr 2015 12:03:19 GMT
Connection: keep-alive
Set-Cookie: JSESSIONID=A9E28337052B56ADAC8451854A276210; Path=/; HttpOnly
HTTP/1.1 302 Moved Temporarily
Server: Apache
Location: www.fatwallet.com/interstitial/signin
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 20
Content-Type: text/html
Date: Mon, 13 Apr 2015 12:03:19 GMT
Connection: keep-alive
HTTP/1.1 200 OK
Server: Apache
Cache-Control: no-cache,no-store,max-age=0
Expires: Wed, 31 Dec 1969 23:59:59 GMT
X-UA-Compatible: IE=edge,chrome=1
Vary: User-Agent,Accept-Encoding
Content-Language: en
Content-Encoding: gzip
Content-Type: text/html;charset=UTF-8
Content-Length: 16949
Date: Mon, 13 Apr 2015 12:03:20 GMT
Connection: keep-alive
Set-Cookie: list_styles=grid; Expires=Sat, 01-May-2083 15:17:27 GMT; Path=/
Set-Cookie: non_mem=f86c0692-826f-40f2-9fa1-1e2f9a957af8; Expires=Sat, 01-May-2083 15:17:27 GMT; Path=/
............
It seems that the third response is "HTTP/1.1 200 OK", but it is not the final page. If you check http://www.fatwallet.com/ticket/store/A4C?s=storepage you will understand what I mean. Also, there is no way to find the final URL in the page returned.
So my question is: is it possible to make curl continue following redirects even after it receives HTTP/1.1 200 OK?
Is there another way to solve this (for example with Snoopy or Python)?
Thanks for all!
It seems that the last redirect is done via JavaScript, not via a native HTTP response. You need a more advanced crawler that can execute JS code.
Look at the source of the first redirected page (view-source:https://www.fatwallet.com/interstitial/signin) and you will find the final URL in some HTML elements; it seems that some JS code reads those values and performs the last redirect.
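If executing JS is not an option, a rough sketch of the idea above is to scan the interstitial page for absolute URLs that point to another host. The markup of the real page is not shown here, so the pattern below is only an assumption about where the URL might sit (any quoted attribute value), and external_urls is a hypothetical helper name.

```php
<?php
// Collect absolute http(s) URLs found in quoted attribute values
// that point to a host other than $ownHost.
function external_urls(string $html, string $ownHost): array
{
    // Assumption: the target URL appears as a quoted attribute value.
    preg_match_all('/=\s*["\'](https?:\/\/[^"\']+)["\']/i', $html, $m);
    $found = [];
    foreach ($m[1] as $url) {
        $url = html_entity_decode($url); // attribute values may contain &amp;
        $host = parse_url($url, PHP_URL_HOST);
        if (is_string($host) && strcasecmp($host, $ownHost) !== 0) {
            $found[] = $url;
        }
    }
    return array_values(array_unique($found));
}
```

The result is a list of candidates, not a guaranteed answer; pages that build the URL inside the script itself would still need a crawler that runs JS.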
I am setting the Access-Control-Allow-Origin header with .htaccess. I know it's being set properly because if I set another header:
Header set Access-Control-Allow-Origin2: *
Then Chrome does see this. As soon as I remove the 2, however, Chrome just completely ignores it. If I make my file a PHP file and put this in it:
<?php header("Access-Control-Allow-Origin: *"); ?>
Then it works.
Here are the response headers as reported by Chrome of the .htaccess method which I need to work and which does not:
HTTP/1.1 304 Not Modified
Date: Sun, 30 Mar 2014 00:13:06 GMT
Server: Apache/2.2.22 (Ubuntu)
Connection: Keep-Alive
Keep-Alive: timeout=5, max=100
ETag: "208f3-178a2-4f5c4f119cd34"
Vary: Accept-Encoding
Here are the response headers as reported by Chrome from the PHP method which for some reason does work:
HTTP/1.1 200 OK
Date: Sun, 30 Mar 2014 00:13:09 GMT
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.10
Access-Control-Allow-Origin: *
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 23
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Content-Type: text/html
Again, I know the .htaccess is setting the header: even if I go to an online service that checks response headers, I get this back:
HTTP/1.1 200 OK
Date: Sun, 30 Mar 2014 00:18:14 GMT
Server: Apache/2.2.22 (Ubuntu)
Last-Modified: Sat, 29 Mar 2014 20:48:34 GMT
ETag: "208f3-178a2-4f5c4f119cd34"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Access-Control-Allow-Origin: *
Content-Length: 33393
Content-Type: application/javascript