I'm trying to connect to a website using cURL. On my local machine I can connect, but on the development server it doesn't work.
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $ch, CURLOPT_TIMEOUT, 3 );
curl_setopt( $ch, CURLOPT_VERBOSE, true );
curl_setopt( $ch, CURLOPT_USE_SSL, true );
curl_setopt( $ch, CURLOPT_FRESH_CONNECT, true );
On my local machine curl is configured to use OpenSSL and on the development machine curl is using NSS.
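For reference, here is a minimal, self-contained version of the snippet above with the verbose log captured to a stream so the two machines can be compared; the URL, the 30-second timeout, and the php://temp target are illustrative choices, not part of the original code:
$ch = curl_init('https://www.zomato.com/sk/slovakia');
$verbose = fopen('php://temp', 'w+');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);         // more generous than the 3 s used above
curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_STDERR, $verbose);    // write the verbose log to a stream
$body = curl_exec($ch);
if ($body === false) {
    echo 'cURL error: ' . curl_error($ch) . PHP_EOL;
}
rewind($verbose);
echo stream_get_contents($verbose);            // verbose log, similar to the output quoted below
curl_close($ch);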
Here's the output I get on the development machine:
* About to connect() to www.zomato.com port 443 (#0)
* Trying 104.81.108.141...
* Connected to www.zomato.com (104.81.108.141) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=*.zomato.com,OU=Engineering,O=Zomato Media Private Limited,L=New Delhi,ST=Delhi,C=IN
* start date: May 04 00:00:00 2017 GMT
* expire date: Aug 03 23:59:59 2018 GMT
* common name: *.zomato.com
* issuer: CN=GeoTrust SSL CA - G3,O=GeoTrust Inc.,C=US
> GET /sk/slovakia HTTP/1.1
Host: www.zomato.com
Accept-Language: en,ro;q=0.8
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Accept-Encoding: gzip, deflate, br
* Operation timed out after 3000 milliseconds with 0 out of -1 bytes received
* Closing connection 0
And here's the output I get on the local machine:
* Trying 104.84.165.29...
* TCP_NODELAY set
* Connected to www.zomato.com (104.84.165.29) port 443 (#0)
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:#STRENGTH
* successfully set certificate verify locations:
* CAfile: G:\cacert.pem
CApath: none
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=IN; ST=Delhi; L=New Delhi; O=Zomato Media Private Limited; OU=Engineering; CN=*.zomato.com
* start date: May 4 00:00:00 2017 GMT
* expire date: Aug 3 23:59:59 2018 GMT
* subjectAltName: host "www.zomato.com" matched cert's "*.zomato.com"
* issuer: C=US; O=GeoTrust Inc.; CN=GeoTrust SSL CA - G3
* SSL certificate verify ok.
> GET /sk/slovakia HTTP/1.1
Host: www.zomato.com
Accept-Language: en,ro;q=0.8
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Accept-Encoding: gzip, deflate, br
< HTTP/1.1 200 OK
Related
I'm trying to make a request to a payment processing page. This requires authorization, which takes place through a set of redirects. In the second step, I get a "411 Length Required" error, which means the Content-Length header was lost along the way. Indeed, I cannot see it in the log. What can be done here? Should I change tools (programming language)?
CURLOPT_VERBOSE:
* Trying xxx.xxx.xxx.xxx...
* TCP_NODELAY set
* Connected to api.dev.example.com (188.186.236.44) port 443 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: OU=Domain Control Validated; OU=PositiveSSL Wildcard; CN=*.dev.example.com
* start date: Apr 27 00:00:00 2019 GMT
* expire date: Apr 26 23:59:59 2021 GMT
* subjectAltName: host "api.dev.example.com" matched cert's "*.dev.example.com"
* issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
* SSL certificate verify ok.
> POST /p2p/v2/payer HTTP/1.1
Host: api.dev.example.com
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Content-Length: 224
* upload completely sent off: 224 out of 224 bytes
< HTTP/1.1 302 Found
< Server: nginx
< Date: Mon, 13 Jul 2020 14:22:54 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 213
< Connection: keep-alive
< Keep-Alive: timeout=20
< Cache-Control: private
< Location: /api/payer/auth?sessionToken=e744a95992fa405ba10662bbc6908d6bedd48a73cc0d45d589f4ef2f7d7a0b88
< Set-Cookie: returnUrl=http://example.com/returnurl.php; path=/
<
* Ignoring the response-body
* Connection #0 to host api.dev.example.com left intact
* Issue another request to this URL: 'https://api.dev.example.com/auth?sessionToken=e744b95992fa405ba10662bbc6908d6b7dd48a73cc0d45d589f4ef2f7d7a0b88'
* Switch from POST to GET
* Found bundle for host api.dev.example.com: 0x5649fd243480 [can pipeline]
* Re-using existing connection! (#0) with host api.dev.example.com
* Connected to api.dev.example.com (188.186.236.44) port 443 (#0)
> POST /auth?sessionToken=e744b95992fa405ba10662bbc6908d6b7dd48a73cc0d45d589f4ef2f7d7a0b88 HTTP/1.1
Host: api.dev.example.com
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
< HTTP/1.1 411 Length Required
< Server: nginx
< Date: Mon, 13 Jul 2020 14:22:54 GMT
< Content-Type: text/html; charset=us-ascii
< Content-Length: 344
< Connection: keep-alive
< Keep-Alive: timeout=20
<
* Connection #0 to host api.dev.example.com left intact
My code is:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $path);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, Array (
"Content-Type: application/x-www-form-urlencoded",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
));
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $curl_method);
curl_setopt($ch, CURLOPT_POSTFIELDS, $order_data);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_STDERR, $verbose);
$response = curl_exec($ch);
curl_close($ch);
Set the Content-Length header explicitly, using the string length strlen() of $order_data:
curl_setopt($ch, CURLOPT_HTTPHEADER, Array (
"Content-Type: application/x-www-form-urlencoded",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Content-Length: ". strlen($order_data)
));
You can also debug this with curl_setopt($ch, CURLINFO_HEADER_OUT, true);, which makes curl_getinfo($ch, CURLINFO_HEADER_OUT) return the headers of the request that was actually sent.
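A small sketch of that debugging pattern, assuming $ch is the handle from the code above:
curl_setopt($ch, CURLINFO_HEADER_OUT, true);           // record the outgoing request headers
$response = curl_exec($ch);
$sentHeaders = curl_getinfo($ch, CURLINFO_HEADER_OUT); // the headers that were actually sent
var_dump($sentHeaders);                                // check whether Content-Length is present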
The problem was in using curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $curl_method). On the redirect, cURL tried to switch to GET, as most browsers do, but with a custom request method it cannot. Use curl_setopt($ch, CURLOPT_POST, 1); instead.
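As a sketch of that fix, using the variable names from the question, the request could be set up like this so cURL is free to downgrade the redirected request to GET:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $path);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);              // instead of CURLOPT_CUSTOMREQUEST
curl_setopt($ch, CURLOPT_POSTFIELDS, $order_data); // cURL derives Content-Length from this
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);    // follow the 302 to the auth URL
$response = curl_exec($ch);
curl_close($ch);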
Please check the code below. I am trying to scrape a website through a proxy, and the request itself works now. The problem is that the data print_r displays is in a non-readable format. I need to get it as "normal" HTML source code. How can I do that?
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.amazon.com');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_PROXY, '142.234.203.59:12345');
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'haris20202:veryfastplease123');
$data = curl_exec($ch);
curl_close($ch);
print_r($data);
Using a slightly more fully featured cURL function than the one above, the response looks good, BUT it includes a Robot Check:
* Rebuilt URL to: https://www.amazon.com/
* Trying 142.234.203.59...
* TCP_NODELAY set
* Connected to 142.234.203.59 (142.234.203.59) port 12345 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to www.amazon.com:443
* Proxy auth using Basic with user 'haris20202'
> CONNECT www.amazon.com:443 HTTP/1.1
Host: www.amazon.com:443
Proxy-Authorization: Basic aGFyaXMyMDIwMjp2ZXJ5ZmFzdHBsZWFzZTEyMw==
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Proxy-Connection: Keep-Alive
< HTTP/1.1 200 Connection established
<
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* ALPN, offering http/1.1
* successfully set certificate verify locations:
CAfile: c:/wwwroot/cacert.pem
CApath: none
* CONNECT phase completed!
* CONNECT phase completed!
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=US; ST=Washington; L=Seattle; O=Amazon.com, Inc.; CN=www.amazon.com
* start date: Sep 18 00:00:00 2019 GMT
* expire date: Aug 23 12:00:00 2020 GMT
* subjectAltName: host "www.amazon.com" matched cert's "www.amazon.com"
* issuer: C=US; O=DigiCert Inc; CN=DigiCert Global CA G2
* SSL certificate verify ok.
> GET / HTTP/1.1
Host: www.amazon.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Accept: */*
Accept-Encoding: deflate, gzip
< HTTP/1.1 200 OK
< Content-Type: text/html
< Content-Length: 2097
< Connection: keep-alive
< Server: Server
< Date: Tue, 26 Nov 2019 10:14:10 GMT
< Vary: Content-Type,Cookie,Referer,Accept-Encoding,X-Amzn-CDN-Cache,X-Amzn-AX-Treatment,User-Agent
< Content-Encoding: gzip
< x-amz-rid: DTAY61T1CN3HGSADJG16
< Edge-Control: no-store
< X-Cache: Miss from cloudfront
< Via: 1.1 274469ea4a9ada6e05630e17982ca5de.cloudfront.net (CloudFront)
< X-Amz-Cf-Pop: PHL50
< X-Amz-Cf-Id: R3hAZb_0qdQYB25p3WwZ5D-wK_1ujzleVSOS7EZo_zsTyMx9oYU6CA==
<
* Connection #0 to host 142.234.203.59 left intact
Amazon have an API - have you considered using that? Amazon for Developers
Include header('Content-Type: application/json'); in your file to get the response as a string.
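Another angle, offered only as a sketch: the fuller verbose log above shows Content-Encoding: gzip in the response, and the script also asks for the response headers with CURLOPT_HEADER, either of which can make the raw $data look unreadable. Letting cURL decode the body and stripping the headers might be enough (same proxy settings as in the question):
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.amazon.com');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_ENCODING, '');            // accept and auto-decode gzip/deflate
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_PROXY, '142.234.203.59:12345');
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'haris20202:veryfastplease123');
$data = curl_exec($ch);
$headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
curl_close($ch);
$html = substr($data, $headerSize);                // drop the response headers
echo $html;                                        // plain HTML source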
I make a curl request to the address https://trimet.ru/contacts/ and get:
301 Moved Permanently Location: http://trimet.ru/contacts
I change the URL to http://trimet.ru/contacts and get:
302 Found
When I try to add these cURL parameters:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch, CURLOPT_MAXREDIRS,100);
curl_setopt($ch, CURLOPT_AUTOREFERER,1);
I get an empty result (safe_mode = off, open_basedir not set).
My source code:
$ch = curl_init();
curl_setopt( $ch, CURLOPT_URL, $url );
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER,1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT,10);
curl_setopt($ch, CURLOPT_TIMEOUT,60);
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1');
curl_setopt($ch, CURLOPT_VERBOSE,2);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch, CURLOPT_MAXREDIRS,100);
curl_setopt($ch, CURLOPT_AUTOREFERER,1);
$result = curl_exec($ch);
cURL debug output:
* About to connect() to trimet.ru port 80 (#0)
* Trying 2a03:6f00:1::5c35:6090... * connected
* Connected to trimet.ru (2a03:6f00:1::5c35:6090) port 80 (#0)
> GET /contacts HTTP/1.1
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.1) Gecko/20090716 Ubuntu/9.04 (jaunty) Shiretoko/3.5.1
Host: trimet.ru
Accept: */*
< HTTP/1.1 302 Moved Temporarily
< Server: nginx/1.6.3
< Date: Wed, 23 Dec 2015 20:09:32 GMT
< Content-Type: text/html
< Content-Length: 160
< Connection: keep-alive
< Location: https://trimet.ru/contacts
<
* Ignoring the response-body
* Connection #0 to host trimet.ru left intact
* Issue another request to this URL: 'https://trimet.ru/contacts'
* About to connect() to trimet.ru port 443 (#1)
* Trying 2a03:6f00:1::5c35:6090... * connected
* Connected to trimet.ru (2a03:6f00:1::5c35:6090) port 443 (#1)
* successfully set certificate verify locations:
* CAfile: none
CApath: /etc/ssl/certs
* SSL connection using AES128-SHA
* Server certificate:
* subject: C=RU; ST=Saint-Petersburg; L=Saint Petersburg; O=TimeWeb Company Limited; CN=*.timeweb.ru
* start date: 2014-11-28 00:00:00 GMT
* expire date: 2016-01-27 23:59:59 GMT
* issuer: C=US; O=thawte, Inc.; CN=thawte SSL CA - G2
* SSL certificate verify ok.
> GET /contacts HTTP/1.1
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.1) Gecko/20090716 Ubuntu/9.04 (jaunty) Shiretoko/3.5.1
Host: trimet.ru
Accept: */*
Referer: http://trimet.ru/contacts
< HTTP/1.1 301 Moved Permanently
< Server: nginx/1.6.3
< Date: Wed, 23 Dec 2015 20:09:32 GMT
< Content-Type: text/html
< Content-Length: 184
< Connection: keep-alive
< Location: http://trimet.ru/contacts
<
* Ignoring the response-body
* Connection #1 to host trimet.ru left intact
* Issue another request to this URL: 'http://trimet.ru/contacts'
* Re-using existing connection! (#0) with host trimet.ru
* Connected to trimet.ru (2a03:6f00:1::5c35:6090) port 80 (#0)
> GET /contacts HTTP/1.1
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.1) Gecko/20090716 Ubuntu/9.04 (jaunty) Shiretoko/3.5.1
Host: trimet.ru
Accept: */*
Referer: https://trimet.ru/contacts
< HTTP/1.1 302 Moved Temporarily
< Server: nginx/1.6.3
< Date: Wed, 23 Dec 2015 20:09:32 GMT
< Content-Type: text/html
< Content-Length: 160
< Connection: keep-alive
< Location: https://trimet.ru/contacts
<
* Ignoring the response-body
* Connection #0 to host trimet.ru left intact
* Issue another request to this URL: 'https://trimet.ru/contacts'
* Re-using existing connection! (#1) with host trimet.ru
* Connected to trimet.ru (2a03:6f00:1::5c35:6090) port 443 (#1)
> GET /contacts HTTP/1.1
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.1) Gecko/20090716 Ubuntu/9.04 (jaunty) Shiretoko/3.5.1
Host: trimet.ru
Accept: */*
Referer: http://trimet.ru/contacts
< HTTP/1.1 301 Moved Permanently
< Server: nginx/1.6.3
< Date: Wed, 23 Dec 2015 20:09:33 GMT
< Content-Type: text/html
< Content-Length: 184
< Connection: keep-alive
< Location: http://trimet.ru/contacts
<
* Ignoring the response-body
* Connection #1 to host trimet.ru left intact
* Issue another request to this URL: 'http://trimet.ru/contacts'
* Re-using existing connection! (#0) with host trimet.ru
* Connected to trimet.ru (2a03:6f00:1::5c35:6090) port 80 (#0)
> GET /contacts HTTP/1.1
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.1) Gecko/20090716 Ubuntu/9.04 (jaunty) Shiretoko/3.5.1
Host: trimet.ru
Accept: */*
Referer: https://trimet.ru/contacts
< HTTP/1.1 302 Moved Temporarily
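One way to make the loop above easier to inspect, as a sketch only ($ch is assumed to be the handle from the source code above): cap the redirects at a small number and ask curl_getinfo() where the request ended up, instead of waiting for 100 hops.
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);   // fail fast instead of bouncing 100 times
$result = curl_exec($ch);
if ($result === false) {
    echo 'cURL error: ' . curl_error($ch) . PHP_EOL;  // e.g. "Maximum (5) redirects followed"
}
echo 'Redirects followed: ' . curl_getinfo($ch, CURLINFO_REDIRECT_COUNT) . PHP_EOL;
echo 'Final URL: ' . curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) . PHP_EOL;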
I have a really strange issue with a cURL request. Basically, I have some code that I use to wrap a legacy PHP app in Symfony2 (http://symfonybricks.com/it/brick/wrap-legacy-php-code-by-a-symfony2-application). It is basic cURL code that should pull a webpage and display it. However, I have noticed the following:
If I make a query to an existing file, it works fine.
If I make a query to a file that doesn't exist, cURL times out (because I have set a timeout of 8 seconds), and during that time it freezes the whole server; I can't access my website from any other device until it actually times out.
Here's my code:
/**
* #Route("/", defaults={"controller" = "index.php", "controller2" = "", "controller3" = ""})
* #Route("/{controller}", defaults={"controller2" = "index.php", "controller3" = ""})
* #Route("/{controller}/", defaults={"controller2" = "index.php", "controller3" = ""})
* #Route("/{controller}/{controller2}", defaults={"controller3" = ""})
* #Route("/{controller}/{controller2}/", defaults={"controller3" = ""})
* #Route("/{controller}/{controller2}/{controller3}", defaults={"controller3" = "index.php"})
*/
public function getLegacyResourceAction($controller, $controller2, $controller3, Request $request)
{
$path_to_legacy_code = "http://XXX/";
$originalController = $request->getPathInfo();
$originalQueryString = $request->getQueryString();
$url = "{$path_to_legacy_code}{$originalController}?{$originalQueryString}";
//open connection
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
//Timeout
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 8);
curl_setopt($ch, CURLOPT_TIMEOUT, 8);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
$stderr = fopen("{$this->container->getParameter('kernel.root_dir')}/logs/curl.txt", "a");
curl_setopt($ch, CURLOPT_STDERR, $stderr);
//echo "Login stuff in ".$this->container->getParameter('kernel.root_dir')."/logs/curl.txt";
curl_setopt($ch, CURLOPT_COOKIESESSION, 0);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
$result = curl_exec($ch);
if (false === $result) {
echo curl_error($ch);
exit;
}
curl_close($ch);
fclose($stderr);
$Response = new Response($result);
return $Response;
}
Here's the log I'm getting from a valid response:
* About to connect() to www.xxx.com port 80 (#0)
* Trying 178.32.223.113...
* connected
* Connected to www.xxx.com (178.32.223.113) port 80 (#0)
> GET /web/legacy/index.php HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefo$
Host: www.xxx.com
Accept: */*
Cookie: idto=116; PHPSESSID=a159rcjvh0fk6otukqqq9bkrd5
* additional stuff not fine transfer.c:1037: 0 0
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 200 OK
< Date: Sat, 05 Sep 2015 04:26:32 GMT
< Server: Apache/2.2.22 (Debian)
< X-Powered-By: PHP/5.5.28-1~dotdeb+7.1
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Pragma: no-cache
< Vary: Accept-Encoding
< Transfer-Encoding: chunked
< Content-Type: text/html
<
* Connection #0 to host www.xxx.com left intact
* Closing connection #0
And here's the log from an invalid response:
* About to connect() to www.xxx.com port 80 (#0)
* Trying 178.32.223.113...
* connected
* Connected to www.xxx.com (178.32.223.113) port 80 (#0)
> GET /web/legacy/whatever.php HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefo$
Host: www.xxx.com
Accept: */*
Cookie: idto=116; PHPSESSID=a159rcjvh0fk6otukqqq9bkrd5
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* Operation timed out after 8001 milliseconds with 0 bytes received
* Closing connection #0
* About to connect() to www.xxx.com port 80 (#0)
* Trying 178.32.223.113...
* connected
* Connected to www.xxx.com (178.32.223.113) port 80 (#0)
> GET /web/legacy/legacy/whatever.php HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefo$
Host: www.xxx.com
Accept: */*
Cookie: idto=116; PHPSESSID=a159rcjvh0fk6otukqqq9bkrd5
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* additional stuff not fine transfer.c:1037: 0 0
* Operation timed out after 8001 milliseconds with 0 bytes received
* Closing connection #0
* About to connect() to www.xxx.com port 80 (#0)
* Trying 178.32.223.113...
* connected
* Connected to www.xxx.com (178.32.223.113) port 80 (#0)
> GET /web/legacy/legacy/legacy/whatever.php HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefo$
Host: www.xxx.com
Accept: */*
Everything below curl_exec($ch); is not executed until it times out, which gives me no chance to check for a 404 error or anything else. I've been searching for days, no luck so far.
Thanks a lot!
EDIT:
Okay, so I solved the issue by removing this line:
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
Now it gives a proper 404 error. I have no idea why it works without this line. The problem is, without this line I'm breaking some session stuff in my legacy code!
EDIT 2:
I've changed:
curl_setopt($ch, CURLOPT_COOKIESESSION, 0);
to
curl_setopt($ch, CURLOPT_COOKIESESSION, 1);
And now it works fine, with the 404 error and my session intact. I have NO IDEA why.
EDIT 3:
Okay, I think I finally understand what's going on:
Every cURL request writes to the cookies.txt file.
If someone queries a 404 URL, then until the timeout expires that request holds a file lock on cookies.txt, and that's why the whole thing seems frozen.
As far as I understand, the best solution would be to generate a different cookies file for each user, preventing the file lock.
I finally got to the bottom of this; it turns out I had two issues:
No timeout configured, so if I queried a non-existent page, it would loop indefinitely.
I was using the same cookies file for everyone, and if someone was stuck in such a loop as mentioned before, it would lock the cookie file, preventing other users from accessing the site.
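A sketch of the per-user cookie file idea from EDIT 3; the temp-directory path and the use of the Symfony session id are assumptions for illustration, not the code that was actually deployed:
// One cookie jar per user session instead of a single shared cookies.txt.
$cookieFile = sprintf(
    '%s/cookies_%s.txt',
    sys_get_temp_dir(),
    $request->getSession()->getId()
);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 8);   // keep the timeouts so a bad URL cannot hang forever
curl_setopt($ch, CURLOPT_TIMEOUT, 8);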
I'm trying to implement a feature like Facebook's: when you paste a link, it grabs some information (h1, description, images, ...) from the page and displays it.
I have already faced several issues that I managed to fix (gzip, cookies, user agent, ...), but on this one I'm not sure what is blocking my request.
The link in question is http://www.mixcloud.com
Here is my PHP script:
protected function getContent()
{
$ch = curl_init();
$headers = array(
'Accept: */*',
// 'Accept-Encoding: gzip,deflate,sdch',
// 'Accept-Language: en-US,en;q=0.8,es;q=0.6,fr;q=0.4,pt;q=0.2',
// 'Cache-Control: no-cache',
// 'Connection: keep-alive'
);
$debug = TRUE;
// Set the request type
curl_setopt($ch, CURLOPT_VERBOSE, $debug);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
curl_setopt($ch, CURLOPT_NOBODY, FALSE);
curl_setopt($ch, CURLOPT_URL, $this->url);
curl_setopt($ch, CURLOPT_USERAGENT, $this->userAgent);
curl_setopt($ch, CURLOPT_REFERER, $this->referrer);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_HEADER, $debug);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_ENCODING , 'gzip');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5');
curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/cookies.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/cookies.txt');
$data = curl_exec($ch);
var_dump($data);die;
return curl_exec($ch);
}
Here is the verbose response:
* Adding handle: conn: 0x7f937504e400
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7f937504e400) send_pipe: 1, recv_pipe: 0
* About to connect() to www.mixcloud.com port 80 (#0)
* Trying 46.23.65.210...
* Connected to www.mixcloud.com (46.23.65.210) port 80 (#0)
> GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5
Host: www.mixcloud.com
Accept-Encoding: gzip
Referer: https://www.google.com.au
Accept: */*
< HTTP/1.1 403 Forbidden
* Server nginx/1.5.8 is not blacklisted
< Server: nginx/1.5.8
< Date: Tue, 18 Feb 2014 06:39:45 GMT
< Content-Type: text/html
< Transfer-Encoding: chunked
< Connection: keep-alive
< Vary: Accept-Encoding
< Content-Encoding: gzip
<
* Connection #0 to host www.mixcloud.com left intact
string(376) "HTTP/1.1 403 Forbidden\r\nServer: nginx/1.5.8\r\nDate: Tue, 18 Feb 2014 06:39:45 GMT\r\nContent-Type: text/html\r\nTransfer-Encoding: chunked\r\nConnection: keep-alive\r\nVary: Accept-Encoding\r\nContent-Encoding: gzip\r\n\r\n<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body bgcolor="white">\r\n<center><h1>403 Forbidden</h1></center>\r\n<hr><center>nginx/1.5.8</center>\r\n</body>\r\n</html>\r\n"
Now if I execute the curl command in the shell, it works fine:
$ curl -i 'http://www.mixcloud.com' -v
* Adding handle: conn: 0x7fe28b004000
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fe28b004000) send_pipe: 1, recv_pipe: 0
* About to connect() to www.mixcloud.com port 80 (#0)
* Trying 46.23.65.210...
* Connected to www.mixcloud.com (46.23.65.210) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.30.0
> Host: www.mixcloud.com
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Date: Tue, 18 Feb 2014 06:41:30 GMT
Date: Tue, 18 Feb 2014 06:41:30 GMT
< Content-Type: text/html; charset=utf-8
Content-Type: text/html; charset=utf-8
< Content-Length: 194847
Content-Length: 194847
< Connection: keep-alive
Connection: keep-alive
< Vary: Accept-Encoding
Vary: Accept-Encoding
* Server gunicorn/0.17.4 is not blacklisted
< Server: gunicorn/0.17.4
Server: gunicorn/0.17.4
< Vary: Cookie, User-Agent, X-Requested-With, X-Ignore-Block
Vary: Cookie, User-Agent, X-Requested-With, X-Ignore-Block
< x-xss-protection: 1; mode=block
x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
x-content-type-options: nosniff
< Set-Cookie: csrftoken=ciOosbUNp5EL8t5tiQQzkoeaJIDJ3VfO; Domain=.mixcloud.com; expires=Tue, 17-Feb-2015 06:41:30 GMT; Max-Age=31449600; Path=/
Set-Cookie: csrftoken=ciOosbUNp5EL8t5tiQQzkoeaJIDJ3VfO; Domain=.mixcloud.com; expires=Tue, 17-Feb-2015 06:41:30 GMT; Max-Age=31449600; Path=/
< Set-Cookie: eventstream=; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: eventstream=; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
<
<!DOCTYPE html> ...
I know that PHP's cURL and the curl command line are different, but I can't see what I'm missing.
Anyone?
Cheers,
Maxime
OK, I've found what the issue was. It was the user agent.
It's really weird. I was using this user agent:
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5
With this user agent I was getting a 403. I've updated it using the following one:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36
And it's now working well. I can't believe that people are still rejecting requests based on specific user agents...
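For completeness, the only change that mattered was the user agent string, roughly like this; note that the original script sets CURLOPT_USERAGENT twice, and the later call is the one that takes effect.
curl_setopt(
    $ch,
    CURLOPT_USERAGENT,
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36'
);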