I'm trying to get an image's file size using cURL, but the Content-Length header is not returned:
$url ="http://www.collegefashion.net/wp-content/plugins/feed-comments-number/image.php?1263";
$fp = curl_init();
curl_setopt($fp, CURLOPT_NOBODY, true);
curl_setopt($fp, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($fp, CURLOPT_FAILONERROR,1);
curl_setopt($fp, CURLOPT_REFERER,'');
curl_setopt($fp, CURLOPT_URL, $url);
curl_setopt($fp, CURLOPT_HEADER,1);
curl_setopt($fp, CURLOPT_USERAGENT,'Mozilla/5.0');
$body = curl_exec($fp);
var_dump($body):
HTTP/1.1 200 OK
Date: Sun, 02 May 2010 02:50:20 GMT
Server: Apache/2.0.63 (CentOS)
X-Powered-By: W3 Total Cache/0.8.5.2
X-Pingback: http://www.collegefashion.net/xmlrpc.php
Cache-Control: no-cache, must-revalidate
Expires: Sat, 26 Jul 1997 05:00:00 GMT
Content-Type: image/png
It works from the command line over SSH, though:
curl -i http://www.collegefashion.net/wp-content/plugins/feed-comments-number/image.php?1263
HTTP/1.1 200 OK
Date: Sun, 02 May 2010 03:38:43 GMT
Server: Apache/2.0.63 (CentOS)
X-Powered-By: W3 Total Cache/0.8.5.2
X-Pingback: http://www.collegefashion.net/xmlrpc.php
Cache-Control: no-cache, must-revalidate
Expires: Sat, 26 Jul 1997 05:00:00 GMT
Content-Length: 347
Content-Type: image/png
CURLOPT_NOBODY makes a HEAD request, while your command line with -i makes a GET request.
If you used -I (capital i) on the command line instead, the two would be more comparable.
Check curl_getinfo():
$size = curl_getinfo($fp, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
Execute it after curl_exec().
One other option is to set CURLOPT_HEADER to false and just do strlen($body) -- ignore this, I didn't notice you were using CURLOPT_NOBODY.
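Putting the two answers together, here is a minimal sketch of one way to get the size: ask with a HEAD request first, and if the server does not report a Content-Length for HEAD (as in the question), fall back to a GET and measure the body. The function names remoteFileSize() and knownLength() are made up for this example, and the fallback strategy is an assumption, not something the original posters used.

```php
<?php
// Hypothetical helper: cURL reports -1 when the server sent no Content-Length.
function knownLength(float $reported): ?int
{
    return $reported >= 0 ? (int) $reported : null;
}

// Hypothetical helper: try HEAD first, then fall back to GET and count bytes.
function remoteFileSize(string $url): ?int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');

    $size = null;
    if (curl_exec($ch) !== false) {
        // Must be read after curl_exec(), as noted in the answer above.
        $size = knownLength((float) curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD));
    }

    if ($size === null) {
        // No length from HEAD: switch back to GET and measure the body itself.
        curl_setopt($ch, CURLOPT_NOBODY, false);
        curl_setopt($ch, CURLOPT_HTTPGET, true);
        $body = curl_exec($ch);
        if ($body !== false) {
            $size = strlen($body);
        }
    }

    curl_close($ch);
    return $size;
}
```

The GET fallback downloads the whole file, so it is only reasonable for small resources like the image in the question.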
Code:
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, "https://detail.1688.com/offer/543783250479.html?sk=consign");
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);
var_dump($data);
Result:
string(339) "HTTP/1.1 403 Forbidden
Server: nginx/1.11.1
Date: Tue, 16 May 2017 03:46:32 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 169
Connection: keep-alive
<html>
<head><title>403 Forbidden</title></head>
<body bgcolor="white">
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx/1.11.1</center>
</body>
</html>
"
When I run curl -I https://detail.1688.com/offer/543783250479.html?sk=consign in my shell, it returns 200:
HTTP/1.1 200 OK
Date: Tue, 16 May 2017 03:46:51 GMT
Content-Type: text/html;charset=GBK
Connection: keep-alive
Vary: Accept-Encoding
Expires: Thu, 01-Jan-1970 00:00:00 GMT
Cache-Control: max-age=0,s-maxage=0
b2c_auction: 543783250479
atp_isdpp: 99vb2b-2295161471
page_cache_info: {"is_new":true,"create_time":"2017-05-16T11:46:51","expire_time":3600000}
X-Cache: MISS TCP_REFRESH_MISS dirn:-2:-2
Via: aserver010103196008.et2[69,200-0,M]
url-hash: id=543783250479&detail.1688.com
Server: Tengine/Aserver
Strict-Transport-Security: max-age=31536000
Timing-Allow-Origin: *
EagleEye-TraceId: 0b83e0c214949064118297808e926
Could anyone please give me some hints about why I get 403 by cURL in PHP?
Environment:
Copyright (c) 1997-2016 The PHP Group
Zend Engine v3.0.0, Copyright (c) 1998-2016 Zend Technologies
with Zend OPcache v7.0.8-2+deb.sury.org~xenial+1, Copyright (c) 1999-2016, by Zend Technologies
with blackfire v1.10.6, https://blackfire.io, by Blackfireio Inc.
The headers returned without a user agent:
HTTP/1.1 302 Moved Temporarily
Date: Tue, 16 May 2017 04:15:46 GMT
Content-Type: text/html
Content-Length: 266
Connection: keep-alive
Location: http://127.0.0.1/?sk=consign
X-Cache: MISS TCP_MISS dirn:-2:-2
Via: aserver010103196008.et2[0,302-0,M]
url-hash: id=543783250479&detail.1688.com
Server: Tengine/Aserver
Strict-Transport-Security: max-age=31536000
Timing-Allow-Origin: *
EagleEye-TraceId: 0b83dc9c14949081466171756eb58d
The important part is:
Location: http://127.0.0.1/?sk=consign
If I set a user agent, I get:
HTTP/1.1 200 OK
Date: Tue, 16 May 2017 04:17:30 GMT
Content-Type: text/html;charset=GBK
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Expires: Thu, 01-Jan-1970 00:00:00 GMT
Cache-Control: max-age=0,s-maxage=0
b2c_auction: 543783250479
atp_isdpp: 99vb2b-2295161471
page_cache_info: {"is_new":true,"create_time":"2017-05-16T12:17:30","expire_time":3600000}
X-Cache: MISS TCP_MISS dirn:-2:-2
Via: aserver011128044194.eu13[106,200-0,M]
url-hash: id=543783250479&detail.1688.com
Server: Tengine/Aserver
Strict-Transport-Security: max-age=31536000
Timing-Allow-Origin: *
EagleEye-TraceId: 0b802cd414949082503341644e23a0
That is the correct response, and it returns the desired HTML.
Code used:
$url = "http://detail.1688.com/offer/543783250479.html?sk=consign";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:53.0) Gecko/20100101 Firefox/53.0");
$html = curl_exec($ch);
curl_close($ch);
print $html;
In curl, -I sends a HEAD request. Try sending a HEAD request from PHP as well:
curl_setopt($ch, CURLOPT_NOBODY, true);
Then give it a try.
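When comparing runs with and without a user agent, it may help to reduce the header dump to just the status code and Location of each response, so a redirect such as the one to http://127.0.0.1/ stands out. This is only a sketch; redirectChain() is a made-up name, and it assumes the input is header output from CURLOPT_HEADER (ideally with CURLOPT_NOBODY), where header blocks are separated by blank lines.

```php
<?php
// Hypothetical helper: list status code and Location for each response
// found in a raw header dump (one entry per redirect hop).
function redirectChain(string $rawHeaders): array
{
    $chain = [];
    // Each response's headers end with a blank line.
    foreach (preg_split('/\r?\n\r?\n/', trim($rawHeaders)) as $block) {
        if (!preg_match('#^HTTP/\S+\s+(\d{3})#', $block, $status)) {
            continue; // not a header block (for example, part of a body)
        }
        $location = preg_match('/^Location:\s*(.+?)\s*$/mi', $block, $loc)
            ? $loc[1]
            : null;
        $chain[] = ['status' => (int) $status[1], 'location' => $location];
    }
    return $chain;
}
```

Calling it on the output of both requests and printing the result with print_r() makes the difference between the two runs easy to see.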
I'm working with an API, using cURL I have received a set of data.
The data appears to be half HTTP request and half JSON. I'm not sure why it's mixed but essentially I get this response when I do a var_dump:
string(873) "HTTP/1.1 200 OK cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0 content-length: 153 content-type: application/json;charset=utf-8 date: Mon, 10 Nov 2014 10:58:49 UTC expires: Tue, 31 Mar 1981 05:00:00 GMT last-modified: Mon, 10 Nov 2014 10:58:49 GMT ml: A pragma: no-cache server: tsa_b set-cookie: guest_id=v1%3A141561712923128379; Domain=.twitter.com; Path=/; Expires=Wed, 09-Nov-2016 10:58:49 UTC status: 200 OK strict-transport-security: max-age=631138519 x-connection-hash: 57175e4dba3d726bebb399072c225958 x-content-type-options: nosniff x-frame-options: SAMEORIGIN x-transaction: 2e4b8e053e615c75 x-ua-compatible: IE=edge,chrome=1 x-xss-protection: 1; mode=block {"token_type":"bearer","access_token":"AAAAAAAAAAAAAAAAAAAAAMVfbQAAAAAAK7qYRQOgdZ771TrJ6pZ7nugCwVQ%3DLKcongtwy3lcBDbPSEreC9DfhJk3Gm7qyQInqhFAxYvo1clv4S"}"
That's the full data back. It's got HTTP info at the beginning and then part JSON at the end.
The only bit I need from this is the access_token data.
If it was just JSON then I could use json_decode to get the access_token out but because it's got all the HTTP info at the beginning json_decode cannot understand it and gives the result NULL.
How can I remove the HTTP part so I can just grab the access_token data?
ETA: my request is made through cURL, so the var I'm dumping out is $response
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$auth_url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "grant_type=client_credentials");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$header = curl_setopt($ch, CURLOPT_HEADER, 1);
$result = curl_exec($ch);
curl_close($ch);
The result I receive roughly matches the expected result given in the Twitter documentation so I don't think the data is corrupt/incorrect: https://dev.twitter.com/oauth/reference/post/oauth2/token
Switch off header output: remove
$header = curl_setopt($ch, CURLOPT_HEADER, 1);
or replace with
curl_setopt($ch, CURLOPT_HEADER, false);
$a='HTTP/1.1 200 OK cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0 content-length: 153 content-type: application/json;charset=utf-8 date: Mon, 10 Nov 2014 10:58:49 UTC expires: Tue, 31 Mar 1981 05:00:00 GMT last-modified: Mon, 10 Nov 2014 10:58:49 GMT ml: A pragma: no-cache server: tsa_b set-cookie: guest_id=v1%3A141561712923128379; Domain=.twitter.com; Path=/; Expires=Wed, 09-Nov-2016 10:58:49 UTC status: 200 OK strict-transport-security: max-age=631138519 x-connection-hash: 57175e4dba3d726bebb399072c225958 x-content-type-options: nosniff x-frame-options: SAMEORIGIN x-transaction: 2e4b8e053e615c75 x-ua-compatible: IE=edge,chrome=1 x-xss-protection: 1; mode=block {"token_type":"bearer","access_token":"AAAAAAAAAAAAAAAAAAAAAMVfbQAAAAAAK7qYRQOgdZ771TrJ6pZ7nugCwVQ%3DLKcongtwy3lcBDbPSEreC9DfhJk3Gm7qyQInqhFAxYvo1clv4S"}"';
preg_match("/\{.*\}/",$a,$m);
$ja=json_decode($m[0]);
var_dump($ja,$m);
output:
object(stdClass)[1]
public 'token_type' => string 'bearer' (length=6)
public 'access_token' => string 'AAAAAAAAAAAAAAAAAAAAAMVfbQAAAAAAK7qYRQOgdZ771TrJ6pZ7nugCwVQ%3DLKcongtwy3lcBDbPSEreC9DfhJk3Gm7qyQInqhFAxYvo1clv4S' (length=112)
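If the headers are needed as well, another option is to keep CURLOPT_HEADER on and cut the response at the header size that cURL reports, instead of matching braces with a regex (a header value containing { or } would confuse the pattern). A rough sketch follows; the function names are invented for this example, and $headerSize is assumed to come from curl_getinfo($ch, CURLINFO_HEADER_SIZE), read before curl_close().

```php
<?php
// Hypothetical helper: split a raw response into [headers, body]
// using the header size reported by cURL.
function splitHeadersAndBody(string $raw, int $headerSize): array
{
    return [substr($raw, 0, $headerSize), substr($raw, $headerSize)];
}

// Hypothetical helper: decode the JSON body and return access_token, or null.
function accessTokenFromResponse(string $raw, int $headerSize): ?string
{
    [, $body] = splitHeadersAndBody($raw, $headerSize);
    $json = json_decode($body, true);
    return is_array($json) && isset($json['access_token'])
        ? $json['access_token']
        : null;
}

// Intended usage (not executed here):
//   $raw        = curl_exec($ch);
//   $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
//   $token      = accessTokenFromResponse($raw, $headerSize);
```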
I use this script on two different servers:
function curlGetFileInfo($url, $cookies = "default") {
    global $_config;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request: headers only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow the 302 to the file server
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'serverpath/cookies/'.$cookies.'.txt');
    $data = curl_exec($ch);
    curl_close($ch);
    if ($data === false) {
        return 0;
    }
    //echo $data;
    $info = array();
    // get_between() is my own helper (not shown here)
    $info['filename'] = get_between($data, 'filename="', '"');
    // pathinfo() avoids the "only variables should be passed by reference" notice from end(explode(...))
    $info['extension'] = pathinfo($info['filename'], PATHINFO_EXTENSION);
    if (preg_match('/Content-Length: (\d+)/', $data, $matches)) {
        $info['filesize'] = (int)$matches[1];
    }
    return $info;
}
These servers have the same PHP version with the same PHP-Curl version. These are the two different headers of the curl result:
Working one:
HTTP/1.1 302 Found
Date: Tue, 12 Jun 2012 07:04:35 GMT
Server: Apache/2.2.16 (Debian)
X-Powered-By: PHP/5.3.3-7+squeeze13
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: http://stor1076.uploaded.to/dl/b3411ded-0f45-4efc-b705-8c8ac89b5e41
Vary: Accept-Encoding
Connection: close
Content-Type: text/html
HTTP/1.1 200 OK
Server: nginx/1.0.5
Date: Tue, 12 Jun 2012 07:04:35 GMT
Content-Type: video/x-msvideo
Content-Length: 733919232
Last-Modified: Tue, 29 May 2012 15:10:07 GMT
Connection: keep-alive
Content-Disposition: attachment; filename="Saw.[Spanish.DVDRip].[XviD-Mp3].by.SDG.avi"
Accept-Ranges: bytes
Non working one:
HTTP/1.1 302 Found
Date: Tue, 12 Jun 2012 07:05:26 GMT
Server: Apache/2.2.16 (Debian)
X-Powered-By: PHP/5.3.3-7+squeeze13
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: http://stor1164.uploaded.to/dl/22c3d242-365d-4e1e-b903-f1e2b81812c2
Vary: Accept-Encoding
Connection: close
Content-Type: text/html
Cookies are set OK (with login), and other simple Curl functions are working fine.
Also, I called curl_getinfo($ch, CURLINFO_HTTP_CODE), and it gave me these results:
Working one:
200
Non working one:
302
Any idea?
On the working one you seem to be running Apache as well as nginx. You can see there are two HTTP responses:
HTTP/1.1 302 Found
Date: Tue, 12 Jun 2012 07:04:35 GMT
Server: Apache/2.2.16 (Debian)
HTTP/1.1 200 OK
Server: nginx/1.0.5
So, your setup differs. I don't know how exactly they are running together, but this gives some insight and may help you solve it: http://kbeezie.com/view/apache-with-nginx/
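OK, it was an open_basedir problem. In older PHP versions, cURL refuses to enable CURLOPT_FOLLOWLOCATION while open_basedir is set, so on that server the 302 was not followed. Thanks guys.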
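For what it's worth, the parsing step can be made to report which response it actually looked at, so an unfollowed redirect shows up as status 302 with no size rather than as silently missing fields. The sketch below reads only the last header block; fileInfoFromHeaders() is a name invented for this example, and it assumes the input is the header output of a request like the one in the question.

```php
<?php
// Hypothetical helper: read status, filename, extension and size
// from the LAST header block of a raw header dump.
function fileInfoFromHeaders(string $rawHeaders): array
{
    // With CURLOPT_FOLLOWLOCATION there is one header block per hop.
    $blocks = preg_split('/\r?\n\r?\n/', trim($rawHeaders));
    $last   = end($blocks);

    $info = ['status' => null, 'filename' => null, 'extension' => null, 'filesize' => null];

    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $last, $m)) {
        $info['status'] = (int) $m[1];
    }
    if (preg_match('/^Content-Disposition:.*filename="([^"]*)"/mi', $last, $m)) {
        $info['filename']  = $m[1];
        $info['extension'] = pathinfo($m[1], PATHINFO_EXTENSION);
    }
    if (preg_match('/^Content-Length:\s*(\d+)/mi', $last, $m)) {
        $info['filesize'] = (int) $m[1];
    }
    return $info;
}
```

If the returned status is still 302, the redirect was not followed and the filename and size cannot be expected to be there.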
I am using this to grab an XML feed and the HTTP headers:
// Initiate the curl session
$ch = curl_init();
// Set the URL
curl_setopt($ch, CURLOPT_URL, $url);
// Allow the headers
curl_setopt($ch, CURLOPT_HEADER, true);
// Return the output instead of displaying it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Execute the curl session
$output = curl_exec($ch);
// Close the curl session
curl_close($ch);
// Cache feed
file_put_contents($filename, $output);
// XML to object
$output = simplexml_load_string($output);
// Return the output as an array
return $output;
And it returns these headers
HTTP/1.1 200 OK
Cache-Control: public, max-age=30
Content-Length: 5678
Content-Type: text/xml
Expires: Tue, 22 Nov 2011 15:12:16 GMT
Last-Modified: Tue, 22 Nov 2011 15:11:46 GMT
Vary: *
Server: Microsoft-IIS/7.0
Set-Cookie: ASP.NET_SessionId=1pfijrmsqndn5ws3124csmhe; path=/; HttpOnly
Data-Last-Updated: 11/22/2011 15:11:45
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Tue, 22 Nov 2011 15:11:46 GMT
I only want it to return one part of the HTTP header, which is
Expires: Tue, 22 Nov 2011 15:12:16 GMT
and save it to a variable. How do I do this? I have been looking through the PHP manual but can't work it out.
preg_match('~^(Expires:.+?)$~ism', $headers, $result);
return $result[1];
You can use PHP's get_headers($url, 1) function, which returns all the headers in an array. If you set the optional second parameter to 1 (non-zero), it parses them into an associative array like:
array(
[...] => ...,
[Expires] => 'what you need',
[...] => ...
)
http://php.net/manual/en/function.get-headers.php
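If you would rather reuse the response cURL already fetched than make a second request with get_headers(), a small lookup function over the raw header text is one way to do it. This is a sketch only: headerValue() is a made-up name, it returns the first matching header, and it assumes the header text is the part of $output before the blank line.

```php
<?php
// Hypothetical helper: return the value of the first header with the given
// name (case-insensitive), or null if the header is not present.
function headerValue(string $rawHeaders, string $name): ?string
{
    $pattern = '/^' . preg_quote($name, '/') . ':[ \t]*(.*?)[ \t]*\r?$/mi';
    return preg_match($pattern, $rawHeaders, $m) ? $m[1] : null;
}

// Intended usage (not executed here):
//   $expires = headerValue($output, 'Expires');
```

Note that simplexml_load_string() will not parse a string that still contains the HTTP headers, so the headers need to be cut off before the XML is parsed or cached.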
When I post XML content from one server to another, it is not added.
I'm using cURL to post the XML files to the other server, but I am getting the following response:
HTTP/1.1 200 OK
Date: Thu, 21 Jul 2011 08:13:02 GMT
Server: Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.6 with Suhosin-Patch mod_ssl/2.2.8 OpenSSL/0.9.8g
X-Powered-By: PHP/5.2.4-2ubuntu5.6
Set-Cookie: PHPSESSID=6846cb7e65f6f6d6d87f163a681f0543; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Length: 5721
Content-Type: text/html; charset=UTF-8
This is my code
$file_path= WWW_ROOT.$xmlfilename;
$xmldata = file_get_contents($file_path);
$request = 'http://www.sample.com/someaction';
$postargs = 'xml='.urlencode($xmldata).'&filename='.urlencode($xmlfilename);
// Get the curl session object
$session = curl_init($request);
// Set the POST options.
curl_setopt($session, CURLOPT_POST, true);
curl_setopt($session, CURLOPT_POSTFIELDS, $postargs);
curl_setopt($session, CURLOPT_HEADER, true);
curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
// Do the POST and then close the session
$response = curl_exec($session);
print_r( $response);
Note: allow_url_fopen and curl are enabled in both servers.
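Try passing the fields as an array. cURL then sends them as multipart/form-data and handles the encoding itself, so the values should not be passed through urlencode():
$postargs = array('xml' => $xmldata, 'filename' => $xmlfilename);
Both items should then appear in $_POST['xml'] and $_POST['filename'] on the receiving side (or the equivalent if it is not PHP).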
EDIT
OK, you may need to look at streaming the XML file using CURLOPT_READFUNCTION.
See this for an example: http://zingaburga.com/2011/02/streaming-post-data-through-php-curl-using-curlopt_readfunction/
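Since a 200 response on its own does not tell you whether the receiving script stored anything, it can help to build the POST body in one place and look at the status code, cURL error and response body together. A minimal sketch, assuming the receiver reads $_POST['xml'] and $_POST['filename'] as in the question; buildXmlPostBody() and postXml() are names invented here, and http_build_query() is used in place of the manual urlencode() calls.

```php
<?php
// Hypothetical helper: build an application/x-www-form-urlencoded body.
// http_build_query() URL-encodes each value, so no manual urlencode() is needed.
function buildXmlPostBody(string $xml, string $filename): string
{
    return http_build_query(['xml' => $xml, 'filename' => $filename], '', '&');
}

// Hypothetical helper: POST the XML and return everything needed for debugging.
function postXml(string $endpoint, string $xml, string $filename): array
{
    $ch = curl_init($endpoint);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, buildXmlPostBody($xml, $filename));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $body   = curl_exec($ch);
    $result = [
        'ok'     => $body !== false,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'error'  => curl_error($ch),
        'body'   => $body,   // what the receiving script actually printed
    ];
    curl_close($ch);
    return $result;
}
```

The 5721-byte HTML body in the response above is probably worth reading: it may contain an error message or a login page from the receiving application.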