json api output via bash curl vs output via php curl - php

environment:
Ubuntu 18.04.5 LTS
PHP 7.2.24-0ubuntu0.18.04.7
curl 7.58.0
General behavior:
I am using PHP cURL to connect to a JSON REST API. Among other things, the API spec describes a login procedure.
On success, the API returns HTTP status 200 and a success JSON body of something like
{
"token":"ey...RU",
"expiration":"2021-07-07T00:39:45"
}
On failure (bogus password, etc.) it returns HTTP status 401 and a JSON body of something like
{
"type": "https://tools.ietf.org/html/rfc7235#section-3.1",
"title": "Unauthorized",
"status": 401,
"traceId": "00-009af152168b684da1a6a543d11a3357-63fd91f4be1fc443-00"
}
I basically cloned the solution here How to curl via php, but without changes it returned 411 and no body. I experimented with different Accept headers, without success. I also experimented with a Content-Length header, to no avail. I finally settled on the headers used in the working bash curl method.
The gist:
A JSON body is expected in both the success case and the failure case. In practice this holds on the command line, but with PHP cURL it only holds when the credentials are correct.
Several other API messages return various HTTP failure codes; with bash curl the JSON body is returned, but with PHP cURL it is not.
Actual results:
Using either method with correct credentials, the success JSON (token and expiration, shown above) is received.
If the credentials are not correct, bash curl returns the failure JSON:
> curl -X POST "https://<api endpoint url>/api/Authenticate/login" \
-H "accept: text/plain" -H "Content-Type: application/json-patch+json" \
-d "{\"username\":\"valid_username\",\"password\":\"bogus_password\"}"
{"type":"https://tools.ietf.org/html/rfc7235#section-3.1",
"title":"Unauthorized",
"status":401,
"traceId":"00-a4edb6cfca5918499f1bfaa073206118-da71a7d85e582448-00"}
Attempting to log in with bogus credentials using PHP cURL does not:
$curl = curl_init();
curl_setopt($curl, CURLOPT_CUSTOMREQUEST, 'POST');
curl_setopt($curl, CURLOPT_URL, '<the real json endpoint uri>');
curl_setopt($curl, CURLOPT_FAILONERROR, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_VERBOSE, true);
# curl_setopt($curl, CURLOPT_STDERR, $out);
$hdrs = ["accept: text/plain",
"Content-Type: application/json-patch+json"
];
curl_setopt($curl, CURLOPT_HTTPHEADER, $hdrs);
$params = ['username' => 'valid_username',
'password' => 'bogus password'
];
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, json_encode($params));
// json_encode sanitization omitted here for clarity
$output = curl_exec($curl);
$responseCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
if("200" != $responseCode)
$responseCode .= " - " . curl_error($curl);
$info = curl_getinfo($curl);
curl_close($curl);
print("==========\n\n");
printf("output: %s \n", var_export($output, true));
printf("responseCode: %s \n", $responseCode);
printf("info: %s \n", print_r($info, true));
print("==========\n\n");
The output:
NOTE: no failure JSON body.
* Trying 2xx.xxx.xx.xx...
* TCP_NODELAY set
* Connected to <the real json endpoint uri> (2xx.xxx.xx.xx) port 443 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: CN=*.<the real json endpoint uri>
* start date: Jun 22 15:31:01 2021 GMT
* expire date: Sep 20 15:31:00 2021 GMT
* subjectAltName: host "<the real json endpoint uri>" matched cert's "<the real json endpoint uri>"
* issuer: C=US; O=Let's Encrypt; CN=R3
* SSL certificate verify ok.
> POST /api/Authenticate/login HTTP/1.1
Host: <the real json endpoint uri>
accept: text/plain
Content-Type: application/json-patch+json
Content-Length: 73
* upload completely sent off: 73 out of 73 bytes
* The requested URL returned error: 401 Unauthorized
* stopped the pause stream!
* Closing connection 0
==========
output: false
responseCode: 401 - The requested URL returned error: 401 Unauthorized
info: Array
(
[url] => https://<the real json endpoint uri>
[content_type] =>
[http_code] => 401
[header_size] => 0
[request_size] => 237
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.625
[namelookup_time] => 0.060492
[connect_time] => 0.156226
[pretransfer_time] => 0.428025
[size_upload] => 73
[size_download] => 0
[speed_download] => 0
[speed_upload] => 116
[download_content_length] => -1
[upload_content_length] => 73
[starttransfer_time] => 0.624949
[redirect_time] => 0
[redirect_url] =>
[primary_ip] => 2xx.xxx.xx.xx
[certinfo] => Array
(
)
[primary_port] => 443
[local_ip] => 10.10.10.105
[local_port] => 42192
)
==========
The question:
How can PHP cURL get and display the failure JSON response body in the 401 case?

without changes it returned 411 and no body.
You have set CURLOPT_FAILONERROR to true, which means cURL will not return the response body on responses with status code >= 400.
Removing this should get you the response body in either case, whether the status code indicates success or some kind of failure.
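A minimal sketch of the adjusted call, assuming the same placeholder endpoint, headers, and credentials as in the question; the only functional change is that CURLOPT_FAILONERROR is left out, so the 401 body still comes back through curl_exec():
<?php
// Sketch only: same placeholders as above, CURLOPT_FAILONERROR deliberately omitted.
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, '<the real json endpoint uri>');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_HTTPHEADER, [
    'accept: text/plain',
    'Content-Type: application/json-patch+json',
]);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, json_encode([
    'username' => 'valid_username',
    'password' => 'bogus_password',
]));
$output       = curl_exec($curl);                   // JSON string on success *and* on 401
$responseCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);
$body = json_decode($output, true);                 // token/expiration, or the problem details
printf("HTTP %d\n%s\n", $responseCode, var_export($body, true));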

Related

Curl - can't get the content of a page and/or post data on that page

I'm trying to get a webpage using cURL but I get only a blank page, no output. Here is how I'm trying to do it:
curl_setopt($ch, CURLOPT_URL, 'https://example.com/b2b/');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 2);
curl_setopt($ch, CURLOPT_USERAGENT,"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 " );
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
After some research I tried to add this line:
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Encoding: gzip'));
And also this is how I'm trying to echo it out after adding that last line:
$response = curl_exec($ch);
$content = @gzdecode($response);
echo ($content !== false) ? $content : $response;
Am I doing something wrong? I mean, this works if I change the URL to another website's URL :(.
P.S. This is what I get if I print_r curl_getinfo():
Array
(
[url] => https://example.com/b2b/
[content_type] =>
[http_code] => 0
[header_size] => 0
[request_size] => 0
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0
[namelookup_time] => 0
[connect_time] => 0
[pretransfer_time] => 0
[size_upload] => 0
[size_download] => 0
[speed_download] => 0
[speed_upload] => 0
[download_content_length] => -1
[upload_content_length] => -1
[starttransfer_time] => 0
[redirect_time] => 0
[redirect_url] =>
[primary_ip] =>
[certinfo] => Array
(
)
[primary_port] => 0
[local_ip] =>
[local_port] => 0
)
Thank you!
This is a more technical than practical answer but I'll explain what is happening here and why the requested webpage cannot be fetched by cURL.
Please note that this seems to be an edge case. It might work on your system while it does not work on other systems. See Symantec PKI Distrust for more information.
What is happening?
To see what is happening when making the cURL call, one should enable CURLOPT_VERBOSE logging:
* Hostname [REDACTED] was found in DNS cache
* Trying [REDACTED]...
* TCP_NODELAY set
* Connected to [REDACTED] ([REDACTED]) port 443 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* SSL certificate problem: unable to get local issuer certificate
* stopped the pause stream!
* Closing connection 0
From this we can conclude that the certificate used to issue the TLS certificate of the website cannot be found in the CA truststore of cURL (located in /etc/ssl/certs/ca-certificates.crt on this system).
Now one might wonder why this is the case. Well, that is because of the distrust of the CA certificate that issued the certificate for this website. The website uses a RapidSSL TLS certificate issued before the 1st of December 2017. This means it falls within the distrusted range of old RapidSSL certificates.
How can this be solved?
Well you cannot really do anything. It is up to the owner of the website to update their TLS certificates. They should really be doing this because Chrome will start throwing nasty errors real soon. (Errors should already be appearing in the M70 beta versions. After the 16th of October all releases [>M70] will throw big nasty errors.)
Except that you can bypass the SSL/TLS certificate checks in cURL.
I DO NOT RECOMMEND THIS, YOU SHOULD NEVER DISABLE THE CERTIFICATE CHECKS!
You can use
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
to disable the checks and after that cURL will return the webpage:
<?php
$url = "https://[REDACTED]";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0); // one should never do this
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0); // or this!!!
$output = curl_exec($ch);
curl_close($ch);
echo $output;
// all kinds of HTML and other things
?>
Conclusion
The requested website uses a certificate that will be distrusted/is already distrusted and therefore cURL cannot complete the TLS handshake to establish a secure connection to the website. This is all due to the Distrust of the Symantec PKI.
Please note (again) that one should never disable the security checks.
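If you can obtain a CA bundle that still contains the required root certificate, a less drastic option is to point cURL at that bundle with CURLOPT_CAINFO so verification stays on; the bundle path below is purely illustrative:
<?php
// Sketch only: keep certificate verification on, but use a custom CA bundle.
// The path is hypothetical; the bundle must contain the root that issued the site's chain.
$ch = curl_init('https://example.com/b2b/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CAINFO, '/path/to/custom-ca-bundle.pem');
$output = curl_exec($ch);
if ($output === false) {
    echo 'cURL error: ' . curl_error($ch);
}
curl_close($ch);
?>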

Curl HTTP Post Login and Redirect

I am trying to create a page to log in to a local router automatically. I am currently using cURL to log in to the page and authenticate. This part of the code appears to be working correctly. The issue I am having is that once cURL has authenticated, I need to redirect the user to this page so that they can navigate; however, I will also need to use the cookies collected by cURL.
Here is my code as it stands at the moment
$data = array(
'username' => 'admin',
'password' => 'admin',
);
$ch = @curl_init();
curl_setopt($ch, CURLOPT_URL,'http://192.168.69.1:65080/login.cgi');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Expect:'));
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS,$data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'public_html/cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, 'public_html/cookie.txt');
$result = curl_exec ($ch);
$info = curl_getinfo($ch);
curl_close ($ch);
print_r($result);
print_r($info);
//Working until this point
preg_match('/^Set-Cookie:\s*([^;]*)/mi', $result, $m);
parse_str($m[1], $cookies);
foreach($cookies as $key=>$cookie)
{
setcookie($key, $cookie, time() + 60*60*24*30, '/');
}
header("location:".$info['redirect_url']);
As you can see, I found a snippet to loop through the $result info and set the cookies before redirecting; however, this is not working correctly and I am redirected to the login page, not the index page.
If I make a further call before I close cURL, using the redirect URL as the URL, I do get a partial print of the index page, but the important images etc. are not displayed. I need to be able to access the page and navigate rather than simply printing the page.
Here is a print of $result
HTTP/1.1 302 Found
Location: /index.cgi
Set-cookie: show_security_warning=deleted; expires=Sunday, 09-Jun-13 10:54:00 GMT
Set-cookie: ui_language=en_US; expires=Tuesday, 19-Jan-38 03:14:07 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Date: Mon, 09 Jun 2014 10:54:01 GMT
Server: lighttpd/1.4.31
Here is a print of $info
Array
(
[url] => http://192.168.69.1:65080/login.cgi
[content_type] => text/html
[http_code] => 302
[header_size] => 314
[request_size] => 251
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.484
[namelookup_time] => 0
[connect_time] => 0
[pretransfer_time] => 0
[size_upload] => 255
[size_download] => 0
[speed_download] => 0
[speed_upload] => 526
[download_content_length] => -1
[upload_content_length] => 255
[starttransfer_time] => 0
[redirect_time] => 0
[certinfo] => Array
(
)
[redirect_url] => http://192.168.69.1:65080/index.cgi
)
Here is my cookie.txt
# Netscape HTTP Cookie File
# http://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.
192.168.69.1 FALSE / FALSE 0 AIROS_SESSIONID d19e097a07b7b76fd7d90267a8e1f4d2
192.168.69.1 FALSE / FALSE 1370775278 show_security_warning deleted
192.168.69.1 FALSE / FALSE 2147483647 ui_language en_US
Finally here is a print of $cookies
Array
(
[show_security_warning] => deleted
)
If anyone can point me in the right direction of how to achieve the next step, I would be most grateful.
I'm not sure your strategy will ever be successful.
cURL is acting as a web client, which means cURL and your web browser are probably seen as distinct hosts by the router.
[CLIENT (WEB BROWSER)] ---HTTP---> [PHP WEBSERVER]
[CURL] ---HTTP---> [ROUTER (WEBSERVER)]
PHP has a particular behaviour: it stores sessions in files whose names depend only on the session ID cookie value, so it is (or was ... I don't know all versions of PHP) possible to steal a session by capturing the session cookie / cloning the cookie values.
Not all CGI libs do the same. I believe your router has a safer session storage method, as should be expected from a security-dedicated device (for example,
a key based on the client IP and the session cookie value).
In that case your method is useless.
You'd be better off using a JavaScript-based form (to post the ID/password) and maybe an iframe requesting the router login page beforehand (to initialize the router's cookie values). Note that a JavaScript form will expose the credentials to your user, which is probably not what you want.
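A rough illustration of that idea, using a plain HTML form emitted from PHP rather than JavaScript; the login URL and field names are taken from the question, and the router may of course impose extra checks (CSRF tokens, referer checks) that this sketch ignores:
<?php
// Sketch only: let the *browser* post to the router directly, so the router's
// session cookie ends up in the browser rather than on the PHP server.
// URL and field names come from the question; everything else is illustrative.
?>
<!-- hidden iframe so the router can initialize its cookies first -->
<iframe src="http://192.168.69.1:65080/login.cgi" style="display:none"></iframe>
<form method="post" action="http://192.168.69.1:65080/login.cgi">
    <input type="hidden" name="username" value="admin">
    <input type="hidden" name="password" value="admin">
    <button type="submit">Log in to the router</button>
</form>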

CURL cannot run URL and return 302

I'm trying to run a URL (which has signout functionality) through cURL, but it is returning a 302 HTTP code. When I run the same URL through POSTMAN (a Google Chrome addon) or POSTER (a Firefox addon), it returns the proper result ( {"status" : "success"} ). Any help would be greatly appreciated.
URL (JAVA APPLICATION URL) : http://website.mywebsite.com:8083/VideoBook/signout.action
MY CODE :
// Open log file
$logfh = fopen("GeoserverPHP.log", 'w') or die("can't open log file");
// Initiate cURL session
$service = "http://website.mywebsite.com:8083/VideoBook/";
$request = "signout.action";
$url = $service . $request;
$ch = curl_init($url);
// Optional settings for debugging
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_STDERR, $logfh);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_REFERER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, true);
//Required GET request settings
// $passwordStr = "geosolutions:Geos";
// curl_setopt($ch, CURLOPT_USERPWD, $passwordStr);
//GET data
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept: application/json"));
//GET return code
$successCode = 200;
$buffer = curl_exec($ch);
echo "CURL INFO : <BR/> " ;
print_r(curl_getinfo($ch));
echo "CURL OUTPUT : <BR/> " ;
print_r($buffer);
// Check for errors and process results
$info = curl_getinfo($ch);
if ($info['http_code'] != $successCode) {
$msgStr = "# Unsuccessful cURL request to ";
$msgStr .= $url." [". $info['http_code']. "]\n";
fwrite($logfh, $msgStr);
} else {
$msgStr = "# Successful cURL request to ".$url."\n";
fwrite($logfh, $msgStr);
}
fwrite($logfh, $buffer."\n");
curl_close($ch);
fclose($logfh);
OUTPUT IN BROWSER :
CURL INFO :
Array
(
[url] => http://website.mywebsite.com:8083/VideoBook/signout.action
[content_type] =>
[http_code] => 302
[header_size] => 254
[request_size] => 105
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.58976
[namelookup_time] => 0.004162
[connect_time] => 0.297276
[pretransfer_time] => 0.297328
[size_upload] => 0
[size_download] => 0
[speed_download] => 0
[speed_upload] => 0
[download_content_length] => 0
[upload_content_length] => 0
[starttransfer_time] => 0.589739
[redirect_time] => 0
[redirect_url] => https://hpecp.mywebsite.com:8443/cas/login?service=http%3A%2F%2Fwebsite.mywebsite.com%3A8083%2FVideoBook%2Flogin.action
[primary_ip] => 125.21.227.2
[certinfo] => Array
(
)
[primary_port] => 8083
[local_ip] => 10.0.0.8
[local_port] => 50710
)
CURL OUTPUT :
LOG File Details :
* Hostname was NOT found in DNS cache
* Trying 125.21.227.2...
* Connected to website.mywebsite.com (125.21.227.2) port 8083 (#0)
> GET /VideoBook/signout.action HTTP/1.1
Host: website.mywebsite.com:8083
Accept: application/json
< HTTP/1.1 302 Moved Temporarily
* Server Apache-Coyote/1.1 is not blacklisted
< Server: Apache-Coyote/1.1
< Location: https://hpecp.mywebsite.com:8443/cas/login?service=http%3A%2F%2Fwebsite.mywebsite.com%3A8083%2FVideoBook%2Flogin.action
< Content-Length: 0
< Date: Tue, 20 May 2014 06:02:29 GMT
<
* Connection #0 to host website.mywebsite.com left intact
* Issue another request to this URL: 'https://hpecp.mywebsite.com:8443/cas/login?service=http%3A%2F%2Fwebsite.mywebsite.com%3A8083%2FVideoBook%2Flogin.action'
* Hostname was NOT found in DNS cache
* Trying 15.126.214.121...
* Connected to hpecp.mywebsite.com (15.126.214.121) port 8443 (#1)
* successfully set certificate verify locations:
* CAfile: none
CApath: /etc/ssl/certs
* Unknown SSL protocol error in connection to hpecp.mywebsite.com:8443
* Closing connection 1
# Unsuccessful cURL request to http://website.mywebsite.com:8083/VideoBook/signout.action [302]
Try adding SSL verify false and follow location; with that in place it works:
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
//output:-
CURL INFO :
Array ( [url] => https://exampl.com:8443/cas/login?service=http%3A%2F%2Fexample%3A8083%2FVideoBook%2Flogin.action [content_type] => text/html;charset=UTF-8 [http_code] => 200 [header_size] => 593 [request_size] => 273 [filetime] => -1 [ssl_verify_result] => 18 [redirect_count] => 1 [total_time] => 3.073 [namelookup_time] => 0 [connect_time] => 0.577 [pretransfer_time] => 1.794 [size_upload] => 0 [size_download] => 8003 [speed_download] => 2604 [speed_upload] => 0 [download_content_length] => 8003 [upload_content_length] => -1 [starttransfer_time] => 2.387 [redirect_time] => 0.686 )
So you need to check the auth credentials on your end.
I think adding these three parameters, CURLOPT_REFERER, CURLOPT_COOKIEJAR, and CURLOPT_COOKIEFILE, together with a valid cookie file, can solve this. I haven't tested the code.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
does the job.
In order to log out of any kind of session, you first need to be logged in, so the service must be expecting some reference to an existing session.
Either it expects you to give it information about which user should be logged out, or it is intended to log your script out after a series of calls to other services.
What it cannot do is automatically log out the user who is accessing your page, because it has no way of seeing them. The request originates entirely on your server, and only contains the information you pass to it with CURL. Nor will you be able to give it the information a browser would have, unless your script is on the same domain, as the browser will not pass your script the cookies set by the other site.
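Putting the answers above together: signout only makes sense for a session that cURL itself holds, so the script has to log in first and carry the session cookies in a real cookie-jar file (passing true to CURLOPT_COOKIEJAR/CURLOPT_COOKIEFILE, as in the question, is not a valid file path). A rough sketch with illustrative paths; the actual login step depends on how the CAS server shown in the redirect_url authenticates:
<?php
// Sketch only: CURLOPT_COOKIEJAR / CURLOPT_COOKIEFILE expect file paths, not booleans.
$cookieJar = '/tmp/videobook_cookies.txt';          // hypothetical path, must be writable

// 1) Log in (details depend on the CAS flow behind login.action).
$ch = curl_init('http://website.mywebsite.com:8083/VideoBook/login.action');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);    // store cookies received during login
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar);   // and send them back on redirects
// ... supply whatever credentials the login/CAS flow requires here ...
curl_exec($ch);
curl_close($ch);

// 2) Call signout with the same session cookies.
$ch = curl_init('http://website.mywebsite.com:8083/VideoBook/signout.action');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept: application/json'));
$buffer = curl_exec($ch);
curl_close($ch);
echo $buffer;                                       // expected: {"status":"success"}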

PHP Curl GET call troubleshooting

I am building a website that makes standard HTTP calls to an API. My first call is a straightforward GET with no parameters using basic auth. I am using cURL in my PHP, running via a local install of XAMPP. My call is not working, but if I have a colleague run the PHP on his Linux box running an older version of Ubuntu PHP, it works fine. What is the best way to troubleshoot this issue? My guess is it is something with my XAMPP install, but is there a good method for troubleshooting? I have used curl_getinfo on my cURL session to get the return values, and as far as I can tell it doesn't provide much insight.
Here is the curl_getinfo output:
Array (
[url] => https://www.ebusservices.net/webservices/hrpcertws/rbs/api/merchants/267811683882/consumers.xml?
[content_type] =>
[http_code] => 0
[header_size] => 0
[request_size] => 107
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.28
[namelookup_time] => 0.015
[connect_time] => 0.015
[pretransfer_time] => 0
[size_upload] => 0
[size_download] => 0
[speed_download] => 0
[speed_upload] => 0
[download_content_length] => -1
[upload_content_length] => -1
[starttransfer_time] => 0
[redirect_time] => 0
[certinfo] => Array ( )
[primary_ip] => 127.0.0.1
[primary_port] => 8888
[local_ip] => 127.0.0.1
[local_port] => 59509
[redirect_url] =>
)
I am using:
XAMPP 1.8.1
PHP Version 5.4.7
cURL 7.24.0
on Windows 7
Added Code:
<?php
error_reporting(E_ALL);
$session = 'FALSE';
// Initialize the session
$session = curl_init();
$stderr = fopen("curl.txt", "w+");
// Set curl options
curl_setopt($session, CURLOPT_URL, 'https://www.ebusservices.net/webservices/hrpcertws/rbs/api/merchants/12233442/consumers.xml?');
curl_setopt($session, CURLOPT_STDERR, $stderr);
curl_setopt($session, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($session, CURLOPT_USERPWD, "username:pwd");
curl_setopt($session, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($session, CURLOPT_SSLVERSION, 3);
curl_setopt($session, CURLOPT_VERBOSE, 1);
// Make the request
$response = curl_exec($session);
print_r(curl_getinfo($session));
// Close the curl session
curl_close($session);
fclose($stderr);
// Get HTTP Status code from the response
$status_code = array();
preg_match('/\d\d\d/', $response, $status_code);
// Check the HTTP Status code
if(isset($status_code[0]))
{
switch( $status_code[0] )
{
case 100:
break;
case 200:
break;
case 503:
die('Your call to HRP API failed and returned an HTTP status of 503. That means: Service unavailable. An internal problem prevented us from returning data to you.');
break;
case 403:
die('Your call to HRP API failed and returned an HTTP status of 403. That means: Forbidden. You do not have permission to access this resource, or are over your rate limit.');
break;
case 400:
die('Your call to HRP API failed and returned an HTTP status of 400. That means: Bad request. The parameters passed to the service did not match as expected. The exact error is returned in the XML response.');
break;
case 401:
die('Your call to HRP API failed and returned an HTTP status of 401. That means: Unauthorized. The credentials supplied do not have permission to access this resource.');
break;
case 404:
die('Page not found.');
break;
default:
die('Your call to HRP API returned an unexpected HTTP status of:' . $status_code[0]);
}
}
else
{
echo 'failed';
}
// Get the XML from the response, bypassing the header
if (!($xml = strstr($response, '<?xml'))) {
$xml = null;
//echo 'in xml';
}
// Output the XML
echo htmlspecialchars($xml, ENT_QUOTES);
?>
Try using Fiddler to see exactly what is in the HTTP traffic.
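An http_code of 0 with an empty content_type means the transfer failed before any HTTP response arrived, so checking cURL's own error alongside the Fiddler capture usually narrows it down quickly. A minimal sketch, reusing the request from the question (URL and credentials are the question's placeholders):
<?php
// Sketch only: when http_code is 0, curl_errno()/curl_error() usually say why
// the transfer failed (DNS, proxy, TLS handshake, unsupported SSL version, ...).
$session = curl_init('https://www.ebusservices.net/webservices/hrpcertws/rbs/api/merchants/12233442/consumers.xml?');
curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
curl_setopt($session, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($session, CURLOPT_USERPWD, 'username:pwd');

$response = curl_exec($session);
if ($response === false) {
    printf("cURL error %d: %s\n", curl_errno($session), curl_error($session));
} else {
    printf("HTTP %d\n", curl_getinfo($session, CURLINFO_HTTP_CODE));
}
curl_close($session);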

Curl Errors While Pulling from Twitter

I'm trying to pull some data from Twitter via PHP. I'm using the tmhOAuth plugin, which can be found here: https://github.com/themattharris/tmhOAuth/
I wrote my code based off the example file "streaming.php", which can also be found on the above github page. Here is my code:
require 'tmhOAuth.php';
$tmhOAuth = new tmhOAuth(array(
'consumer_key' => 'xxxhiddenxxx',
'consumer_secret' => 'xxxhiddenxxx',
'user_token' => 'xxxhiddenxxx',
'user_secret' => 'xxxhiddenxxx'
));
$method = 'http://stream.twitter.com/1/statuses/filter.json';
$params = array(
'follow' => '1307392917',
'count' => '5'
);
$tmhOAuth->streaming_request('POST', $method, $params, 'my_streaming_callback');
$tmhOAuth->pr($tmhOAuth);
That was not printing out any of the twitter data I wanted to pull, and was only showing the debug information that the pr() command writes.
While trying to debug why I wasn't getting any data, I went in and added a line to tmhOAuth.php so that I could see what error cURL was giving. I did this by using
echo curl_error($C);
The error that cURL output was:
transfer closed with outstanding read data remaining
I've done some research on that error, but I can't find anything that helps. There were a couple things that I found regarding content-length, but when I dug into the code I saw that the author of tmhOAuth had already addressed those issues (and commenting out his fixes didn't help).
Any help?
Update 1 Here is the response info gathered using curl_getinfo:
//Removed - an updated version is below
Update 2 Thanks to the comments below I realized that twitter was sending me data with transfer-encoding: chunked. I put this line into tmhOAuth.php to force out chunked data:
curl_setopt($c, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
That worked, somewhat. I'm no longer getting any cURL errors, but my WRITEFUNCTION callback is still never getting called - so I'm never getting any actual data. Here's the output of my cURL object again:
[response] => Array
(
[content-length] => 0
[headers] => Array
(
[content_type] => text/html; charset=iso-8859-1
[server] => Jetty(6.1.25)
)
[code] => 416
[response] => 1
[info] => Array
(
[url] => http://stream.twitter.com/1/statuses/filter.json
[content_type] => text/html; charset=iso-8859-1
[http_code] => 416
[header_size] => 116
[request_size] => 532
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.118553
[namelookup_time] => 0.043927
[connect_time] => 0.070477
[pretransfer_time] => 0.07049
[size_upload] => 25
[size_download] => 0
[speed_download] => 0
[speed_upload] => 210
[download_content_length] => -1
[upload_content_length] => -1
[starttransfer_time] => 0.118384
[redirect_time] => 0
[request_header] => POST /1/statuses/filter.json HTTP/1.0
User-Agent: themattharris' HTTP Client
Host: stream.twitter.com
Accept: */*
Authorization: OAuth oauth_consumer_key="xxxhiddenxxx", oauth_nonce="xxxhidden", oauth_signature="xxxhidden", oauth_signature_method="HMAC-SHA1", oauth_timestamp="1308226585", oauth_token="xxxhiddenxxx", oauth_version="1.0"
Content-Length: 25
Content-Type: application/x-www-form-urlencoded
)
)
)
Update 3: A couple of things I've figured out so far... I removed the 'count' parameter from my POST request, and now the page seems to take forever. I figured this meant it was just downloading tons and tons of data, so I put a break into the streaming callback function, set up so that it kills the page after 5 loops.
I did this and let it sit for quite a while. After about 5 minutes, the page finished loading and showed me what data I had gathered. It looked like I had gotten no data each time it ran through - only an end-of-line character. So it's taking a minute for every piece of data I am downloading, and even then the only data that shows is an end-of-line character. Weird? Is this a Twitter issue or a cURL issue?
I tried with the token API but never got anything good, so this is the script I found here:
<?php
/**
* API Streaming for Twitter.
*
* @author Loïc Gerbaud <gerbaudloic@gmail.com>
* @version 0.1 "itjustworks"
*/
define('TWITTER_LOGIN','login'); //login twitter
define('TWITTER_PASSWORD','myp4ssw0rd'); //password twitter
$sTrackingList = 504443371;//read my account but could be keywords
// ?
while(1){
echo 'Connexion ';
read_the_stream($sTrackingList);
echo 'Deconnexion ';
}
/**read the stream
*
*/
function read_the_stream($sTrackingList){
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,'https://stream.twitter.com/1/statuses/filter.json');
curl_setopt($ch,CURLOPT_USERPWD,TWITTER_LOGIN.':'.TWITTER_PASSWORD);//the login:password pair
curl_setopt($ch, CURLOPT_NOBODY, 0);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_USERAGENT, '');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('X-Twitter-Client: ItsMe','X-Twitter-Client-Version: 0.1','X-Twitter-Client-URL: http://blog.loicg.net/'));
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS,"follow=".$sTrackingList);//read the doc for your request
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'write_callback');//function callback
curl_exec($ch);
curl_close($ch);
}
/** a demo that writes to a log file (or could insert into MySQL)
*/
function write_callback($ch, $data) {
if(strlen($data)>2){
$oData = json_decode($data);
if(isset($oData->text)){
file_put_contents('log',$oData->text."\n",FILE_APPEND);
}
}
return strlen($data);
}
?>
Run this script in your browser (you can close it after), update your Twitter account, and check the log file.
After about 5 minutes, the page finished loading
Are you running streaming.php in the browser? If so, you have to run it via SSL, otherwise it doesn't work. I have a server cron job pointing to the file, but you can also do it from the terminal:
php /path/to/here/streaming.php
To view the data you are getting, you can store it in a database or a log:
function my_streaming_callback($data, $length, $metrics) {
$ddf = fopen('/twitter/mydata.log','a');
fwrite($ddf,$data);
fclose($ddf);
}
