Get http-statuscode without body using cURL? - php

I want to parse a lot of URLs to only get their status codes.
So what I did is:
$handle = curl_init($url->loc);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_HEADER, true); // we want headers
curl_setopt($handle, CURLOPT_NOBODY, true);
curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($handle);
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
curl_close($handle);
But as soon as the "nobody"-option is set to true, the returned status codes are incorrect (google.com returns 302, other sites return 303).
Setting this option to false is not possible because of the performance loss.
Any ideas?

The default HTTP request method for curl is GET. If you want only the response headers, you can use the HTTP method HEAD.
curl_setopt($handle, CURLOPT_CUSTOMREQUEST, 'HEAD');
However, as @Dai's answer points out, CURLOPT_NOBODY already makes cURL send a HEAD request, so setting the method explicitly will not change the result.
Another option is to open the connection yourself with fsockopen and write the request with fwrite. The complete header ends at the first occurrence of \r\n\r\n, but since you only need the status code, it is enough to read the first 12 characters of the response (for example "HTTP/1.1 200"); fgets($fp, 13) reads at most 12 bytes.
<?php
$fp = fsockopen("www.google.com", 80, $errno, $errstr, 30);
if ($fp) {
$out = "GET / HTTP/1.1\r\n";
$out .= "Host: www.google.com\r\n";
$out .= "Accept-Encoding: gzip, deflate, sdch\r\n";
$out .= "Accept-Language: en-GB,en-US;q=0.8,en;q=0.6\r\n";
$out .= "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36\r\n";
$out .= "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\n";
$out .= "Connection: Close\r\n\r\n";
fwrite($fp, $out);
$tmp = explode(' ', fgets($fp, 13));
echo $tmp[1];
fclose($fp);
}
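Reading a fixed number of bytes assumes the status line always has the same shape. A slightly more defensive variant is to read the whole first line and pull the code out with a pattern. This is only a sketch; parseStatusCode is a hypothetical helper name, not part of PHP or cURL.

```php
<?php
// Hypothetical helper: extract the numeric status code from an HTTP status line.
function parseStatusCode(string $statusLine): ?int
{
    // Matches lines such as "HTTP/1.1 302 Found" or "HTTP/2 200".
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null; // not a status line
}

echo parseStatusCode("HTTP/1.1 302 Found\r\n"), "\n"; // prints 302
```

With the socket code above, the input would be the result of fgets($fp) called without a short length limit.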

cURL's nobody option makes it use the HEAD HTTP verb. I'd wager that the majority of non-static web applications in the wild don't handle this verb correctly, hence the different results you're seeing. I suggest making a normal GET request and discarding the response.

I suggest get_headers() instead:
<?php
$url = 'http://www.example.com';
print_r(get_headers($url));
print_r(get_headers($url, 1));
?>
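get_headers() returns the raw header lines, and when the URL redirects the list contains one status line per hop. A minimal sketch of picking out the last status code follows; finalStatusCode is a hypothetical helper and the sample array uses made-up values shaped like get_headers() output.

```php
<?php
// Hypothetical helper: return the status code of the last response in a
// get_headers() result, or null if no status line is present.
function finalStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        // Skip array values, which appear when get_headers() is called in associative mode.
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1]; // keep overwriting so the last hop wins
        }
    }
    return $code;
}

// Made-up sample imitating a redirected request.
$sample = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://www.example.com/',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
];
echo finalStatusCode($sample), "\n"; // prints 200
```

Note that get_headers() returns false on failure, so check the result before passing it to a helper like this.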

Related

Can't make Curl POST request from web app

A server I am working on appears to be denying outbound HTTP requests. The reason I think this is because I've tried both Guzzle and curl requests to the API.
The API lives on the same domain as the web server (this is temporary at clients request). I can make requests to the API server via Postman (Chrome plugin), but when I run that same request on the server, it doesn't return anything.
Here are the headers from the 'Postman' request:
POST /api2/user/session HTTP/1.1
Host: example.com
Connection: keep-alive
Content-Length: 49
Cache-Control: no-cache
Origin: chrome-extension://fdmmgilgnpjigdojojpjoooidkmcomcm
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.76 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.8
Cookie: PHPSESSID=d9ad79c4c0822fc5c86f4d8799307f1b; _ga=GA1.2.1674422587.1425409444
Post data:
token=a559d5bba5a9e9517d5c3ed7aeb62db6&user=30972
This works. It returns the data. But when I call the same endpoint from within my web app, I get nothing.
$data = urlencode("token=a559d5bba5a9e9517d5c3ed7aeb62db6&user=30972");
$ch = curl_init('http://example.com/api2/user/session');
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Content-Type: application/x-www-form-urlencoded',
'Content-Length: ' . strlen($data))
);
$result = curl_exec($ch);
What I don't understand is I can run the following, and it returns the content:
print file_get_contents("http://www.google.com");
When I var_dump the $_POST fields on the endpoint user/session it returns the array of postdata using Postman but $_POST fields are blank when sending via the web app. Even before it makes any request to the database, the post fields should be set right?
Via SSH this also works:
curl -F token=a559d5bba5a9e9517d5c3ed7aeb62db6 -F user=30972 http://example.com/api2/user/session
As suggested in comments I've tried:
var_dump(function_exists('curl_version'));
// bool(true)
I can't figure out what's going on.
Edit: This works ... but I don't want to use sockets. Must be a curl issue.
$fp = fsockopen('example.com', 80);
$vars = array(
'token' => 'a559d5bba5a9e9517d5c3ed7aeb62db6',
'user' => '30972'
);
$content = http_build_query($vars);
fwrite($fp, "POST /api2/user/session HTTP/1.1\r\n");
fwrite($fp, "Host: example.com\r\n");
fwrite($fp, "Content-Type: application/x-www-form-urlencoded\r\n");
fwrite($fp, "Content-Length: ".strlen($content)."\r\n");
fwrite($fp, "Connection: close\r\n");
fwrite($fp, "\r\n");
fwrite($fp, $content);
header('Content-type: text/plain');
while (!feof($fp)) {
echo fgets($fp, 1024);
}
Edit:
curl_error() also returns no error.
To better understand the differences between the PHP code and the cURL command, I created a RequestBin instance and tried both against it. They yielded drastically different results.
The POST data sent by the PHP script was not what was intended: urlencode() was applied to the whole string, so the = and & separators were percent-encoded along with the values. This can be fixed by using the built-in PHP function http_build_query, which encodes each key and value separately and yields the expected form body.
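The difference between the two encodings can be reproduced without any network access. The sketch below uses placeholder field values, and parse_str() only roughly imitates how PHP splits a form body into $_POST.

```php
<?php
// Placeholder values for illustration only.
$fields = ['token' => 'abc123', 'user' => '30972'];

$wrong = urlencode('token=abc123&user=30972'); // encodes the separators too
$right = http_build_query($fields);            // encodes each key and value separately

echo $wrong, "\n"; // token%3Dabc123%26user%3D30972
echo $right, "\n"; // token=abc123&user=30972

// parse_str() splits a query-style string into fields, much like PHP does for $_POST.
parse_str($wrong, $seenWrong);
parse_str($right, $seenRight);
var_dump(isset($seenWrong['token'])); // bool(false)
var_dump($seenRight['token']);        // string(6) "abc123"
```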
This can also be caused by a session lock. If you use cURL to call the same server, the same session is used. While a script is running, its session is locked by default, which means the current request has to finish before another request for the same session is handled. That would explain the cURL request timing out: the first request has not completed when the second one is made.
Using session_write_close() before the curl_exec will unlock the session and correct the problem.
It turns out I needed to use http_build_query.
$vars = array(
'token' => 'a559d5bba5a9e9517d5c3ed7aeb62db6',
'user' => '30972'
);
$content = http_build_query($vars);
$ch = curl_init('http://example.com/api2/user/session');
curl_setopt($ch, CURLOPT_POSTFIELDS, $content);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Content-Type: application/x-www-form-urlencoded',
'Content-Length: ' . strlen($content))
);
$result = curl_exec($ch);

How to avoid "HTTP/1.1 999 Request denied" response from LinkedIn?

I'm making a request to a LinkedIn page and receiving an "HTTP/1.1 999 Request denied" response.
I get this response when running on AWS EC2.
On localhost everything works fine.
This is a sample of my code to get the HTML of the page.
<?php
error_reporting(E_ALL);
$url= 'https://www.linkedin.com/pulse/5-essential-strategies-digital-michelle';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$response = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
var_dump($response);
var_dump($info);
I don't need whole page content, just meta-tags (title, og-tags).
Note that status code 999 does not exist in the W3C Hypertext Transfer Protocol - HTTP/1.1 specification; it is probably a custom code (it sounds like a joke).
LinkedIn doesn't allow direct access. The probable reasons for blocking requests that come from other web servers are to:
Prevent unauthorized copying of information
Prevent intrusions
Prevent abuse of requests
Force use of the API
Some server IP addresses are blocked, while addresses from domestic ISPs are not; when you access LinkedIn with a web browser, you use the IP address of your internet provider.
The only way to access the data is to use their APIs. See:
Accessing LinkedIn public pages using Python
Heroku requests return 999
Note: The search engines like Google and Bing probably have their IPs in a "whitelist".
<?php
header("Content-Type: text/plain");
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.linkedin.com/company/technistone-a-s-");
$header = array();
$header[] = "Host: www.linkedin.com";
$header[] = "User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0";
$header[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$header[] = "Accept-Language: en-US,en;q=0.5";
$header[] = "Accept-Encoding: gzip, deflate, br";
$header[] = "Connection: keep-alive";
$header[] = "Upgrade-Insecure-Requests: 1";
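curl_setopt($ch, CURLOPT_ENCODING, "gzip"); // let cURL decompress gzip responses
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$my_var = curl_exec($ch);
echo $my_var;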
LinkedIn does not support the default encoding 'identity', so if you set the header
'Accept-Encoding': 'gzip, deflate'
you should get the response, but you will have to decompress it.
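If the compressed body arrives undecoded (for instance when reading from a raw socket), it can be decompressed with PHP's zlib functions. The sketch below assumes the zlib extension is loaded; decodeBody is a hypothetical helper. cURL can also do this on its own when CURLOPT_ENCODING is set.

```php
<?php
// Hypothetical helper: undo the Content-Encoding of a raw response body.
function decodeBody(string $body, string $contentEncoding): string
{
    switch (strtolower(trim($contentEncoding))) {
        case 'gzip':
            $decoded = gzdecode($body);     // gzip container
            break;
        case 'deflate':
            $decoded = gzuncompress($body); // zlib-wrapped deflate
            break;
        default:
            return $body;                   // identity or unknown encoding
    }
    // Fall back to the untouched body if decompression failed.
    return $decoded === false ? $body : $decoded;
}

echo decodeBody(gzencode('<html>ok</html>'), 'gzip'), "\n"; // prints <html>ok</html>
```

Some servers send raw deflate data without the zlib wrapper; in that case gzinflate() would be the function to try instead of gzuncompress().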
I ran into this while doing local web development and using the LinkedIn badge feature (profile.js). I was only getting the 999 Request denied in Chrome, so I just cleared my browser cache and localStorage and it started to work again.
UPDATE - Clearing cache was just a coincidence and the issue came back. LinkedIn is having issues with their badge functionality.
I submitted a help thread to their forums.
https://www.linkedin.com/help/linkedin/forum/question/714971

PHP How cURL/FOpen Stop at Certain Characters

Is there a way for PHP's cURL functions to fetch the contents of a website but stop once they reach certain characters that we specify? I think this is some sort of buffering,
so that the script does not download the whole page.
So schemes like this:
: curl execution
<html>
->
->
->
-> Title Detected
: curl close
->
->
->
->
</ html>
Please note this is not a DOM problem; the question is how to make cURL stop when it finds what we asked for.
This is my code:
function curl_download($Url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $Url);
curl_setopt($ch, CURLOPT_REFERER, $Url);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$output = curl_exec($ch);
curl_close($ch);
return $output;
}
If cURL can't handle this, how about fopen? Do you have an example?
Thanks in advance. Please include example code.
Here is a very simple example using fsockopen(). Extend it to fit your needs.
$host = 'www.site.com';
$port = 80;
$sock = fsockopen($host, $port, $errno, $errstr, 30);
if (!$sock) {
die("Failed to connect. $errno: $errstr");
}
// write http request to socket:
$request = "GET /file.html HTTP/1.0\r\n"
."Host: $host\r\n"
."User-Agent: some-user-agent\r\n"
."Connection: close\r\n"
."\r\n";
fwrite($sock, $request);
$buffer = ''; // buffer for storing response
while (!feof($sock)) {
$buffer .= fgets($sock, 1024); // read up to 1023 bytes (or one line) from socket, append to buffer
if (strpos($buffer, '</title>') !== false) { // title was found
fclose($sock);
break;
}
}
So we connect to the HTTP server on the remote host, issue a simple HTTP/1.0 request, and read the response in chunks of up to 1 KB (or one line at a time) until the closing title tag is detected. Once it is found, the connection is closed.
Note, even though we didn't read the entire response from the socket, the underlying system (PHP and the OS socket layer) may have read more (or possibly all depending on size) of the response. In either case, you did prevent PHP from reading most of the response. If the pages are very big, closing the socket early will likely prevent a bulk of the data from actually ever being received.
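Once the loop has stopped, the title still has to be pulled out of the partial buffer. A minimal sketch, assuming the buffer holds everything up to and including the closing title tag; extractTitle is a hypothetical helper and the sample response is made up.

```php
<?php
// Hypothetical helper: return the text of the first <title> element in a
// (possibly truncated) HTML buffer, or null if none is present.
function extractTitle(string $buffer): ?string
{
    if (preg_match('#<title[^>]*>(.*?)</title>#is', $buffer, $m)) {
        // Decode entities such as &amp; and strip surrounding whitespace.
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null;
}

// Made-up partial response, cut off right after the title.
$partial = "HTTP/1.0 200 OK\r\n\r\n<html><head><title> Example &amp; Co </title>";
echo extractTitle($partial), "\n"; // prints Example & Co
```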
Hope that helps.
I do not think you can parse the DOM with cURL.
I advise you to use string functions such as strstr, strtok...
http://www.php.net/manual/en/ref.strings.php

Any way to 'spy' on the activity of curl_init() & curl_exec()?

I have some PHP code that was at one point WORKING FINE. It makes a call out to an external API, the API has NOT CHANGED AT ALL. The PHP code has also NOT CHANGED AT ALL. But suddenly, I am getting no results back for this function:
if (!function_exists('setFieldsAndCallURL'))
{
function setFieldsAndCallURL($url,$fields)
{
//url-ify the data for the POST
$fields_string='';
foreach($fields as $key=>$value)
{ $fields_string .= $key.'='.$value.'&'; }
$fields_string = rtrim($fields_string,'&'); // rtrim() returns the trimmed string; it does not modify its argument
//open connection
$ch = curl_init();
//set the url, number of POST vars, POST data
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_POST,count($fields));
curl_setopt($ch,CURLOPT_POSTFIELDS,$fields_string);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
//execute the jump
$result = '';
$result = curl_exec($ch);
//close connection
curl_close($ch);
return $result;
}
}
Before, it would return a text GUID when called:
$userURL ='https://api.nottherealendpointurl.net/public/user/authenticate';
$userFields = array(
'username'=>$username,
'lastName'=>$lastname,
'firstName'=>$firstname,
'email'=>$email,
'token'=>urlencode($adminKey),
);
//Login this particular user
$userKey = setFieldsAndCallURL($userURL,$userFields);
But suddenly it has started returning "" (empty string) and I have no idea why.
Is there any way to get more info and spy on the inner workings of this thing? See the call it is making using HTTP header logging software? Or anything?
NOTE: I have already tested the POST manually to the API and it is working as expected, I am still getting back the proper GUID. For some reason doing it through this curl thing just suddenly quit doing it properly. Nobody has any idea what could be different now.
Probably the IP of the server is blocked by now, whereas your local IP is not?
You might want to add
$headerFile = fopen($filepath_to_header_file, 'w'); // placeholder path variable; fopen() needs a mode
$errorFile = fopen($filepath_to_error_file, 'w'); // placeholder path variable
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_WRITEHEADER, $headerFile );
curl_setopt($ch, CURLOPT_STDERR, $errorFile );
to write the response headers and the errors to files, and then inspect their contents.
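The captured header file is plain text and can be split into a status line and individual fields for easier inspection. This is a minimal sketch assuming the dump holds a single response header block; parseHeaderBlock is a hypothetical helper and the sample text is made up.

```php
<?php
// Hypothetical helper: split one raw response header block into its status
// line and a map of lower-cased header names to values.
function parseHeaderBlock(string $raw): array
{
    $lines = preg_split('/\r\n|\n/', trim($raw));
    $status = array_shift($lines); // first line is the status line
    $headers = [];
    foreach ($lines as $line) {
        if (strpos($line, ':') === false) {
            continue; // ignore blank or malformed lines
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return ['status' => $status, 'headers' => $headers];
}

// Made-up sample of what a blocked request might log.
$sample = "HTTP/1.1 403 Forbidden\r\nContent-Type: text/plain\r\nContent-Length: 0\r\n\r\n";
print_r(parseHeaderBlock($sample));
```

In practice the input would come from file_get_contents() on the header file written by cURL.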
edit:
To verify if the ip of the server is blocked you could try something like this
$hostname = "api.nottherealendpointurl.net";
$host = "ssl://".$hostname; // the ssl:// prefix makes fsockopen negotiate TLS
$port = 443;
$url = "/public/user/authenticate";
$timeout = 30;
$errno = 0;
$errstr = "";
$data = ""; // collects the raw response
$fp = fsockopen($host, $port, $errno, $errstr, $timeout);
if($fp)
{
$request = "GET ".$url." HTTP/1.1\r\n";
$request.= "Host: ".$hostname."\r\n"; // the Host header carries the bare hostname only
$request.= "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de-DE; rv:1.7.12) Gecko/20050919 Firefox/1.0.7\r\n";
$request.= "Connection: Close\r\n\r\n";
fwrite($fp, $request);
while (!feof($fp))
{
$data .= fgets($fp, 128);
}
fclose($fp);
echo $data;
}
else
{
echo "ERROR: ".$errstr;
}
where $data contains the response from the remote server.
Depending on what platform you're on, you can look at the raw packets: on the Linux command line that would be tcpdump; on Windows and other platforms you can use Wireshark.
tcpdump -i eth1 tcp port 80
or
http://www.wireshark.org/download.html

PHP How To Send Raw HTTP Packet

I want to send a raw HTTP packet to a web server and receive its response, but I can't find a way to do it. I'm inexperienced with sockets, and every link I find uses sockets to send UDP packets. Any help would be great.
Take a look at this simple example from the fsockopen manual page:
<?php
$fp = fsockopen("www.example.com", 80, $errno, $errstr, 30);
if (!$fp) {
echo "$errstr ($errno)<br />\n";
} else {
$out = "GET / HTTP/1.1\r\n";
$out .= "Host: www.example.com\r\n";
$out .= "Connection: Close\r\n\r\n";
fwrite($fp, $out);
while (!feof($fp)) {
echo fgets($fp, 128);
}
fclose($fp);
}
?>
The connection to the server is established with fsockopen. $out holds the HTTP request that is then sent with fwrite. The HTTP response is then read with fgets.
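The request string can also be assembled by a small function, which keeps the line endings and the blank line before the body in one place. This is a sketch only; buildRawRequest is a hypothetical helper and the host, path and field names are placeholders.

```php
<?php
// Hypothetical helper: assemble a raw HTTP/1.1 request as a string.
function buildRawRequest(string $method, string $host, string $path, array $headers = [], string $body = ''): string
{
    $lines = ["$method $path HTTP/1.1", "Host: $host"];
    foreach ($headers as $name => $value) {
        $lines[] = "$name: $value";
    }
    if ($body !== '') {
        $lines[] = 'Content-Length: ' . strlen($body); // byte length of the body
    }
    $lines[] = 'Connection: close';
    // Header lines end with CRLF; an empty line separates headers from the body.
    return implode("\r\n", $lines) . "\r\n\r\n" . $body;
}

$body = http_build_query(['q' => 'php sockets']);
echo buildRawRequest('POST', 'www.example.com', '/search',
    ['Content-Type' => 'application/x-www-form-urlencoded'], $body);
```

The returned string can be passed to fwrite() on a socket opened with fsockopen.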
If all you want to do is perform a GET request and receive the body of the response, most of the file functions support using urls:
<?php
$html = file_get_contents('http://google.com');
?>
<?php
$fh = fopen('http://google.com', 'r');
$html = '';
while (!feof($fh)) {
$html .= fread($fh, 8192); // fread() requires a length argument
}
fclose($fh);
?>
For more than simple GETs, use cURL (the cURL extension has to be compiled in or enabled). With cURL you can do POST and HEAD requests, as well as set various headers.
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://google.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
?>
cURL is easier than implementing client side HTTP. All you have to do is set a few options and cURL handles the rest.
$curl = curl_init($URL);
curl_setopt_array($curl,
array(
CURLOPT_USERAGENT => 'Mozilla/5.0 (PLAYSTATION 3; 2.00)',
CURLOPT_HTTPAUTH => CURLAUTH_ANY,
CURLOPT_USERPWD => 'User:Password',
CURLOPT_RETURNTRANSFER => True,
CURLOPT_FOLLOWLOCATION => True
// set CURLOPT_HEADER to True if you want headers in the result.
)
);
$result = curl_exec($curl);
If you need to set a header that cURL doesn't support, use the CURLOPT_HTTPHEADER option, passing an array of additional headers. Set CURLOPT_HEADERFUNCTION to a callback if you need to parse headers. Read the docs for curl_setopt for more options.
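A header callback receives one header line per call and must return the number of bytes it handled. The following is a minimal sketch of collecting response headers that way; makeHeaderCollector and fetchWithHeaders are hypothetical names, and fetchWithHeaders assumes the cURL extension is available.

```php
<?php
// Hypothetical helper: build a CURLOPT_HEADERFUNCTION callback that stores
// each "Name: value" line in $headers, keyed by lower-cased name.
function makeHeaderCollector(array &$headers): callable
{
    return function ($ch, string $line) use (&$headers): int {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
        return strlen($line); // cURL aborts the transfer unless the full length is returned
    };
}

// Sketch of wiring the callback into a request; defined here but not executed.
function fetchWithHeaders(string $url): array
{
    $headers = [];
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, makeHeaderCollector($headers));
    $body = curl_exec($ch);
    curl_close($ch);
    return ['headers' => $headers, 'body' => $body];
}
```

The status line and the blank line that ends the header block contain no colon, so the collector skips them.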
