How do I write a simple PHP transparent proxy?

I need to make a proxy script that can access a page hidden behind a login screen. I do not need the proxy to "simulate" logging in, instead the login page HTML should be displayed to the user normally, and all the cookies and HTTP GET/POST data to flow through the proxy to the server, so the login should be authentic.
I don't want the login/password, I only need access to the HTML source code of the pages generated after logging in.
Does anybody here know how this can be accomplished? Is it easy?
If not, where do I begin? (I'm currently using PHP.)

Have your PHP script request the URL you want, and rewrite all links and form actions to point back to your php script. When receiving requests to the script that have a URL parameter, forward that to the remote server and repeat.
You won't be able to catch all JavaScript requests (unless you implement a JavaScript portion of your "proxy").
E.g.: the user types http://example.com/login.php into your proxy form.
Send the user to http://yoursite.com/proxy.php?url=http://example.com/login.php
Make sure to urlencode the parameter "http://example.com/login.php".
In http://yoursite.com/proxy.php, you make an HTTP request to http://example.com/login.php:
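$url = isset($_REQUEST['url']) ? $_REQUEST['url'] : '';
// make sure we have a valid URL and not a file path
// (the pattern is anchored so the value must start with http:// or https://)
if (!preg_match('`^https?://`i', $url)) {
    die('Not a URL');
}
// make the HTTP request to the requested URL
$content = file_get_contents($url);
if ($content === false) {
    die('Request failed');
}
// parse all links and form actions and redirect them back to this script
// (the pattern and replacement below are placeholders you must write yourself)
$content = preg_replace("/some-smart-regex-here/i", "$1 or $2 smart replaces", $content);
echo $content;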
Note that /some-smart-regex-here/i is a placeholder for a regular expression you should write yourself to find links, form actions and the like.
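As a rough idea of what that rewriting step could look like, here is a minimal sketch. The function name rewriteLinks() is made up for illustration, and it only handles absolute http(s) URLs in double-quoted href and action attributes; relative URLs, single quotes, src attributes and anything generated by JavaScript would need extra work.

```php
<?php
// Hypothetical helper: rewrite absolute href/action URLs so they route
// back through the proxy script. Relative URLs are left untouched here.
function rewriteLinks(string $html, string $proxyUrl): string
{
    return preg_replace_callback(
        '~\b(href|action)\s*=\s*"(https?://[^"]*)"~i',
        function (array $m) use ($proxyUrl): string {
            // Attribute values are HTML-escaped, so decode before URL-encoding
            $target = html_entity_decode($m[2], ENT_QUOTES);
            return $m[1] . '="' . $proxyUrl . '?url=' . urlencode($target) . '"';
        },
        $html
    );
}

echo rewriteLinks(
    '<a href="http://example.com/login.php">Log in</a>',
    'http://yoursite.com/proxy.php'
), "\n";
```

A real HTML parser (DOMDocument, for example) is usually more robust than a regex for this, but the regex keeps the sketch close to the snippet above.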
The example just proxies the HTTP body; you may also want to proxy the HTTP headers (cookies in particular, since the login depends on them). You can use fsockopen() or the PHP stream functions in PHP 5+ (stream_socket_client() etc.).
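One possible way to pass cookies and POST data along without dropping down to raw sockets is an HTTP stream context. This is only a sketch: buildForwardOptions() and forwardRequest() are names invented here, and it assumes form-encoded POST bodies and PHP 7 or later.

```php
<?php
// Hypothetical helper: build stream-context options that pass the
// visitor's cookies and POST fields on to the remote server.
function buildForwardOptions(string $method, array $postData, string $cookieHeader): array
{
    $headers = [];
    if ($cookieHeader !== '') {
        $headers[] = 'Cookie: ' . $cookieHeader;
    }
    // ignore_errors lets us read the body of 4xx/5xx responses too
    $http = ['method' => $method, 'ignore_errors' => true];
    if ($method === 'POST') {
        $headers[] = 'Content-Type: application/x-www-form-urlencoded';
        $http['content'] = http_build_query($postData);
    }
    $http['header'] = implode("\r\n", $headers);
    return ['http' => $http];
}

// Hypothetical usage inside proxy.php (not called here; it needs network access)
function forwardRequest(string $url): string
{
    $options = buildForwardOptions(
        $_SERVER['REQUEST_METHOD'] ?? 'GET',
        $_POST,
        $_SERVER['HTTP_COOKIE'] ?? ''
    );
    $body = file_get_contents($url, false, stream_context_create($options));
    // $http_response_header is filled in by the HTTP wrapper after the call
    foreach ($http_response_header ?? [] as $line) {
        if (stripos($line, 'Set-Cookie:') === 0) {
            header($line, false); // pass login cookies back to the browser
        }
    }
    return $body === false ? '' : $body;
}
```

Keep in mind that cookies the remote site scopes with a Domain attribute will not match your proxy's host, so you may have to rewrite or strip that attribute before passing them on.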

You could check out http://code.google.com/p/php-transparent-proxy/ , I made it because I was asking myself that exact same question and I decided to make one. It's under BSD license, so have fun :)

What you are talking about is accessing pages for which you need to authenticate yourself.
Here are a few things that must be laid down:
you can't view those pages without authenticating yourself.
if the website (whose HTML code you want to see) only supports web login as an authentication method, you will need to simulate login by sending a (username,password) via POST/GET, as the case may be
if the website will let you authenticate yourself in other ways (like LDAP, Kerberos etc), then you should do that
The key point is that you cannot gain access without authenticating yourself first.
As for language, it is pretty doable in PHP. And as the tags on the question suggest, you are using the right tools to do that job already.
One thing I would like to know is, why are you calling it a "proxy"? do you want to serve the content to other users?
EDIT: [update after comment]
In that case, use phproxy. It does what you want, along with a host of other features.

I would recommend using cURL (a PHP extension that you might need to enable in your php.ini).
It's used to talk to remote websites, and it handles cookies and every HTTP parameter you need.
You'll have to tailor your proxy to the web pages you're hitting, but it will do the job.

Related

Processing cURL and other POST types of requests

I'm afraid that you're all gonna need to reach over and put on your "this-is-a-dumb-question" hat, but I can't find a legitimate answer online.
I can find all sorts of crazy info on sending cURL requests to a site and have them send back a response and processing that response.
My question is I want to BE the site that receives this cURL requests. How do you set up a page to process those types of requests? How does it work?
I ask because in the near future I will need to integrate this into a project I'm working on. Other sites will send me info that I'll need to process and store in a database. Ideally, I'll probably need something to handle HTTP POST requests and obviously I'll need to know what security measures I'll need to take as well. It also would be nice to know how to fire back a response to the person who's making the request. I also will need to know how to configure user authentication....such as with:
curl_setopt($request, CURLOPT_USERPWD, "Username:Password");
I feel silly asking, but I can't figure out the right search keywords to use to look up this kind of info. Even just a link to a relevant site would be appreciated.
Thanks in advance. You all rock.

All that cURL does is act as a user agent. Any user agent can access a php page.
So to be the site that receives cURL requests (or to have such a page) all you really need to do is create a normal php page that would work as if a user were using it. Whoever is using cURL to make the request will have to know what data you are expecting on your end.
If you want to use CURLOPT_USERPWD for authentication, the usual approach is to set up Apache (or whatever server you are using) to password protect that page with HTTP Basic authentication. Depending on how PHP is hooked into the server, the script may also be able to read the submitted credentials itself from $_SERVER['PHP_AUTH_USER'] and $_SERVER['PHP_AUTH_PW']. The PHP script can also have its own form of authentication, either instead of or in addition to this. There are many ways to do this (OpenSSL for example, or just expecting a static secret string, depending on how secure you want to be).
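To make the receiving side a bit more concrete, here is a minimal sketch of such a page's logic. It assumes the server passes Basic auth credentials through to PHP as PHP_AUTH_USER / PHP_AUTH_PW (this depends on the server configuration), and the function names, the "info" field and the JSON response format are all made up for illustration.

```php
<?php
// Hypothetical credential check for HTTP Basic auth. $server is normally
// $_SERVER; $users maps usernames to password hashes from password_hash().
function isAuthorized(array $server, array $users): bool
{
    $user = $server['PHP_AUTH_USER'] ?? '';
    $pass = $server['PHP_AUTH_PW'] ?? '';
    return isset($users[$user]) && password_verify($pass, $users[$user]);
}

// Hypothetical endpoint logic: returns [HTTP status, response array]
function handleRequest(array $server, array $post, array $users): array
{
    if (!isAuthorized($server, $users)) {
        return [401, ['error' => 'authentication required']];
    }
    if (($server['REQUEST_METHOD'] ?? '') !== 'POST' || !isset($post['info'])) {
        return [400, ['error' => 'expected a POST with an "info" field']];
    }
    // Validate and store $post['info'] here (e.g. with a prepared statement)
    return [200, ['status' => 'ok']];
}

// In the real page you would do something like:
// [$status, $body] = handleRequest($_SERVER, $_POST, $users);
// http_response_code($status);
// if ($status === 401) { header('WWW-Authenticate: Basic realm="API"'); }
// header('Content-Type: application/json');
// echo json_encode($body);
```

The sender would then set CURLOPT_USERPWD and CURLOPT_POSTFIELDS on their side and read the JSON you echo back as the response. Serve the page over HTTPS, since Basic auth sends the password with every request.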

Getting the domain that calls an PHP file on your server through AJAX

I'm building out an API and have a question about how to track/know which domains use the call.
The API call is built in PHP, and doesn't require any authentication. A user will most likely use the API in an AJAX call on their server.
So for example, my domain that is serving up the API PHP file is called dev.yourmapper.com. Someone on the domain www.metromapper.org builds a page that creates a Google map, and calls my file using Ajax to overlay my data on their map.
Here is that example in action: http://www.metromapper.org/example/apitest.htm
(Click the center map marker to see a popup of all the PHP Server variables available to the yourmapper.com script.)
Note that HTTP_REFERER is likely going to be 'stackoverflow.com' if you click the link (or empty if you cut and paste the link). I would think that the referer would be metromapper.org, since that domain calls the yourmapper.com script after it loads, but apparently not.
Bottom line: what method can I use to determine which domain is calling my yourmapper.com script with Javascript? I can use other languages besides PHP if needed. Thanks.

"I would think that the referer would be metromapper.org, since that domain calls the yourmapper.com script after it loads"
That's incorrect, actually. Firstly, you should never rely on HTTP_REFERER because it's a voluntary header sent by most (not all) browsers, and it can easily be spoofed. I can send your website requests using cURL that make it look like the referrer was whitehouse.gov if I want to. There are no security measures in place there.
That being said, the browser sets that header to the page that referred the user to the currently loaded page, not the script. So the reason you see the result you're seeing is that the user was referred to metromapper.org by a link on stackoverflow.com.
Finally, let's get to the juicy part. You're using JS to code things in the browser. That's fine and there's absolutely no problem with that. But you have to remember that JS source is delivered to, and readable by, every client. So people can (and will) mess with your code to play with your API just because they can. That being said, your best bet is probably to pass the URL of the site along with the request in your JS API. That's the best way to "track" which sites are using your script. You could check server side to make sure that a URL was passed. That would prevent people from modifying your API to remove the bit that sends their URL to your server. It won't, however, prevent them from modifying it to use someone else's URL or a random unregistered one as the parameter.
Sure, you could build a PHP API that they run on their server. The JS API connects to the PHP API, and the PHP API is Zend Guard encoded (or protected by some other source protection system), but then there are still going to be people who decode the file to get back to your source and mess with you. Granted, there'd be far fewer people able to do that, and the average user would just rather use your API as it is. Then you also have the issue of people not being able to run your API on servers that can't run encoded PHP files.
In the end you have to determine your level of desired security and authentication, but since your API is running in JavaScript in the client browser, there is very little available beyond obfuscation.
I'd say your best option would be to simply have your JS code snag the URL of the current page and send it with the API request. From there your server can process the URL to get the root domain and any other info you want to store.
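On the server, turning the submitted page URL into a host name could look roughly like this. The function name callerHost() and the "page" parameter are assumptions for the sketch; the client-side script would fill that parameter from window.location.href.

```php
<?php
// Hypothetical helper: pull the host name out of the page URL that the
// JavaScript client sent along. Returns null for anything that is not http(s).
function callerHost(string $pageUrl): ?string
{
    $parts = parse_url($pageUrl);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return null;
    }
    return strtolower($parts['host']);
}

// "page" is an assumed parameter name; log or store $host as needed
$host = callerHost($_GET['page'] ?? '');
```

As noted above, this only tells you what the caller claims; nothing stops someone from sending a different URL.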
If you want to prevent people from "spoofing" requests for other user's website urls, you could implement a PHP API that gets installed on the user's server at a certain place. For example http://www.domain.com/my-app-name.php
All JS API calls should go through that script. When the user downloads your API they should enter their website URL and some other info. Your system generates a "key" and injects it into the script before packaging it for them to download. That key is valid for their domain and used to encode all transmission to/from your API using say blowfish or another 2-way encryption algorithm. This way when your API receives a request from their PHP API file, you're getting the url of the page that request was made from, encoded with a key that only you and the admin of that site have. So the request comes through as something like this: metromapper.org/api?site=[url_encoded_page_address]&req=[encrypted_request]
Your server uses the page url to determine what key should be used to decrypt the data. It then decrypts the data. If the data is corrupted or doesn't decrypt into what you expect, then it's an invalid request and you should just exit returning nothing.
The reason I suggest using a PHP file for encryption as opposed to writing the encryption into JS is because you don't want to burden the client (each site visitor) with the load of encryption/decryption and PHP is going to handle it much faster than JS would since there are libraries made to handle those tasks for you.
At any rate that should get you on the right track to being able to keep track of and validate requests for different sites against your API.
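For the per-site key idea, here is one possible shape of the validation step. Note that this sketch swaps in an HMAC signature (hash_hmac) rather than the two-way encryption described above: the request travels in the clear and the key is only used to prove who sent it. The function names and the key store are invented for the example.

```php
<?php
// Hypothetical helpers. $siteKey is the per-site secret issued at download time.
function signRequest(string $pageUrl, string $request, string $siteKey): string
{
    return hash_hmac('sha256', $pageUrl . "\n" . $request, $siteKey);
}

// $keys maps a host name to its secret, e.g. loaded from your database
function verifyRequest(string $pageUrl, string $request, string $signature, array $keys): bool
{
    $host = parse_url($pageUrl, PHP_URL_HOST);
    if (!is_string($host) || !isset($keys[$host])) {
        return false; // unknown site: reject
    }
    // hash_equals compares in constant time
    return hash_equals(signRequest($pageUrl, $request, $keys[$host]), $signature);
}
```

The PHP file installed on the user's server would call signRequest() and send the result along; your API calls verifyRequest() and exits without output when it returns false.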

You could generate a hash based on the domain name, and let the users of your API send the domain name and the hash in each request. Since your API uses PHP and is called cross-domain, you'll have to set the 'Access-Control-Allow-Origin' header somewhere. If you do this in PHP you can play around with that a bit. The script below is a simple example of an implementation that doesn't require PHP on the caller side (the domain that uses your API).
Caller Side (no php required):
<script type="text/javascript">
function callA() {
var xhttp = new XMLHttpRequest();
xhttp.open("GET", "//ajaxdomain.com/call.php?"+
"dom=www.callerdomain.com&"+
"key=41f8201df6cf1322cc192025cc6c5013",
true);
xhttp.onreadystatechange = function() {
if(xhttp.readyState == 4 && xhttp.status == 200) {
handleResponse(xhttp.responseText);
}
}
xhttp.send();
}
</script>
Ajax Server Side (PHP):
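<?php
// Read the claimed domain and key; fall back to empty strings if missing
$dom = isset($_GET['dom']) ? $_GET['dom'] : '';
$key = isset($_GET['key']) ? $_GET['key'] : '';
// Only emit the CORS header when the key matches the salted hash of the domain
if ($dom !== '' && hash_equals(md5($dom . "Salt"), $key)) {
    header("Access-Control-Allow-Origin: http://" . $dom);
}
?>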
This way the header would also be set if the call came from a malicious domain that reuses someone else's domain and key, but the rest will bounce because of a cross-origin exception in the browser, and thus no result will be given.
For the sake of space I used an MD5 hash in this example, but you could use stronger hashes if you want. Note that you should (as always) keep the salt secret.
I put a working example online at the following (sub)domains. The pages are identical.
cors1.serioushare.com - Only works on 'CORS 1' button.
cors2.serioushare.com - Only works on 'CORS 2' button.

PHP get the site that calls your script via file_get_contents

I have a PHP script hosted on my site that outputs a value based on the GET parameters passed.
Other sites call this script from within their own PHP scripts via the PHP function file_get_contents with the url and get params and are served back just the value requested.
I am trying to allow only certain domains access to this script and have been using HTTP_REFERER to check who's calling the script.
if (isset($_SERVER['HTTP_REFERER'])) // check if referrer is set
{
echo $_SERVER['HTTP_REFERER']; // echo referrer
}
else
{
echo 'No referrer set'; // echo failure message
}
I am getting No referrer set when I use file_get_contents but if I use a clicked link from a page to a script with the above code the referrer displays correctly.
Am I using the wrong function (file_get_contents) to call the external script and can someone suggest the correct one or should this work?
Any help much appreciated.
Thanks

Bear in mind that the HTTP "Referer" header is an optional header -- there's no need for a site to send it to you, and it can be easily faked. If you really only want certain people to use your resources, you're better off using some form of authentication.
Typically Referer: is sent by web browsers, but there's no need for it to be. For example, browsers generally won't send it when the user follows a link from a secure (HTTPS) page to a non-secure one. With a PHP file_get_contents() there isn't technically a referer anyway; you're not being "referred" from anywhere.
Consider instead either:
Locking down by IP address (but bear in mind that multiple domains can share a single IP, and that a domain's IP can change.)
Using some form of authentication (preferably not one that transmits passwords in plain text!)
You should consider how secure you need this service to be, and what threats might attack it when deciding the right security to apply.

You would be much better off restricting based on IP address rather than domain; it's much more reliable. Just keep an array of allowed IPs and call in_array($_SERVER['REMOTE_ADDR'], $allowedAddresses) to validate it.
Or just require authentication via a cookie or HTTP auth...

You can't do this using HTTP_REFERER.
The HTTP_REFERER is set by the client, and it can be anything the client wants.
You have to use a password / key authentication mechanism instead.

You may want to use something along the lines of a stream context to set extra headers.
http://us.php.net/manual/en/function.stream-context-create.php
Additionally, if needed, you could set a 'secret' header to authenticate the requests, rather than relying on the referer.
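A sketch of how the two sides might fit together, assuming a shared key sent in a custom header. The header name X-Api-Key and both function names are made up for this example; any header name works as long as both sides agree on it.

```php
<?php
// Caller side: hypothetical helper that builds context options carrying a
// shared secret in a custom header ("X-Api-Key" is an assumed name).
function buildApiContextOptions(string $apiKey): array
{
    return ['http' => ['method' => 'GET', 'header' => 'X-Api-Key: ' . $apiKey]];
}

// Server side: $server is normally $_SERVER; PHP exposes the header above
// as HTTP_X_API_KEY.
function hasValidKey(array $server, array $validKeys): bool
{
    $sent = $server['HTTP_X_API_KEY'] ?? '';
    foreach ($validKeys as $key) {
        if (hash_equals($key, $sent)) {
            return true;
        }
    }
    return false;
}

// Hypothetical caller usage (the URL is a placeholder):
// $ctx = stream_context_create(buildApiContextOptions('my-secret'));
// $value = file_get_contents('https://example.com/script.php?id=1', false, $ctx);
```

Handing each allowed site its own key also tells you which site is calling, which is what the referer check was trying to achieve. Use HTTPS so the key is not readable in transit.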

Return HTML or XML based on request in PHP

There's an existing website written in PHP. Originally only the website existed, but now an Android application is being built that would benefit from re-using some of the PHP logic.
The PHP site was structured such that there are many pages that perform an action, set success/error information in $_SESSION, and then redirect to a visual page without outputting any content themselves. For example, there's action_login.php:
The page accepts a username and password (from GET or POST variables), validates the credentials, sets success/failure messages in $_SESSION, and then redirects to the logged-in homepage on success or back to the login screen on failure. Let's call this behavior the "HTML response".
The Android application will need to call the same page but somehow tell it that it wants an "XML response" instead. When the page detects this, it will output success/error message in an XML format instead of putting them in $_SESSION and won't redirect. That's the idea anyway. This helps prevent duplicate code. I don't want to have action_login.php and action_login.xml.php floating around.
I've read that the Accept Header isn't reliable enough to use (see: Unacceptable Browser HTTP Accept Headers (Yes, You Safari and Internet Explorer)). My fallback solution is to POST xml=1 or use {url}?xml=1 for GET requests. Is there a better way?
No frameworks are being used, this is plain PHP.

That's what the Accept header is for. Have the Android app request the page with Accept: application/xml and then check what was requested in your script. You might also be interested in mod_negotiation when using Apache. Or use WURFL to detect the user agent and serve XML when it is Android.
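A minimal sketch of that check, combined with the xml=1 fallback from the question. The function name wantsXml() is made up, and the rule "if the client accepts text/html at all, treat it as a browser" is a simplifying assumption to sidestep the quirky browser Accept headers; it does not do full q-value negotiation.

```php
<?php
// Hypothetical helper: decide whether the client asked for XML, either via
// the Accept header or via the xml=1 fallback parameter from the question.
function wantsXml(array $server, array $request): bool
{
    if (($request['xml'] ?? '') === '1') {
        return true;
    }
    $accept = strtolower($server['HTTP_ACCEPT'] ?? '');
    // If text/html is acceptable at all, assume a browser and serve HTML
    if ($accept === '' || strpos($accept, 'text/html') !== false) {
        return false;
    }
    return strpos($accept, 'application/xml') !== false
        || strpos($accept, 'text/xml') !== false;
}

// Hypothetical use at the end of action_login.php:
// if (wantsXml($_SERVER, $_REQUEST)) {
//     header('Content-Type: application/xml');
//     echo $xmlMessage;            // built from the same success/error data
// } else {
//     $_SESSION['message'] = $message;
//     header('Location: home.php');
// }
```

This keeps a single action page: the validation logic runs once and only the last step differs.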

I'd go with the Android app sending a cookie with every request (really I would prefer the Accept header, but with the problems you pointed out with WebKit I understand your reluctance to do so). The cookie simplifies the server-side code, which no longer has to check for $_GET['xml'] or $_POST['xml']. It also avoids a problem with the URL approach: if some Android user shares a URL of your application that contains ?xml=1, a user who opens it in a desktop browser would receive XML instead of the normal web output.
I wouldn't rely on $_SESSION for mobile applications because users on mobile platforms (or at least I) tend to open your app, play for 5 minutes, put the phone in a pocket and return to your app 2 hours later. Do you want to set a session lifetime that long?

Why not set a specific session variable for the app and then only set the header if it is set? Something along the lines of:
$_SESSION['app'] = "android app";
if (isset($_SESSION['app']) && $_SESSION['app'] == "android app") {
    // header(...);  // send whatever header marks the XML response here
}
I'm not really sure how to implement this in an app, as I've done very little work with apps, but I hope this helps your thought process.

How can I hide $_SERVER['HTTP_REFERER']

How can I hide $_SERVER['HTTP_REFERER'] when a user browses to another site via a link from my site?

You can't, you have no control over the headers that are sent to another site. Headers are sent from the browser, to the site being navigated. This means you cannot manipulate them in any way (short of a MITM attack).
You could redirect the user to the site via an intermediary proxy, but that proxy will become the new referrer. e.g.
Your Link -> Proxy -> End result

Not only should this generally not be done, but it is not possible, at least in the way you are describing. It is up to the client to decide what to send in the request headers to a different server, not you.
I should also point out that this has nothing to do with PHP. PHP makes this header accessible to you via $_SERVER['HTTP_REFERER'], but the problem you are trying to solve is preventing the client from sending the referrer URL to the next server.
A few options:
If your site uses HTTPS and the link points to a plain HTTP site, then browsers generally won't send it.
If you build a redirector script on your site and use the HTTP Refresh header, the browser will typically not send the referrer, and if it did, you would only be sending the URL of your redirector. For example:
http://www.yoursite.com/redir.php?url=http%3A%2F%2Fwww.google.com
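<?php
// Redirect via the (non-standard) Refresh header; only accept http(s) targets
if (isset($_GET['url']) && preg_match('`^https?://`i', $_GET['url'])) {
    header("Refresh: 0; url=" . $_GET['url']);
}
?>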
Now, you must be careful with this little script. Anyone could then use your site to make a redirect look like it was coming from you. Also, using this method, anyone can inject whatever headers they want to the client. This is just to give you an idea. Finally, using the refresh header for this goes against the grain of the standards and should not be done.
Finally, Google, Facebook, PayPal, etc. all have redirector scripts. They use some sort of encrypted hash on the URL to determine if they generated the redirect or not. If you don't specify that hash and just give the URL, then the user will be prompted before redirecting. Not friendly.
Look, the bottom line is, there isn't really a reason to do what you are doing. If you are trying to hide something in your URL, then you have bigger problems. Security through obscurity is bad, mmkay?
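For what it's worth, the "did we generate this redirect" check that the big sites do could be approximated with a keyed hash. This is only a sketch under assumptions: the function names, the sig parameter and the /redir.php path are illustrative, and it uses an HMAC rather than whatever those sites actually use.

```php
<?php
// Hypothetical helpers; $secret is a server-side key that never leaves your site.
function signedRedirectUrl(string $target, string $secret): string
{
    $sig = hash_hmac('sha256', $target, $secret);
    return '/redir.php?url=' . urlencode($target) . '&sig=' . $sig;
}

// Returns true only if $sig was produced by signedRedirectUrl() for $target
function isOwnRedirect(string $target, string $sig, string $secret): bool
{
    return hash_equals(hash_hmac('sha256', $target, $secret), $sig);
}
```

Your pages would emit links built with signedRedirectUrl(), and redir.php would refuse (or ask for confirmation) whenever isOwnRedirect() returns false, so third parties cannot use your redirector for arbitrary targets.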

If you're working in a controlled environment (an intranet, say), you might benefit from changing the browser configuration, see e.g. http://cafe.elharo.com/privacy/privacy-tip-3-block-referer-headers-in-firefox/, but this is far from ideal.
