There are any open-soure PHP Web Proxy ready to use? - php

I need a PHP Web Proxy that read html, show to the user and rewrite all the links for when the user click in the next link the proxy will handle the request again, just like this code, but with additionaly sould make the rewrite of all the links.
<?php
// Set your return content type
header('Content-type: text/html');
// Website url to open
$daurl = 'http://www.yahoo.com';
// Get that website's content
$handle = fopen($daurl, "r");
// If there is something, read and return
if ($handle) {
while (!feof($handle)) {
$buffer = fgets($handle, 4096);
echo $buffer;
}
fclose($handle);
}
?>
I hope I have explained well. This question is for not reinventing the wheel.
Another additional question. This kind of proxies will deal with contents like Flash?

For an open source solution, check out PHProxy. I've used it in the past and it seemed to work quite well from what I can remember.

It will sort of work, you need to rewrite any relative path to apsolute, and I think cookies won't work in this case. Use cURL for this operations...
function curl($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
return curl_exec($ch);
curl_close ($ch);
}
$url = "http://www.yahoo.com";
echo curl($url);

Related

Preventing hotlinking on Amazon S3 with PHP

In my site i am doing a image protection section to cut the costs of Amazon S3. So as a part of that i have made anti hot-linking links for images using php (to the best of my understanding).
<video src="/media.php?id=711/video.mp4"></video>
Then my media.php file looks like:
if (isset($_SERVER['HTTP_REFERER']) && $_SERVER['HTTP_REFERER'] != 'example.com')
{
header('HTTP/1.1 503 Hot Linking Not Permitted');
header("Content-type: image/jpeg");
readfile("http://example.com/monkey.jpg");
exit;
}
$url = 'https://s3.example.com';
$s3 = "$url/$file";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $s3);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$results = explode("\n", trim(curl_exec($ch)));
foreach($results as $line) {
if (strtok($line, ':') == 'Content-Type') {
$parts = explode(":", $line);
$mime = trim($parts[1]);
}
}
header("Content-type: $mime");
readfile($s3);
To make it less obvious, I have set up a rewrite to route /711/video.mp4 into cdn/711/video.mp4. That way, it doesn't look like there is an PHP script.
RewriteRule ^cdn/([0-9a-zA-Z_]+)/([0-9a-zA-Z-\w.-]+)([\w.-]+)?$ media\.php\?id=$1/$2 [QSA,L]
This above system is working fine but the issue is when i load image directly the loading time of the image is 237ms and when the image is loaded through this PHP script the loading time is 1.65s
I have shared the entire code i have, so if there is any chance of improvement in it please guide me in the right direction so i can make changes accordingly.
The reason your script takes longer than querying s3 directly is that you've added a lot of overhead to the image request. Your webserver needs to download the image and then forward it to your users. That is almost definitely your biggest bottleneck.
The first thing I would suggest doing is using the S3 API. This still uses curl() under the hood but has optimizations that should nominally increase performance. This would also allow you to make your s3 bucket "private" which would make obscuring the s3 url unnecessary.
All of that said, the recommended way to prevent hotlinking with AWS is to use cloudfront with referrer checking. How that is done is outlined by AWS here.
If you don't want to refactor your infrastructure, the best way to improve performance is likely to implement a local cache. At its most basic, that would look something like this:
$cacheDir = '/path/to/a/local/directory/';
$cacheFileName = str_replace('/', '_', $file);
if (file_exists($cacheDir . $cacheFileName)){
$mime = mime_content_type($cacheDir . $cacheFileName);
$content = file_get_contents($cacheDir . $cacheFileName);
} else {
$url = 'https://s3.example.com';
$s3 = "$url/$file";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $s3);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$results = explode("\n", trim(curl_exec($ch)));
foreach($results as $line) {
if (strtok($line, ':') == 'Content-Type') {
$parts = explode(":", $line);
$mime = trim($parts[1]);
}
}
$content = file_get_contents($s3);
file_put_contents($cacheDir . $cacheFileName, $content);
}
header("Content-type: $mime");
echo $content;
This stores a copy of the file locally so that the server does not need to download it from s3 every time it is requested. That should reduce your overhead somewhat, though it will not do as well as a purely AWS based solution. With this solution you'll also have to add ways of cache-breaking, periodically expiring the cache, etc. Just to reiterate, you shouldn't just copy/paste this into a production environment, it is a start but is more a proof of concept than production ready code.

Getting whole HTML element with PHP

I want to get the whole element <article> which represents 1 listing but it doesn't work. Can someone help me please?
containing the image + title + it's link + description
<?php
$url = 'http://www.polkmugshot.com/';
$content = file_get_contents($url);
$first_step = explode( '<article>' , $content );
$second_step = explode("</article>" , $first_step[3] );
echo $second_step[0];
?>
You should definitely be using curl for this type of requests.
function curl_download($url){
// is cURL installed?
if (!function_exists('curl_init')){
die('cURL is not installed!');
}
$ch = curl_init();
// URL to download
curl_setopt($ch, CURLOPT_URL, $url);
// User agent
curl_setopt($ch, CURLOPT_USERAGENT, "Set your user agent here...");
// Include header in result? (0 = yes, 1 = no)
curl_setopt($ch, CURLOPT_HEADER, 0);
// Should cURL return or print out the data? (true = retu rn, false = print)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Timeout in seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
// Download the given URL, and return output
$output = curl_exec($ch);
// Close the cURL resource, and free system resources
curl_close($ch);
return $output;
}
for best results for your question. Combine it with HTML Dom Parser
use it like:
// Find all images
foreach($output->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($output->find('a') as $element)
echo $element->href . '<br>';
Good Luck!
I'm not sure I get you right, But I guess you need a PHP DOM Parser. I suggest this one (This is a great PHP library to parser HTML codes)
Also you can get whole HTML code like this:
$url = 'http://www.polkmugshot.com/';
$html = file_get_html($url);
echo $html;
Probably a better way would be to parse the document and run some xpath queries over it afterwards, like so:
$url = 'http://www.polkmugshot.com/';
$xml = simplexml_load_file($url);
$articles = $xml->xpath("//articles");
foreach ($articles as $article) {
// do sth. useful here
}
Read about SimpleXML here.
extract the articles with DOMDocument. working example:
<?php
$url = 'http://www.polkmugshot.com/';
$content = file_get_contents($url);
$domd=#DOMDocument::loadHTML($content);
foreach($domd->getElementsByTagName("article") as $article){
var_dump($domd->saveHTML($article));
}
and as pointed out by #Guns , you'd better use curl, for several reasons:
1: file_get_contents will fail if allow_url_fopen is not set to true in php.ini
2: until php 5.5.0 (somewhere around there), file_get_contents kept reading from the connection until the connection was actually closed, which for many servers can be many seconds after all content is sent, while curl will only read until it reaches content-length HTTP header, which makes for much faster transfers (luckily this was fixed)
3: curl supports gzip and deflate compressed transfers, which again, makes for much faster transfer (when content is compressible, such as html), while file_get_contents will always transfer plain

fopen alternative for FPDF (PHP)

I am using FPDF to extract info from a PNG file. Unfortunately, the server has fopen disabled. Can anyone recommend a good way of getting around this? Any help would be much appreciated. Thanks in advance!
function _parsepng($file)
{
// Extract info from a PNG file
$f = fopen($file,'rb');
if(!$f)
$this->Error('Can\'t open image file: '.$file);
$info = $this->_parsepngstream($f,$file);
fclose($f);
return $info;
}
You can try curl but usually if your hosting company disables one they also disable the other.
I really don't know if it can work :-)
but you may try with fsockopen
http://www.php.net/manual/en/function.fsockopen.php
file:// can be used as protocol
hope this helps
Ended up using a cURL workaround. Thanks everyone for the input!
function _parsepng($file)
{
$ch = curl_init($file);
$fp = fopen('/tmp/myfile.png', 'wb');
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_FILE, $fp);
$image_data = curl_exec($ch);
curl_close($ch);
// Extract info from a PNG file
$f = fopen('/tmp/myfile.png','rb');
if(!$f)
$this->Error('Can\'t open image file: '.$file);
$info = $this->_parsepngstream($f,$file);
fclose($f);
return $info;
}

how to get file from xooplate.com using php

i see http://xooplate.com/templates/download/13693 return a file.
i have using
$ch = curl_init("http://xooplate.com/templates/download/13693");
$fp = fopen("example_homepage.zip", "w");
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);
function get_include_contents($filename) {
if (is_file($filename)) {
ob_start();
include $filename;
$contents = ob_get_contents();
ob_end_clean();
return $contents;
}
return false;
}
$string = get_include_contents('http://xooplate.com/templates/download/13693');
but not working, i expected one a help, thank for all
Updated Answer :
Hey buddy, i checked that site. they put onClick jQuery method on the download button (widget.js line 27) with the post domain security. I tried to trigger the onClick event but due to domain check security functiion server only allows http://www.stumbleupon.com and http://xooplate.com to post the data using AJAX. So it wouldn't be possible using curl or PHPDOM. Sorry. :) Great Security.
Below Solution is not the answer !
Hi I found a solution using PHP Simple HTML DOM Library. Just Download it and use the following code
<?php
include("simple_html_dom.php");
$html = file_get_html('http://xooplate.com/templates/details/13693-creative-circular-skills-ui-infographic-psd');
foreach($html->find('a[class=btn_green nturl]') as $element)
echo $element->href . '<br>';
?>

Php curl: working script, need a little added

This is my code:
function get_remote_file_to_cache(){
$sites_array = array("http://www.php.net", "http://www.engadget.com", "http://www.google.se", "http://arstechnica.com", "http://wired.com");
$the_site= $sites_array[rand(0, 4)];
$curl = curl_init();
$fp = fopen("rr.txt", "w");
curl_setopt ($curl, CURLOPT_URL, $the_site);
curl_setopt($curl, CURLOPT_FILE, $fp);
curl_exec ($curl);
curl_close ($curl);
}
$cache_file = 'rr.txt';
$cache_life = '15'; //caching time, in seconds
$filemtime = #filemtime($cache_file);
if (!$filemtime or (time() - $filemtime >= $cache_life)){
ob_start();
echo file_get_contents($cache_file);
ob_get_flush();
echo " <br><br><h1>Writing to cache</h1>";
get_remote_file_to_cache();
}else{
echo "<h1>Reading from cache file:</h1><br> ";
ob_start();
echo file_get_contents($cache_file);
ob_get_flush();
}
Everything works as it should and no problems or surprises, and as you can see its pretty simple code but I am new to CURL and would just like to add one check to the code, but dont know how:
Is there anyway to check that the file fetched from the remote site is not a 404 (not found) page or such but is a status code 200 (successful) ?
So basically, only write to cache file if the fill is status code 200.
Thanks!
To get the status code from a cURL handle, use curl_getinfo after curl_exec:
$status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
But the cached file will be overwritten when
$fp = fopen("rr.txt", "w");
is called, regardless of the HTTP code, this means that to update the cache only when status is 200, you need to read the contents into memory, or write to a temporary file. Then finally write to the real file if the status is 200.
It is also a good idea to
touch('rr.txt');
before executing cURL, so that the next request that may come before the current operation finish will not also try to load the page to page too.
Try this after curl_exec
$httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);

Categories