Respond with any web page on the internet from a PHP file - php

How can I create a simple PHP file which will retrieve the HTML and the headers of any web page on the internet, change image/resource URLs to their full URLs (for example: image.gif to http://www.google.com/image.gif), and then respond with it?

Okay, first of all, to get the headers use PHP's get_headers() function.
<?php
$url = "http://www.example.com/";
$headers = get_headers($url, true);
?>
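The question also asks to send those headers back to the client. A minimal sketch of re-emitting a few of them with header() follows; which headers are safe to forward is my assumption, and hop-by-hop headers such as Transfer-Encoding should not be copied.
<?php
foreach (array('Content-Type', 'Last-Modified') as $name) {
    // get_headers() values can be arrays when the server redirects,
    // so only forward plain string values.
    if (isset($headers[$name]) && is_string($headers[$name])) {
        header($name . ': ' . $headers[$name]);
    }
}
?>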
Then read the content of the page into a variable.
<?php
$handle = fopen($url, 'r');
$text = '';
while (!feof($handle)) {
    $text .= fread($handle, 8192);
}
fclose($handle);
?>
You then need to run through the content looking for resources, prepending the URL to get the absolute path to each resource if it isn't already absolute. The following regex example works on src attributes (e.g. images and JavaScript) and should give you a starting point for other resources, such as CSS, which uses href="" (a sketch for href follows the example below). The regex won't match if a : is in the source, which is a good indicator that it contains http:// and is therefore already an absolute path. PLEASE NOTE this is by no means perfect and won't account for all sorts of weird and wonderful resource locations, but it's a good start.
<?php
// Match relative src="" values; the colon is deliberately excluded from
// the character class so absolute URLs (http://...) are left alone.
$pattern = '#src="([0-9A-Za-z_/.-]+)"#';
preg_match_all($pattern, $text, $matches);
foreach ($matches[0] as $match) {
    $src = str_replace('src="', '', $match);
    $text = str_replace($match, 'src="' . $url . $src, $text);
}
print($text);
?>
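The same idea can be applied to href attributes (stylesheets, links). A rough sketch along the same lines, with the same caveats and the same skip-anything-with-a-colon heuristic:
<?php
$pattern = '#href="([0-9A-Za-z_/.-]+)"#';
preg_match_all($pattern, $text, $matches);
foreach (array_unique($matches[1]) as $href) {
    // The quotes are included in the search string so that only
    // whole attribute values are replaced, and only once each.
    $text = str_replace('href="' . $href . '"', 'href="' . $url . $href . '"', $text);
}
?>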

<?php
$file = "http://www.somesite/somepage";
$handle = fopen($file, "rb");
$text = '';
while (!feof($handle)) {
    $text .= fread($handle, 8192);
}
fclose($handle);
print($text);
?>

I think what you're looking for is a PHP proxy script. There are several on the internet; this is one I created (although I don't have time to fix bugs at the moment).
I would recommend using one that already exists over one you've written yourself, as it's not a trivial thing to do (there are better scripts than mine available as well).

Related

fwrite if string doesn't exist in file

I am currently working on an auto-content-generator script's sitemap. I learned that Google accepts a sitemap as a simple text file that contains one URL per line.
So I created a file named 1.txt and wrote a script to add the current page URL to 1.txt whenever a user visits.
test.php is:
$file = 'assets/sitemap/1.txt';
$url = "http://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'] . "\n";
$file = fopen($file, 'a');
fwrite($file, $url);
fclose($file);
This script writes the page URL to 1.txt every time someone hits the page. But the problem is, it creates too many duplicate links. So I want to add a filter that does not add a string (a URL in this case) if it already exists.
After surfing a while, I found a solution (the second snippet) that is resource-friendly: PHP check if file contains a string
I made the following modification, but it is not working (it is not adding anything at all):
$file = 'assets/sitemap/1.txt';
$url = "http://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'] . "\n";
if (exec('grep ' . escapeshellarg($url) . ' assets/sitemap/1.txt')) {
} else {
    $file = fopen($file, 'a');
    fwrite($file, $url);
    fclose($file);
}
This is hopefully easier to understand:
$file = 'assets/sitemap/1.txt';
$url = "http://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'] . "\n";
$text = file_get_contents($file);
if (strpos($text, $url) === false) {
    file_put_contents($file, $url, FILE_APPEND);
}
Read the file contents into a string $text using file_get_contents().
Check whether $url is in the string $text using strpos().
If $url is not in the string $text, append $url to the file using file_put_contents().
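One caveat worth adding (my note, not part of the original answer): several visitors can hit the page at once, so the append should probably take an exclusive lock. A sketch reusing the same $file and $url:
$text = file_get_contents($file);
if (strpos($text, $url) === false) {
    // LOCK_EX serializes concurrent appends so lines don't interleave;
    // a check-then-append race between two requests is still possible.
    file_put_contents($file, $url, FILE_APPEND | LOCK_EX);
}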
To count the total lines, you can use file() to load the file's lines into an array. Then check whether $url is in the array using in_array():
$lines = file($file);
$count = count($lines); // count the lines
if (!in_array($url, $lines)) {
    file_put_contents($file, $url, FILE_APPEND);
    $count++; // if added, add 1 to count
}
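Note that file() keeps the trailing newline on each element by default, while the file's last line may not have one. A variant (my sketch) that sidesteps the mismatch by comparing trimmed values:
$lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$count = count($lines);
if (!in_array(trim($url), $lines)) {
    file_put_contents($file, $url, FILE_APPEND);
    $count++;
}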

Batch download URLs in PHP?

So, I have a PHP script that is supposed to download images that the user inputs. However, if the user uploads a TXT file containing direct links to images, it should download the images from all the URLs in the file. My script seems to be working, except that only the last file is downloaded while the others are stored as files containing no data.
Here's the portion of my script where it parses the TXT:
$contents = file($file_tmp);
$parts = new SplFileObject($file_tmp);
foreach ($parts as $line) {
    $url = $line;
    $dir = "{$save_loc}" . basename($url);
    $fp = fopen($destination, 'w+');
    $raw = file_get_contents($url);
    file_put_contents($dir, $raw);
}
How do I make it download every URL from the TXT file?
When you iterate over an SplFileObject, you get the whole line, including trailing whitespace. Your URL will thus be something like
http://example.com/_
(PHP seems to mangle the newline into an underscore) and thus you'll get an error for many URLs. Some URLs will still work fine, since the important information comes before the newline, but others, like https://stackoverflow.com/_, do not. If an error occurs, file_get_contents() will return false, and file_put_contents() will interpret that like an empty string.
Also, the line $fp = fopen($destination, 'w+'); is really strange. For one, since $destination is not defined, it would error anyway. Even if $destination were defined, you'd end up with lots of open file handles and overwrite that poor file multiple times. You can just remove it.
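As an aside, SplFileObject can strip the newline itself via its setFlags() method, which would make the trim() in the fixed version below optional:
$parts = new SplFileObject($file_tmp);
$parts->setFlags(SplFileObject::DROP_NEW_LINE); // iterator now yields lines without the trailing newline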
To summarize, your code should look like
<?php
$file_tmp = "urls.txt";
$save_loc = "sav/";
$parts = new SplFileObject($file_tmp);
foreach ($parts as $line) {
    $url = trim($line);
    if (!$url) {
        continue;
    }
    $dir = "{$save_loc}" . basename($url);
    $raw = file_get_contents($url);
    if ($raw === false) {
        echo 'failed to download ' . $url . "\n";
        continue;
    }
    file_put_contents($dir, $raw);
}
It looks like the line
$parts = new SplFileObject($file_tmp);
isn't necessary, and neither is
$fp = fopen($destination, 'w+');
The file() function reads the entire file into an array. You just have to call trim() on each array element to remove the newline characters. The following code should work properly:
<?php
$save_loc = './';
$urls = file('input.txt');
foreach ($urls as $url) {
    $url = trim($url);
    $destination = $save_loc . basename($url);
    $content = file_get_contents($url);
    if ($content) {
        file_put_contents($destination, $content);
    }
}

How do I read a file from a destination to a variable?

I am trying to read a file's content. When I do something like this
$con = "HDdeltin";
$fp = fopen($_SERVER['DOCUMENT_ROOT'] . "/HDdeltin/Users/$con", "r");
echo $fp;
it doesn't return anything. What am I doing wrong?
Fast solution:
file_get_contents($_SERVER['DOCUMENT_ROOT'] . "/HDdeltin/Users/HDdeltin");
Use file_get_contents(), and you can guard against a bad path with realpath().
$con = "HDdeltin";
$path = realpath($_SERVER['DOCUMENT_ROOT'] . "/HDdeltin/Users/$con");
echo file_get_contents($path);
To get the current root, you can also use:
the getcwd() function.
dirname(__FILE__); you can read more here.
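For illustration, here is what each returns (__DIR__ is an equivalent shorthand I'm adding for reference):
echo getcwd() . "\n";          // current working directory of the process
echo dirname(__FILE__) . "\n"; // directory containing this script
echo __DIR__ . "\n";           // same as dirname(__FILE__) since PHP 5.3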
See more:
faster fopen or file_get_contents?
You can read the file using this function:
echo file_get_contents($_SERVER['DOCUMENT_ROOT'] . "/HDdeltin/Users/$con");
You will likely need to use file_get_contents(): http://php.net/manual/en/function.file-get-contents.php
Here is the example they provide:
// Read 14 characters starting from the 21st character
$section = file_get_contents('./people.txt', NULL, NULL, 20, 14);
var_dump($section);
So you will likely need to use:
$fp = file_get_contents($_SERVER['DOCUMENT_ROOT']. "/HDdeltin/Users/$con", true);
echo $fp;
It seems there are some differences between PHP versions:
<?php
// <= PHP 5
$file = file_get_contents('./people.txt', true);
// > PHP 5
$file = file_get_contents('./people.txt', FILE_USE_INCLUDE_PATH);
?>
So be sure to check out the first link provided.
fopen() returns a resource, not the text.
Use file_get_contents($_SERVER['DOCUMENT_ROOT'] . "/HDdeltin/Users/$con") instead.
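If you do want to keep fopen(), the resource has to be read explicitly; a minimal sketch using stream_get_contents(), which reads the rest of an open stream into a string:
$con = "HDdeltin";
$fp = fopen($_SERVER['DOCUMENT_ROOT'] . "/HDdeltin/Users/$con", "r");
if ($fp !== false) {
    echo stream_get_contents($fp); // the string content, not the resource
    fclose($fp);
}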

Save website source code to file via PHP

Hi, I want to save the source code of http://stats.pingdom.com/w984f0uw0rey to a directory on my website:
<?php
if (!copy("http://stats.pingdom.com/w984f0uw0rey", "stats.html")) {
    echo("failed to copy file");
}
?>
But this does not work for me either:
<?php
$homepage = file_get_contents('http://stats.pingdom.com/w984f0uw0rey');
echo $homepage;
?>
But I cannot figure out how to do it!
Thanks
Use
<?php
file_put_contents('w984f0uw0rey.html', file_get_contents('http://stats.pingdom.com/w984f0uw0rey'));
?>
and be sure that the script has write privileges for the current directory.
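To fail loudly instead of silently when permissions are wrong, a quick up-front check (my addition, using is_writable()) might look like:
<?php
// Verify the target directory is writable before fetching the page.
if (!is_writable(__DIR__)) {
    die('Cannot write to ' . __DIR__ . ' - fix the directory permissions first.');
}
file_put_contents('w984f0uw0rey.html', file_get_contents('http://stats.pingdom.com/w984f0uw0rey'));
?>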
Use file_get_contents().
The best approach in PHP is to use stream_copy_to_stream():
$url = 'http://www.example.com/file.zip';
$file = "/downloads/stats.html";
$src = fopen($url, 'r');
$dest = fopen($file, 'w');
echo stream_copy_to_stream($src, $dest) . " bytes copied.\n";
If you need to add HTTP options like headers, use context options with the fopen() call; see as well this similar answer, which shows how. It's likely you'll need to set a User-Agent and the like so that the other website's server believes you're a browser.
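For instance (an illustrative sketch; the header values are assumptions, not anything the target site is known to require):
// fopen() accepts a stream context as its fourth argument.
$context = stream_context_create(array(
    'http' => array(
        'method' => 'GET',
        'header' => "User-Agent: Mozilla/5.0 (compatible; MyFetcher/1.0)\r\n" .
                    "Accept: text/html\r\n",
    ),
));
$src = fopen($url, 'r', false, $context);
$dest = fopen($file, 'w');
echo stream_copy_to_stream($src, $dest) . " bytes copied.\n";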

Writing data to file adds ^M at end of line

Using PHP, I'm writing content to a .htaccess file using fwrite(). This all works correctly, but when I view the .htaccess file in Vim afterwards it displays ^M at the end of each line that has been added. This doesn't seem to cause any issues, but I'm unsure quite what is happening to cause it and whether it can be prevented.
This is the PHP:
$replaceWith = "#SO redirect_301\n" . trim($_POST['redirect_301']) . "\n#EO redirect_301";
$filename = SITE_ROOT . '/public_html/.htaccess';
$handle = fopen($filename, 'r');
$contents = fread($handle, filesize($filename));
fclose($handle);
if (preg_match('/#SO redirect_301(.*?)#EO redirect_301/si', $contents, $regs)) {
    $result = $regs[0];
}
$newcontents = str_replace($result, $replaceWith, $contents);
$filename = SITE_ROOT . '/public_html/.htaccess';
$handle = fopen($filename, 'w');
if (fwrite($handle, $newcontents) === FALSE) {
}
fclose($handle);
When I check in Vim afterwards I see something like this:
#SO redirect_301
Redirect 301 /from1 http://www.domain.com/to1^M
Redirect 301 /from2 http://www.domain.com/to2^M
Redirect 301 /from3 http://www.domain.com/to3
#EO redirect_301
The server is running CentOS and I'm working locally on a Mac.
Your newlines are coming in as \r\n, not as \n.
Before writing to the file, you should normalize the input:
$input = trim($_POST['redirect_301']);
$input = preg_replace('/\r\n/', "\n", $input); // DOS-style newlines
$input = preg_replace('/\r/', "\n", $input); // classic Mac newlines, for nostalgia
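Since no real pattern matching is needed here, plain str_replace() does the same normalization in one pass (order matters: \r\n must be replaced before the lone \r):
// Equivalent one-pass variant: convert CRLF first, then any stray CR.
$input = str_replace(array("\r\n", "\r"), "\n", trim($_POST['redirect_301']));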
