Using fopen, fwrite multiple times in a foreach loop - php

I want to save files from an external server into a folder on my server using fopen, fwrite.
First the page from the external site is loaded and scanned for any image links. Then that list is passed from an array to the save function, which uses fwrite. The files are created, but they aren't valid JPG files; viewing them in the browser, it seems that their path on my server has been written into them.
Here is the code:
//read the file
$data = file_get_contents("http://foo.html");
//scan content for jpg links
preg_match_all('/src=("[^"]*.jpg)/i', $data, $result);
//save img function
function save_image($inPath,$outPath)
{
$in= fopen($inPath, "rb");
$out= fopen($outPath, "wb");
while ($chunk = fread($in,8192))
{
fwrite($out, $chunk, 8192);
}
fclose($in);
fclose($out);
}
//output each img link from array
foreach ($result[1] as $imgurl) {
echo "$imgurl<br />\n";
$imgn = (basename ($imgurl));
echo "$imgn<br />\n";
save_image($imgurl, $imgn);
}
The save_image function works if I write out a list:
save_image('http://foo.html', 'foo1.jpg');
save_image('http://foo.html', 'foo1.jpg');
I was hoping that I'd be able to just loop the list from the matches in the array.
Thanks for looking.

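There are two problems with your script. Firstly, the opening quote mark sits inside the capture group, so it is included in the external image URL. To fix this, move the quote outside the group (and escape the dot so it only matches a literal "."):
/src="([^"]*\.jpg)/i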
Secondly, the image URLs are probably not absolute (they don't include http:// and the host). Put this at the start of your foreach to fix that:
$url = 'http://foo.html';
# If the image URL is already absolute, use it as is.
if(substr($imgurl, 0, 7) == 'http://' || substr($imgurl, 0, 8) == 'https://')
{
    $url = '';
}
# If the image URL starts with /, it goes from the website's root.
elseif(substr($imgurl, 0, 1) == '/')
{
    # Repeat until only http:// and the domain remain.
    while(substr_count($url, '/') != 2)
    {
        $url = dirname($url);
    }
}
# If the page URL is only http:// and a domain without a trailing slash.
elseif(substr_count($url, '/') == 2)
{
    $url .= '/';
}
# If the web page has an extension, use its directory (dirname() drops the trailing slash, so add it back).
elseif(strrpos($url, '.') > strrpos($url, '/'))
{
    $url = dirname($url) . '/';
}
$imgurl = $url . $imgurl;
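To see the effect of the first fix, here is a minimal sketch that runs both patterns over the same markup. The sample HTML string is made up for illustration; in the real script it would be the result of file_get_contents().

```php
<?php
// Hypothetical sample markup, standing in for the fetched page.
$data = '<img src="photos/a.jpg"><img src="/img/b.JPG">';

// Original pattern: the opening quote is inside the capture group.
preg_match_all('/src=("[^"]*.jpg)/i', $data, $before);

// Corrected pattern: quote moved outside the group, dot escaped.
preg_match_all('/src="([^"]*\.jpg)/i', $data, $after);

print_r($before[1]); // every entry starts with a stray double quote
print_r($after[1]);  // plain paths, usable with basename() and fopen()
```

With the original pattern, basename() and fopen() both receive a string that starts with a quote character, which is one plausible reason the saved files were not valid images.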

fopen isn't guaranteed to work. You should check the return value of every call that may return something different on error:
fopen() - Returns a file pointer resource on success, or FALSE on error.
In fact, all the file functions return false on error.
To figure out where it is failing, I would recommend using a debugger or printing out some information in the save_image function, e.g. the values of $inPath and $outPath, so you can verify that it is being passed what you expect.
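The advice above could look roughly like this. It is only a sketch: save_image_checked() is a hypothetical name, and logging with error_log() is an assumption about how you want failures reported.

```php
<?php
// Sketch: the same copy loop as save_image(), but every call is checked.
// save_image_checked() is a hypothetical name, not from the original post.
function save_image_checked($inPath, $outPath)
{
    $in = @fopen($inPath, 'rb');
    if ($in === false) {
        error_log("Could not open source: $inPath");
        return false;
    }
    $out = @fopen($outPath, 'wb');
    if ($out === false) {
        error_log("Could not open destination: $outPath");
        fclose($in);
        return false;
    }
    $ok = true;
    while (!feof($in)) {
        $chunk = fread($in, 8192);
        // Stop on a failed read or a failed write.
        if ($chunk === false || fwrite($out, $chunk) === false) {
            $ok = false;
            break;
        }
    }
    fclose($in);
    fclose($out);
    return $ok;
}
```

In the foreach loop you could then write `if (!save_image_checked($imgurl, $imgn)) { echo "failed: $imgurl<br />\n"; }` to see exactly which URLs cannot be opened.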

The main issue I see is that the regex may not capture a full http:// URL. Many sites leave the scheme and host off and use relative paths. You should check for that and prepend the page's base URL when it is missing.

Your full match includes the src= bit, and your capture group includes the opening quote, so try a lookbehind instead:
preg_match_all('/(?<=src=")[^"]*\.jpg/i', $data, $result);
This pattern has no capture group, so the URLs are in $result[0], and I think this should work:
//output each img link from array
foreach ($result[0] as $imgurl) {
    echo "$imgurl<br />\n";
    $imgn = (basename ($imgurl));
    echo "$imgn<br />\n";
    save_image($imgurl, $imgn);
}

Related

PHP save the file using the link found by the block ID on the page

On the page https://data.mos.ru/opendata/61241/ the first URL containing "export/get?id=" is the current link for downloading the open data CSV file: //op.mos.ru/EHDWSREST/catalog/export/get?id=989116 .
The problem is that the numeric ID at the end of the URL changes after each update and is not known in advance.
I have a working script that saves a file from a URL known in advance (so it only saves the old version of the file, not the current one):
<?php
function downloadJs($file_url, $save_to)
{
    $content = file_get_contents($file_url);
    file_put_contents($save_to, $content);
}
downloadJs('https://op.mos.ru/EHDWSREST/catalog/export/get?id=989116', realpath("./img/feeds") . '/61241.zip');
// Unpack the archive.
$zip = new ZipArchive;
$zip->open('./img/feeds/61241.zip');
$zip->extractTo('./img/feeds/61241');
$zip->close();
// Rename the extracted file to 61241.csv.
$directory = './img/feeds/61241/';
if ($handle = opendir($directory)) {
    while (false !== ($fileName = readdir($handle))) {
        // Skip the "." and ".." entries that readdir() also returns.
        if ($fileName === '.' || $fileName === '..') {
            continue;
        }
        rename($directory . $fileName, $directory . '61241.csv');
    }
    closedir($handle);
}
echo "Ok!";
?>
I need to change this PHP script so that it first finds, on the page https://data.mos.ru/opendata/61241/, the first download link containing the parameter "export/get?id=", and then downloads the file from that link.
I'm not sure I understand what you mean.
we have:
<a target="_blank" href="//op.mos.ru/EHDWSREST/catalog/export/get?id=989116" onclick="yaCounter29850344.reachGoal('download_csv')...
Perhaps we can use a little regex to get that ID.
Let's say you already have the page's HTML from file_get_contents:
preg_match('#get\?id=(\d+)".* onclick="[^"]+csv[^"]+"#', $html, $matches);
echo $matches[1]; // 989116
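Putting the pieces together, a sketch of the lookup step might look like this. find_export_url() is a hypothetical helper, and treating a protocol-relative link as https is an assumption. It also assumes the link is present in the raw HTML; if the page builds it with JavaScript, a regex over file_get_contents() output will not find it.

```php
<?php
// Sketch only: find_export_url() is a hypothetical helper, not part of the original script.
function find_export_url($html)
{
    // First href that contains export/get?id=<digits>.
    if (!preg_match('#href="([^"]*export/get\?id=\d+)"#', $html, $m)) {
        return false;
    }
    $url = html_entity_decode($m[1]);
    // Protocol-relative link (//op.mos.ru/...): assume https.
    if (substr($url, 0, 2) === '//') {
        $url = 'https:' . $url;
    }
    return $url;
}

// Example with the anchor tag quoted above (onclick shortened).
$html = '<a target="_blank" href="//op.mos.ru/EHDWSREST/catalog/export/get?id=989116" onclick="...">';
echo find_export_url($html), "\n";
```

In the original script you would pass the result to downloadJs() in place of the hard-coded URL, after checking that it is not false.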

open file on client stored on server

I want to open an HTML report file stored on the server from a client machine.
I want to return a list of all the saved reports in that folder (scandir).
This way the user can click on any of the created reports to open them.
So if you click on a report to open it, you need the location the report can be opened from.
This is my dilemma. I'm not sure how to get a usable IP, port and folder location that the client can understand.
Below is what I've been experimenting with.
Using this won't work, obviously:
$path = $_SERVER['DOCUMENT_ROOT']."/reports/saved_reports/";
So I thought I might try this instead.
$host= gethostname();
$ip = gethostbyname($host);
$ip = $ip.':'.$_SERVER['SERVER_PORT'];
$path = $ip."/reports/saved_reports/";
$files = scandir($path);
After the above code I loop through each file and generate an array with the name, date created and path. This is sent back to generate a list of reports in a table that the user can interact with (open, delete, edit).
But this fails as well.
So I'm officially clueless on how to approach this.
PS. I'm adding react.js as a tag, because that is my front end and might be useful to know.
Your question may be partially answered here: https://stackoverflow.com/a/11970479/2781096
Get the file names from the specified path, then use cURL or the get_text() function again to fetch and save each file.
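function get_text($filename) {
    $content = ''; // initialise so the .= below does not raise a notice
    $fp_load = fopen("$filename", "rb");
    if ( $fp_load ) {
        while ( !feof($fp_load) ) {
            $content .= fgets($fp_load, 8192);
        }
        fclose($fp_load);
        return $content;
    }
    return false; // the file or URL could not be opened
}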
$matches = array();
// This will give you names of all the files available on the specified path.
preg_match_all("/(a href\=\")([^\?\"]*)(\")/i", get_text($ip."/reports/saved_reports/"), $matches);
foreach($matches[2] as $match) {
echo $match . '<br>';
// Again hit a cURL to download each of the reports.
}
Get the list of reports:
<?php
$path = $_SERVER['DOCUMENT_ROOT']."/reports/saved_reports/";
$files = scandir($path);
foreach($files as $file){
    if($file !== '.' && $file !== '..'){
        // Encode the name for the query string and escape it for HTML output.
        echo "<a href='show-report.php?name=".rawurlencode($file)."'>".htmlspecialchars($file)."</a><br/>";
    }
}
?>
Then write a second PHP file for showing the HTML reports. It receives the file name as a GET parameter and echoes the content of that report.
show-report.php
<?php
$path = $_SERVER['DOCUMENT_ROOT']."/reports/saved_reports/";
if(isset($_GET['name'])){
    // basename() strips directory parts, so a name like ../../secret cannot leave the reports folder.
    $name = basename($_GET['name']);
    if(is_file($path.$name)){
        echo file_get_contents($path.$name);
    }
}
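The key point in the answer above is that scandir() needs a filesystem path, while the client needs a URL. Since the question mentions a React front end that wants name, date and path for each report, here is a hedged sketch of an endpoint helper. list_reports() and the base URL are assumptions for illustration, and filemtime() gives the last modification time, which may differ from the creation date.

```php
<?php
// Sketch: list_reports() and the $baseUrl value are assumptions for illustration.
function list_reports($dir, $baseUrl)
{
    $reports = array();
    foreach (scandir($dir) as $file) {
        $full = rtrim($dir, '/') . '/' . $file;
        if (!is_file($full)) {
            continue; // skips ".", ".." and subdirectories
        }
        $reports[] = array(
            'name'     => $file,
            'modified' => date('c', filemtime($full)), // modification time, not creation time
            'url'      => rtrim($baseUrl, '/') . '/' . rawurlencode($file),
        );
    }
    return $reports;
}

// e.g. in an endpoint that the front end fetches:
// header('Content-Type: application/json');
// echo json_encode(list_reports($_SERVER['DOCUMENT_ROOT'] . '/reports/saved_reports', '/reports/saved_reports'));
```

The front end can then link to each `url` directly, because it is relative to the web root rather than to the server's disk.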

PHP 5.3.x - How do I turn the server path into its domain name, and a clickable URL?

I'm a newbie...sorry...I'll admit that I've cobbled this script together from several sources, but I'm trying to learn. :-) Thanks for any help offered!!
$directory = new \RecursiveDirectoryIterator(__DIR__, \FilesystemIterator::FOLLOW_SYMLINKS);
$filter = new \RecursiveCallbackFilterIterator($directory, function ($current, $key, $iterator) {
if ($current->getFilename() === '.') {
return FALSE;
}
if ($current->isDir()) {
return $current->getFilename() !== 'css';
}
else {
// Only consume files of interest.
return strpos($current->getFilename(), 'story.html') === 0;
}
});
$iterator = new \RecursiveIteratorIterator($filter);
$files = array();
foreach ($iterator as $info) {
$files[] = $info->getPathname();
}
?>
Then down in my HTML is where I run into problems, in the 2nd echo statement...
<?php
echo '<ul>';
foreach ($files as $item){
echo '<li>http://<domain.com/directory/subdirectory/story.html></li>';
}
echo '</ul>';
?>
The purpose of my script is to "crawl" a directory looking for a specific file name in sub-directories. Then, when it finds this file, to create a human-readable, clickable URL from the server path. Up to now, my HTML gets one of these two server paths as my list item:
http://thedomain.com/var/www/vhosts/thedomain.com/httpdocs/directory/subdirectory/story.html
or
file:///C:/Bitnami/wampstack-5.5.30-0/apache2/htdocs/directory/subdirectory/story.html
...depending on where I'm running my .php page.
I feel like I need to "strip away" part of these paths... to get down to /subdirectory/story.html ... If I could do that, then I think I can add the rest into my echo statements. Everything I've found for stripping strings has been from the trailing end of the path, not the leading end. (dirname($item)) takes away the filename, and (basename($item)) takes away the subdirectory and the filename ... the bits I want!!
Try this function
function strip($url){
$info = parse_url($url);
$slash = (explode('/',$info['path']));
$sub = $slash[count($slash)-2];
$file = basename($url)==$sub ? "" : basename($url);
return "/".$sub."/".$file;
}
calling it by
echo strip('file:///C:/Bitnami/wampstack-5.5.30-0/apache2/htdocs/directory/subdirectory/story.html');
will result in
/subdirectory/story.html
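If you want to keep everything below the web root rather than only the last directory, another option is to strip the document root from the front of the path. This is a sketch: path_to_url() is a hypothetical helper, and the document root and base URL are passed in as assumptions (on a live server they would probably come from $_SERVER['DOCUMENT_ROOT'] and $_SERVER['HTTP_HOST']).

```php
<?php
// Sketch: path_to_url() is a hypothetical helper; $docRoot and $baseUrl are assumed inputs.
function path_to_url($path, $docRoot, $baseUrl)
{
    // Normalise Windows backslashes so both environments compare alike.
    $path    = str_replace('\\', '/', $path);
    $docRoot = rtrim(str_replace('\\', '/', $docRoot), '/');
    // Only strip the prefix when the file really lives under the document root.
    if (strpos($path, $docRoot . '/') !== 0) {
        return false;
    }
    return rtrim($baseUrl, '/') . substr($path, strlen($docRoot));
}

echo path_to_url(
    '/var/www/vhosts/thedomain.com/httpdocs/directory/subdirectory/story.html',
    '/var/www/vhosts/thedomain.com/httpdocs',
    'http://thedomain.com'
), "\n";
```

Inside the foreach you could then echo `'<li><a href="' . htmlspecialchars($url) . '">' . htmlspecialchars($url) . '</a></li>'` to get a clickable link.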

Why is PHP's ftp_nlist not showing directories, only files?

I only see files, not directories when I call ftp_nlist(). What might I be doing wrong?
The view from FileZilla:
This code runs with no output. If I remove the conditional I get a list of the plain files sans directories.
$contents = ftp_nlist($ftp, '.');
foreach( $contents as $content ) {
// directories don't have .s in them
if( !strstr( $content, '.' ) ) {
echo $content;
}
}
Can supply further information if needed.
ftp_nlist may return only files, not directories; what the underlying NLST command lists depends on the FTP server. See the manual.
EDIT :
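function ListOfFolder($folder_listarry,$conn_id){
    for ($i=0; $i<sizeof($folder_listarry); $i++) {
        echo $folder_listarry[$i]."<br>";
        // Note: is_dir() tests a local path unless it is given an ftp:// URL,
        // so a bare name returned by ftp_nlist() is checked on the local disk.
        if (is_dir($folder_listarry[$i]) === false)
        {
            continue;
        }
        $contents = ftp_nlist($conn_id, $folder_listarry[$i]);
        ListOfFolder($contents,$conn_id);
    }
}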
I think you can also use PHP's usual file and directory functions, such as opendir() and related functions; they support the ftp:// stream wrapper.
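Another approach, not mentioned above, is ftp_rawlist(), which returns the server's full listing lines so directories can be told apart from files. The sketch below assumes Unix-style "ls -l" output, which many but not all FTP servers send; directories_from_rawlist() is a hypothetical helper and the sample lines are made up.

```php
<?php
// Sketch: directories_from_rawlist() is a hypothetical helper.
// Assumes Unix-style "ls -l" lines, which many but not all FTP servers send.
function directories_from_rawlist(array $lines)
{
    $dirs = array();
    foreach ($lines as $line) {
        // permissions, links, owner, group, size, month, day, time/year, name
        $parts = preg_split('/\s+/', trim($line), 9);
        if (count($parts) === 9 && $parts[0][0] === 'd'
            && $parts[8] !== '.' && $parts[8] !== '..') {
            $dirs[] = $parts[8];
        }
    }
    return $dirs;
}

// With a live connection this would be: $lines = ftp_rawlist($ftp, '.');
$lines = array(
    'drwxr-xr-x   2 user group     4096 Jan 01 12:00 images',
    '-rw-r--r--   1 user group     1024 Jan 01 12:00 index.html',
    'drwxr-xr-x   3 user group     4096 Jan 01 2020 my docs',
);
print_r(directories_from_rawlist($lines));
```

Where the server supports it, ftp_mlsd() (PHP 7.2 and later) returns structured entries with a type field and avoids parsing text lines altogether.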

Getting Absolute Path of External Web Page Images

I am working on a bookmarklet and I am fetching all the photos of an external page using an HTML DOM parser (as suggested in an earlier SO answer). The photos are fetched correctly and displayed in my bookmarklet pop-up, but I am having a problem with photos that use relative paths.
For example, say the external page is http://www.example.com/dir/index.php
photo source 1: img source='hostname/photos/photo.jpg' - fetched, as it is absolute
photo source 2: img source='/photos/photo.jpg' - not fetched, as it is not absolute.
I tried deriving the directory from the current URL using dirname or pathinfo, but this breaks for host/dir/ (which gives host as the parent directory) versus host/dir/index.php (which gives host/dir as the parent directory, which is correct).
How can I get these relative photos?
FIXED (added support for query-string only image paths)
function make_absolute_path ($baseUrl, $relativePath) {
// Parse URLs, return FALSE on failure
if ((!$baseParts = parse_url($baseUrl)) || (!$pathParts = parse_url($relativePath))) {
return FALSE;
}
// Work-around for pre- 5.4.7 bug in parse_url() for relative protocols
if (empty($baseParts['host']) && !empty($baseParts['path']) && substr($baseParts['path'], 0, 2) === '//') {
$parts = explode('/', ltrim($baseParts['path'], '/'));
$baseParts['host'] = array_shift($parts);
$baseParts['path'] = '/'.implode('/', $parts);
}
if (empty($pathParts['host']) && !empty($pathParts['path']) && substr($pathParts['path'], 0, 2) === '//') {
$parts = explode('/', ltrim($pathParts['path'], '/'));
$pathParts['host'] = array_shift($parts);
$pathParts['path'] = '/'.implode('/', $parts);
}
// Relative path has a host component, just return it
if (!empty($pathParts['host'])) {
return $relativePath;
}
// Normalise base URL (fill in missing info)
// If base URL doesn't have a host component return error
if (empty($baseParts['host'])) {
return FALSE;
}
if (empty($baseParts['path'])) {
$baseParts['path'] = '/';
}
if (empty($baseParts['scheme'])) {
$baseParts['scheme'] = 'http';
}
// Start constructing return value
$result = $baseParts['scheme'].'://';
// Add username/password if any
if (!empty($baseParts['user'])) {
$result .= $baseParts['user'];
if (!empty($baseParts['pass'])) {
$result .= ":{$baseParts['pass']}";
}
$result .= '@';
}
// Add host/port
$result .= !empty($baseParts['port']) ? "{$baseParts['host']}:{$baseParts['port']}" : $baseParts['host'];
// Inspect the relative path
if ($relativePath[0] === '/') {
// Leading / means from root
$result .= $relativePath;
} else if ($relativePath[0] === '?') {
// Leading ? means query the existing URL
$result .= $baseParts['path'].$relativePath;
} else {
// Get the current working directory
$resultPath = rtrim(substr($baseParts['path'], -1) === '/' ? trim($baseParts['path']) : str_replace('\\', '/', dirname(trim($baseParts['path']))), '/');
// Split the image path into components and loop them
foreach (explode('/', $relativePath) as $pathComponent) {
switch ($pathComponent) {
case '': case '.':
// a single dot means "this directory" and can be skipped
// an empty component is a mistake on somebody's part, and can also be skipped
break;
case '..':
// a double dot means "up a directory"
$resultPath = rtrim(str_replace('\\', '/', dirname($resultPath)), '/');
break;
default:
// anything else can be added to the path
$resultPath .= "/$pathComponent";
break;
}
}
// Add path to result
$result .= $resultPath;
}
return $result;
}
Tests:
echo make_absolute_path('http://www.example.com/dir/index.php','/photos/photo.jpg')."\n";
// Outputs: http://www.example.com/photos/photo.jpg
echo make_absolute_path('http://www.example.com/dir/index.php','photos/photo.jpg')."\n";
// Outputs: http://www.example.com/dir/photos/photo.jpg
echo make_absolute_path('http://www.example.com/dir/index.php','./photos/photo.jpg')."\n";
// Outputs: http://www.example.com/dir/photos/photo.jpg
echo make_absolute_path('http://www.example.com/dir/index.php','../photos/photo.jpg')."\n";
// Outputs: http://www.example.com/photos/photo.jpg
echo make_absolute_path('http://www.example.com/dir/index.php','http://www.yyy.com/photos/photo.jpg')."\n";
// Outputs: http://www.yyy.com/photos/photo.jpg
echo make_absolute_path('http://www.example.com/dir/index.php','?query=something')."\n";
// Outputs: http://www.example.com/dir/index.php?query=something
I think that should deal correctly with just about everything you're likely to encounter, and should equate to roughly the logic used by a browser. It should also correct any oddities you might get on Windows with stray backslashes from using dirname().
The first argument is the full URL of the page where you found the <img> (or <a> or whatever) and the second argument is the contents of the src/href etc. attribute.
If anyone finds something that doesn't work (cos I know you'll all be trying to break it :-D), let me know and I'll try to fix it.
'/' should be the base path. Check the first character returned from your dom parser, and if it is a '/' then just prefix it with the domain name.
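A minimal sketch of that simple case, handling only paths with a leading "/", might look like this. prefix_root_relative() is a hypothetical name, and defaulting to http when the page URL has no scheme is an assumption.

```php
<?php
// Sketch: prefix_root_relative() is a hypothetical name; it handles only the leading-"/" case.
function prefix_root_relative($pageUrl, $src)
{
    $parts = parse_url($pageUrl);
    if ($parts === false || empty($parts['host'])) {
        return false;
    }
    // Leave protocol-relative (//host/...) and non-root-relative values untouched.
    if (substr($src, 0, 1) !== '/' || substr($src, 0, 2) === '//') {
        return $src;
    }
    $scheme = isset($parts['scheme']) ? $parts['scheme'] : 'http';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    return $scheme . '://' . $parts['host'] . $port . $src;
}

echo prefix_root_relative('http://www.example.com/dir/index.php', '/photos/photo.jpg'), "\n";
```

Paths such as photos/photo.jpg or ../photos/photo.jpg are returned unchanged here and still need full relative-path resolution.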
