The project I am working on requires creating .tar.gz archives and feeding them to an external service. This external service works only with .tar.gz, so any other archive type is out of the question. The server where my code will execute does not allow access to system calls, so system(), exec(), backticks etc. are no bueno. This means I have to rely on a pure PHP implementation to create the .tar.gz files.
Having done a bit of research, it seems that PharData will be helpful to achieve the result. However, I have hit a wall with it and need some guidance.
Consider the following folder layout:
parent folder
- child folder 1
- child folder 2
- file1
- file2
I am using the code snippet below to create the .tar.gz archive, which does the trick, but there is a minor issue with the end result: it doesn't contain the parent folder, only everything within it.
$pd = new PharData('archive.tar');
$dir = realpath("parent-folder");
$pd->buildFromDirectory($dir);
$pd->compress(Phar::GZ); // creates archive.tar.gz next to archive.tar
unset( $pd );            // release the Phar handle before deleting the file
unlink('archive.tar');   // remove the intermediate .tar, keep the .tar.gz
When the archive is created it must contain the exact folder layout mentioned above. With the code snippet above, the archive contains everything except the parent folder, which is a deal breaker for the external service:
- child folder 1
- child folder 2
- file1
- file2
The description of buildFromDirectory does mention the following, so the parent folder not being included in the archive is understandable:
Construct a tar/zip archive from the files within a directory.
I have also tried using buildFromIterator, but the end result is the same, i.e. the parent folder isn't included in the archive. I was able to get the desired result using addFile, but this is painfully slow.
Having done a bit more research I found the following library: https://github.com/alchemy-fr/Zippy. But it requires Composer support, which isn't available on the server. I'd appreciate it if someone could guide me in achieving the end result. I am also open to using other methods or libraries, as long as it's a pure PHP implementation and doesn't require any external dependencies. Not sure if it helps, but the server where the code will be executed has PHP 5.6.
Use the parent of "parent-folder" as the base directory for Phar::buildFromDirectory() and use its second parameter to limit the results to "parent-folder" only, e.g.:
$pd = new PharData('archive.tar');
$parent = dirname("parent-folder");
$pd->buildFromDirectory($parent, '#^'.preg_quote("$parent/parent-folder/", "#").'#');
$pd->compress(Phar::GZ);
I ended up having to do this, and since this question is the first result on Google for the problem, here's a more scalable way to do it without using a regexp (which does not scale well if you want to extract one directory from a directory that contains many others).
// Recursively map "archive path" => "filesystem path" for everything
// under $folder/$dir, so that archive entries keep the $dir prefix.
function buildFiles($folder, $dir, $retarr = []) {
    $i = new DirectoryIterator("$folder/$dir");
    foreach ($i as $d) {
        if ($d->isDot()) {
            continue;
        }
        if ($d->isDir()) {
            $newdir = "$dir/" . basename($d->getPathname());
            $retarr = buildFiles($folder, $newdir, $retarr);
        } else {
            $dest = "$dir/" . $d->getFilename();
            $retarr[$dest] = $d->getPathname();
        }
    }
    return $retarr;
}
$out = "/tmp/file.tar";
$sourcedir = "/data/folder";
$subfolder = "folder2";
$p = new PharData($out);
$filemap = buildFiles($sourcedir, $subfolder);
$iterator = new ArrayIterator($filemap);
$p->buildFromIterator($iterator);
$p->compress(\Phar::GZ);
unlink($out); // $out.gz has been created, remove the original .tar
This allows you to pick /data/folder/folder2 out of /data/folder, even if /data/folder contains several million OTHER folders. It then creates a tar.gz in which all the contents are prefixed with the folder name.
I am trying to extract the contents of a folder within a tarball using PHP. I am using the following PHP to download and extract the archive:
<?php
function wget($address, $filename) {
    file_put_contents($filename, file_get_contents($address));
}

$newdir = 'test';

echo '<br>Downloading latest gzipped WordPress tarball';
wget('http://wordpress.org/latest.tar.gz', 'latest.tar.gz');
echo '<br>about to Extract from gz';

// decompress from gz
$p = new PharData('latest.tar.gz');
$p->decompress(); // creates latest.tar
echo '<br>Extracted from gz';

// unarchive from the tar
$phar = new PharData('latest.tar');
echo '<br>Un-TARd';
$phar->extractTo($newdir);
echo '<br>Complete';
?>
My problem is that this script extracts the tarball into /test/wordpress, whereas I need it to extract to /test/. I have read through the documentation on the PHP.net manual and replaced part of my code to match one of the examples there. The code I had was:
$phar->extractTo($newdir);
And I changed that to:
$phar->extractTo($newdir, 'wordpress');
But that didn't work. The PHP script ran through to the end, but the /test/ directory was empty.
The aim of this is to create a one-click WordPress install on our local dev server.
I know the thread is very old and you probably found a solution, but maybe I'll save someone else some time.
The function extractTo expects a slash at the end of the directory name you want to extract.
So $phar->extractTo($newdir, 'wordpress/'); should work.
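In context, the extraction step of the script above would then become something like this (a minimal sketch; the optional third argument, which overwrites existing files, is shown only for illustration):
$phar = new PharData('latest.tar');
// the trailing slash is what makes the filter match the entries
// under the archive's wordpress/ directory
$phar->extractTo($newdir, 'wordpress/', true);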
I've been brought in to work on an existing CMS and file-management web application that provides a merchant with a management interface for their online webshops. The management application is developed in PHP.
When website users are viewing the webshops, the page assets (mainly images in nested folder paths) are referenced directly from the HTML of the webshops and are served directly from a web server that is separate from the CMS system.
But in order to list / search / allow navigation of the files (i.e. the File Management part) the CMS application needs to be able to access the files/folders directory structure.
So we are using Linux NFS mounts from the CMS server to the document file server. This works fairly well as long as the number of files in any specific merchant's directory tree is not too large (<10,000). However, some merchants have more than 100,000 files in a nested directory tree. Walking a tree of this size just to get the directory structure can take more than 120 seconds.
Retrieving just the list of files in any one directory is quite fast, but the problem comes when we try to identify which of these "files" are actually directory entries, so we can recurse down the tree.
It seems that the PHP functions for checking the file type (either calling is_dir() on each filepath retrieved with readdir() or scandir(), or using glob() with the GLOB_ONLYDIR flag) work on each file individually, not in bulk. So thousands and thousands of NFS commands end up being sent. From my research so far, this seems to be a limitation of NFS, not of PHP.
A stripped down class showing just the function in question:
class clImagesDocuments {
    public $dirArr;

    function getDirsRecursive( $dir ) {
        if ( !is_dir( $dir )) {
            return false;
        }
        if ( !isset( $this->dirArr )) {
            $this->dirArr = glob( $dir . "/*", GLOB_ONLYDIR );
        } else {
            $this->dirArr = array_merge( $this->dirArr, glob( $dir . "/*", GLOB_ONLYDIR ) );
            return false;
        }
        for( $i = 0; $i < sizeof( $this->dirArr ); $i ++) {
            $this->getDirsRecursive( $this->dirArr [$i] );
        }
        for( $i = 0; $i < sizeof( $this->dirArr ); $i ++) {
            $indexArr = explode( $dir, $this->dirArr [$i] );
            $tempDir[$indexArr[1]] = $this->dirArr [$i];
        }
        $this->dirArr = $tempDir;
    }
}
Executing the same PHP code to retrieve the directory tree locally on the document file server is much, much faster (2 or 3 orders of magnitude), presumably because the local filesystem caches the directory structure. I am forced to conclude that my problem is due to NFS.
I'm considering writing a simple web app that will run on the document file server and provide real-time lookups of the directory structure via an API.
I'd appreciate any thoughts or suggestions.
An alternative solution: you can prefix all directory names with some string, and when you get the file listing you can tell which entries are actually directories by checking whether they contain the string. You can completely avoid is_dir() that way.
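A minimal sketch of that idea, assuming every directory was created with a hypothetical d_ prefix (e.g. mkdir("d_images")):
// One scandir() pass, no per-entry stat calls over NFS.
$dirs = [];
$files = [];
foreach (scandir($path) as $entry) {
    if ($entry === '.' || $entry === '..') {
        continue;
    }
    if (strpos($entry, 'd_') === 0) {
        $dirs[] = $entry;  // a directory, by naming convention
    } else {
        $files[] = $entry; // a regular file
    }
}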
Old question, but a current problem for me.
One solution:
On your server, or better on the storage server (much, much faster), run tree (https://linux.die.net/man/1/tree) with -X (XML output) on every directory, or once on the top directory, and send the output to a ".dirStructure.xml" file (with the . at the start so you can ignore it when listing),
e.g.
tree -x -f -q -s -D --dirsfirst -X
Then make your script load this structure and use it to display the tree. You can create this file for every merchant, or one global one and just traverse it to find the merchant.
You can run it via cron every minute, or create an API to invoke running it on the storage machine.
You can update this XML whenever files change.
No need for a database.
You can also monitor changes to the directories on the storage side and recreate the XML every time something changes: https://superuser.com/questions/181517
EDIT:
How to monitor a complete directory tree for changes in Linux?
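As an illustration, loading that XML from PHP could look something like this (a rough sketch: the file location is assumed, and directories are expected as nested <directory name="..."> elements, which is what tree -X produces):
// Walk tree(1)'s XML output and collect every directory name,
// with no NFS round-trip per entry.
function collectDirs(SimpleXMLElement $node, array &$out) {
    foreach ($node->directory as $dir) {
        $out[] = (string) $dir['name'];
        collectDirs($dir, $out);
    }
}

$xml = simplexml_load_file('/mnt/docs/merchant1/.dirStructure.xml');
$dirs = [];
collectDirs($xml, $dirs);
print_r($dirs);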
I'm a newbie with SPL and RecursiveIterator... so could you help me?
What I want to achieve:
I would like to find a file in a folder tree and obtain its path.
My folder tree could look something like this:
./ressources
./ressources/Images
./ressources/Images/Image01
./ressources/Images/Image02
./ressources/Images/ImagesSub/Image03
./ressources/Docs
./ressources/Docs/Doc01
and so on...
I obtain the name of my file with an SQL query (warning: the files never have an extension).
Now, I want to find the file's location by running a recursive iterator on the './ressources' folder.
Then, when I've found the file, I would like to return the whole path, './ressources/Folder/File'.
I've read Gordon's solution but it doesn't work for me; I tried just to echo something, but nothing is displayed.
Here is my code:
$doc_id = $bean->id;
$query = "SELECT a.document_revision_id FROM documents as a, document_revisions as b ";
$query .= "WHERE a.document_revision_id = b.id AND a.id = '" . $doc_id . "' LIMIT 1";
$results = $bean->db->query($query, true);
$row = $bean->db->fetchByAssoc($results);
$file_id = $row['document_revision_id'];

$ressources = './ressources/';
$iter = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($ressources, RecursiveDirectoryIterator::KEY_AS_FILENAME),
    RecursiveIteratorIterator::SELF_FIRST
);
foreach ($iter as $entry) {
    if ($entry->getFilename() === $file_id) {
        echo '<script> alert('.$entry->getFilepath().');</script>';
    }
}
(I know doing an alert inside an echo is bullsh*t, but with Sugar it is quite difficult to display anything.)
Specifications
I'm trying to do this in a SugarCRM CE 6.5.2 logic_hook running on Arch Linux, with PHP 5.4.6.
It is really urgent, so I would be reaaaally happy if you could help me!!
Thanks in advance!
EDIT FROM 12/10/09 2pm:
What my Sugar project is, and why I can't get the pathname from my database
I created a custom field in the Documents module called folder_name_c. You fill it with the name of the folder (under ressources) where you want to upload your document.
I want to allow the user to move the uploaded file from its old folder to the new one when editing the document.
When editing a document, I added an after_retrieve hook so that the logic hook also runs when editing (before, it only ran for the edit view).
So, if I read $bean->folder_name_c, it picks up the field's current content. If I try SQL, it will pick up the folder_name_c only after I click "save".
So, I don't have any clue how to get my old folder_name to create an
$old_link = '.ressources/'.$old_folder.'/'.$file_id;
I can only create the
$new_link = '.ressources/'.$bean->folder_name_c.'/'.$file_id;
So, after a long time, I figured out that I could browse my ressources folder and its subfolders to find the file named $file_id, and then create the $old_link.
FYI, creating a new custom field under Studio in Sugar saved me a lot of time.
I don't want to spend my life adding a custom_code calling the database or anything else. This is URGENT, and a recursive iterator seems simple and quick.
There is no method such as getFilepath for the (Recursive)DirectoryIterator; just use $entry itself: when used in a string context it is cast to a string (à la __toString):
$file_id = 'test';
$ressources = './ressources/';
// [...]
echo '<script>alert(' . $entry . ');</script>'; // cast to a string which contains the full path
// results in alerting ./ressources/SubFolder/test
I tested it with the same structure and without extension.
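Equivalently, since each $entry is an SplFileInfo object, you can ask for the path explicitly:
// SplFileInfo::getPathname() returns the same full path as the string cast
echo '<script>alert("' . $entry->getPathname() . '");</script>';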
So, I've found out how to use recursive iterators for my problem!
Here is my code:
$ressources = './ressources/';
$directoryIter = new RecursiveDirectoryIterator($ressources);
$iter = new RecursiveIteratorIterator($directoryIter, RecursiveIteratorIterator::SELF_FIRST);

$old_path = '';
$new_path = $ressources . $bean->folder_name_c . '/' . $file_id;
chmod($new_path, 0777);

foreach ($iter as $entry) {
    if (!$directoryIter->isDot()) {
        if ($entry->getFileName() == $file_id) {
            $old_path = $entry->getPathName();
            chmod($old_path, 0777);
            copy($old_path, $new_path);
        }
    }
}
So I succeeded in getting my file's origin path! :)
But as always, there is a problem:
I want to cut and paste my file from $old_path to $new_path (as you can see in my code). The copy works well, but I don't know where I have to unlink() the old path... if anyone knows...
(And if I wrote the copy on the wrong line, just tell me please! :D)
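For what it's worth, a natural spot (a small sketch, not tested against Sugar) is right after the copy, guarded by its return value:
// Turn the copy into a move: remove the original only once
// copy() reports success.
if (copy($old_path, $new_path)) {
    unlink($old_path);
}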
I want to check if a file path is in current directory tree.
Suppose the parameter is given as js/script.js. My working directory (WD) is /home/user1/public_html/site
Now, for the current WD, if someone supplies js/script.js I can simply check it by appending it to the WD. That works for a normal path like this. But if anyone (maybe an attacker) passes ../../../../etc/password, it'd be a problem.
I know it can be suppressed by stripping the .. sequences with some regex, and that would solve it for sure. But I want to know how I can create some sort of chrooted environment so that whatever path/to/script is passed, it is only ever searched for under the WD.
Edit:
I am aware of http://php.net/chroot. It requires your app to run with root privileges.
http://php.net/manual/en/function.realpath.php

$chroot = '/var/www/site/userdata/';
$basefolder = '/var/www/site/userdata/mine/';
$param = '../../../../etc/password';

// realpath() resolves "..", "." and symlinks, and returns false
// when the resulting path does not exist
$fullpath = realpath($basefolder . $param);

if ($fullpath === false || strpos($fullpath, $chroot) !== 0) {
    // GOTCHA: the resolved path escapes the allowed base directory
}
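Wrapped up as a small helper in the same spirit (a sketch; the function name and usage are mine):
// Resolve $path against $base and return the real path only if it
// stays inside $base; false otherwise (including nonexistent paths).
function resolveInside($base, $path) {
    $full = realpath($base . $path);
    return ($full !== false && strpos($full, $base) === 0) ? $full : false;
}

var_dump(resolveInside('/var/www/site/userdata/mine/', 'js/script.js'));
var_dump(resolveInside('/var/www/site/userdata/mine/', '../../../../etc/password')); // bool(false)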
I'm creating a PHP site where a company will upload a lot of images. I'd like one folder to contain up to 500-1000 files, and PHP to automatically create a new one if the previous folder contains more than 1000 files.
For example, px300 has a folder dir1 which stores 500 files; once it fills up, a new one, dir2, will be created.
Are there any existing solutions?
This task is simple enough not to require an existing solution. You can make use of scandir to count the number of files in a directory, and then mkdir to make a directory.
// Make sure we don't count . and .. as proper directories
if (count(scandir("dir")) - 2 > 1000) {
mkdir("newdir");
}
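A slightly fuller sketch of the same idea, picking the first dir1, dir2, ... that still has room (the sequential naming scheme is an assumption):
// Walk dir1, dir2, ... until one has fewer than 1000 entries,
// creating it if it does not exist yet (scandir counts . and .., hence -2).
$n = 1;
while (is_dir("dir$n") && count(scandir("dir$n")) - 2 >= 1000) {
    $n++;
}
if (!is_dir("dir$n")) {
    mkdir("dir$n");
}
$target = "dir$n"; // store the next upload here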
A common approach is to create one-letter directories based on the file name. This works particularly well if you assign random names to files (and random names are good to avoid name conflicts in user uploads):
/files/a/c/acbd18db4cc2f85cedef654fccc4a4d8
/files/3/7/37b51d194a7513e45b56f6524f2d51f2
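A minimal sketch of that scheme (the /files base directory and the use of md5 here are assumptions):
// Shard uploads into /files/<c1>/<c2>/ using the first two characters
// of a random md5 name, so no single directory grows too large.
$name = md5(uniqid('', true));          // e.g. "acbd18db4cc2f85cedef654fccc4a4d8"
$dir  = "/files/{$name[0]}/{$name[1]}"; // e.g. "/files/a/c"
if (!is_dir($dir)) {
    mkdir($dir, 0755, true);            // third argument creates parent directories
}
move_uploaded_file($_FILES['upload']['tmp_name'], "$dir/$name");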
Or, counting the entries by hand:
if ($h = opendir('/dir')) {
    $files = 0;
    while (false !== ($file = readdir($h))) {
        $files++; // note: this also counts the . and .. entries
    }
    if ($files > 1000) {
        // create dir
        mkdir('/newdir');
    }
}
You could use the glob function. It returns an array of entries matching your pattern, and you can count that array to get the number of files.
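For example (a small sketch; note that glob("dir/*") also matches subdirectories, hence the is_file filter):
// Count only regular files matching the pattern.
$count = count(array_filter(glob("dir/*"), 'is_file'));
if ($count > 1000) {
    mkdir("newdir");
}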