I have a network storage device that contains a few hundred thousand mp3 files, organized by [artist]/[album] hierarchy. I need to identify newly added artist folders and/or newly added album folders programmatically on demand (not monitoring, but by request).
Our dev server is Windows-based, the production server will be FreeBSD. A cross-platform solution is optimal because the production server may not always be *nix, and I'd like to spend as little time on reconciling the (unavoidable) differences between the dev and production server as possible.
I have a working proof-of-concept that is Windows platform-dependent: using a Scripting.FileSystemObject COM object, I iterate through all top-level (artist) directories and check the size of each directory. If there is a change, the directory is explored further to find new album folders. As the directories are iterated, the path and file size are collected into an array, which I serialize and write to a file for next time. This array is used on a subsequent call, both to identify changed artist directories (new album added) and to identify completely new artist directories.
This feels convoluted, and as I mentioned it is platform-dependent. To boil it down, my goals are:
Identify new top-tier directories
Identify new second-tier directories
Identify new loose files within the top-tier directories
Execution time is not a concern here, and security is not an obstacle: this is an internal-only project using only intranet assets, so we can do whatever has to be done to facilitate the desired end result.
Here's my working proof-of-concept:
// read the cached list of artist folders (may not exist yet on the first run)
$folder_list_cache_file = 'seartistfolderlist.pctf';
$folder_list_cache = array();
if (is_file($folder_list_cache_file)) {
    $folder_list_cache = unserialize(file_get_contents($folder_list_cache_file));
}
if (!is_array($folder_list_cache))
    $folder_list_cache = array();
// container arrays
$found_artist_folders = array();
$newly_found_artist_folders = array();
$changed_artist_folders = array();
$filesystem = new COM('Scripting.FileSystemObject');
$dir = "//network_path_to_folders/";
if ($handle = opendir($dir)) {
    // loop the directories
    while (false !== ($file = readdir($handle))) {
        // skip non-entities
        if ($file == '.' || $file == '..')
            continue;
        // make a key-friendly version of the artist name, skip invalids
        // ie 10000-maniacs
        $file_t = trim(post_slug($file));
        if (strlen($file_t) < 1)
            continue;
        // build the full path
        $pth = $dir.$file;
        // skip loose top-level files
        if (!is_dir($pth))
            continue;
        // attempt to get the size of the directory
        $size = 'ERR';
        try {
            $f = $filesystem->GetFolder($pth);
            $size = $f->Size();
        } catch (Exception $e) {
            /* failed to get size */
        }
        // if the artist is not known, they are newly added
        if (!array_key_exists($file_t, $folder_list_cache)) {
            $newly_found_artist_folders[$file_t] = $file;
        } elseif ($size != $folder_list_cache[$file_t]['size']) {
            // the artist is known but the size is different: a new album was added
            $changed_artist_folders[] = $file;
        }
        // build a list of everything, along with file size, to write into the cache file
        $found_artist_folders[$file_t] = array(
            'path' => $file,
            'size' => $size
        );
    }
    closedir($handle);
}
// write the list to a file for next time
$fh = fopen($folder_list_cache_file, 'w') or die("can't open file");
fwrite($fh, serialize($found_artist_folders));
fclose($fh);
// deal with discovered additions and changes....
Another thing to mention: because these are MP3s, the sizes I'm dealing with are big. So big, in fact, that I have to watch out for PHP's integer size limit (on a 32-bit build, sizes past 2GB overflow). The drive is currently at 90% utilization of 1.7TB (yes, SATA in RAID); a new set of multi-TB drives will be added soon, only to be filled up in short order.
EDIT
I did not mention the database because I thought it would be a needless detail, but there IS a database. This script is seeking new additions to the digital portion of our music library; at the end of the code where it says "deal with discovered additions and changes", it is reading ID3 tags and doing Amazon lookups, then adding the new stuff to a database table. Someone will come along and review the new additions and screen the data, then it will be added to the "official" database of albums available for play. Many of the songs we're dealing with are by local artists, so the ID3 and Amazon lookups don't give the track titles, album name, etc. In that case, the human intervention is critical to fill in the missing data.
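(For reference, the tag-reading step could use something like the getID3 library; the post doesn't name a tool, so this is purely an illustrative sketch, with the library path and $mp3_path assumed:)
// sketch: pull artist/title from a file's ID3v2 tags with getID3
require_once 'getid3/getid3.php';
$getID3 = new getID3;
$info = $getID3->analyze($mp3_path);
$artist = isset($info['tags']['id3v2']['artist'][0]) ? $info['tags']['id3v2']['artist'][0] : null;
$title  = isset($info['tags']['id3v2']['title'][0])  ? $info['tags']['id3v2']['title'][0]  : null;
// null values are the ones a human reviewer has to fill in later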
Simplest thing for the BSD-side is a find script that simply looks for inodes with a ctime greater than the last time it ran.
Leave a sentinel file somewhere to 'store' the last run time, which you can do with a simple
touch /tmp/find_sentinel
and then
find /top/of/mp3/tree -cnewer /tmp/find_sentinel
which will produce a list of files/directories which have changed since the find_sentinel file was touched. Running this via cron will get you regular updates, and the script doing the find can then digest the returned file data into your database for processing.
You could accomplish something similar on the Windows side with Cygwin, which would provide an identical 'find' app.
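A minimal PHP wrapper around this idea might look like the following sketch (paths are placeholders; touch the sentinel only after a successful scan):
// sketch: run find against a sentinel file and collect changed paths
$sentinel = '/tmp/find_sentinel';
$tree = '/top/of/mp3/tree';
// first run: no sentinel yet, so list everything
$cmd = file_exists($sentinel)
    ? sprintf('find %s -cnewer %s', escapeshellarg($tree), escapeshellarg($sentinel))
    : sprintf('find %s', escapeshellarg($tree));
exec($cmd, $changed_paths, $status);
if ($status === 0) {
    // process $changed_paths (feed into the database, etc.),
    // then mark this run as done
    touch($sentinel);
}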
DirectoryIterator will help you walk the filesystem. You should consider putting the information in a database though.
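As a rough sketch (the network path is a placeholder), walking the top two tiers with DirectoryIterator could look like:
// sketch: list artist folders and their album subfolders
$root = '//network_path_to_folders/';
foreach (new DirectoryIterator($root) as $artist) {
    if ($artist->isDot() || !$artist->isDir())
        continue;
    foreach (new DirectoryIterator($artist->getPathname()) as $album) {
        if ($album->isDot() || !$album->isDir())
            continue;
        // compare against your cached/stored list here
        echo $artist->getFilename() . '/' . $album->getFilename() . "\n";
    }
}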
I'd go with a solution that enumerates the contents of each folder in a MySQL database; your scan can quickly check against the contents listed in the database, and add entries that aren't already there. This gives you nice enumeration and searchability of the contents, and should be plenty fast for your needs.
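A sketch of that check, assuming a table folders(path VARCHAR PRIMARY KEY) and MySQL credentials that are placeholders:
// sketch: flag paths not yet present in the folders table
$pdo = new PDO('mysql:host=localhost;dbname=music', 'user', 'pass');
$stmt = $pdo->prepare('INSERT IGNORE INTO folders (path) VALUES (?)');
$new_paths = array();
foreach ($scanned_paths as $path) { // $scanned_paths: results of your filesystem walk
    $stmt->execute(array($path));
    if ($stmt->rowCount() > 0) { // a row was inserted, so the path was new
        $new_paths[] = $path;
    }
}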
I'm using the Moodle filemanager to get a file from the user and save it permanently, like this:
$fs = get_file_storage();
$pluginname = 'profile_field_fileupload';
$pluginfolder = 'profile_field_profileimage';
$draftitemid = file_get_submitted_draft_itemid($this->inputname);
if (empty($entry->id)) {
    $entry = new stdClass;
    $entry->id = $this->userid;
}
$context = context_user::instance($this->userid);
$files = $fs->get_area_files($context->id, $pluginname, $pluginfolder, false, '', false);
foreach ($files as $file) {
    $file->delete();
}
file_save_draft_area_files($draftitemid, $context->id, $pluginname, $pluginfolder, $entry->id, array('subdirs' => false, 'maxfiles' => 1));
But the draft still exists.
How should I remove the draft after saving it?
Wait a few days - Moodle's cron process automatically cleans up draft files after that point (the delay is to make sure you haven't still got a copy of the form open and in use).
Remember that the draft area files are taking up no extra storage space on your server, as all files with identical content are stored only once, with multiple entries in the 'mdl_files' table all pointing to the same physical location on the server's hard disk.
Consider this code:
public static function removeDir($src)
{
    if (is_dir($src)) {
        $dir = @opendir($src);
        if ($dir === false)
            return;
        while (($file = readdir($dir)) !== false) {
            if ($file != '.' && $file != '..') {
                $path = $src . DIRECTORY_SEPARATOR . $file;
                if (is_dir($path)) {
                    self::removeDir($path);
                } else {
                    @unlink($path);
                }
            }
        }
        closedir($dir);
        @rmdir($src);
    }
}
This will remove a directory. But if unlink fails or opendir fails on any subdirectory, the directory will be left with some content.
I want either everything deleted, or nothing deleted. I'm thinking of copying the directory before removal and if anything fails, restoring the copy. But maybe there's a better way - like locking the files or something similar?
In general I would confirm the comment:
"Copy it, delete it, copy back if deleted else throw deleting message fail..." – We0
However let's take some side considerations:
Trying to implement transaction-safe file deletion indicates that you want to allow competing file locks on the same set of files. Transaction handling is usually the most 'expensive' way to ensure consistency. This holds true even if PHP had any kind of test-delete available, because you would need to test-delete everything in a first pass and then do a second loop, which costs time (and during which you are in danger that something changes on your file system in the meanwhile). There are other options:
Try to isolate what really needs to be transaction-safe and handle those data accesses in a database. E.g. MySQL/InnoDB supports all the nitty-gritty details of transaction handling.
Define and implement dedicated 'write/lock ownership'. So you have folders A and B with sub-items, and your PHP is allowed to lock files in A while some other process is allowed to lock files in B. Both your PHP and the other process are allowed to read A and B. This gets tricky with files, because a read causes a lock as well, and the lock lasts longer the bigger the file is. So on a per-file basis you probably need to enrich this with file size limits, tolerance periods and so on.
Define and implement dedicated access time frames. E.g. all files can be used during the week, but you have a maintenance window on Sunday night which can also run deletions and therefore requires a lock-free environment.
Right, let's say my reasoning was not frightening enough :) - and you implement transaction-safe file deletion anyway - then your routine can be implemented this way (a sketch follows the list):
1. backup all files
2. if the backup fails, you could try a second, third, fourth time (this is an implementation decision)
3. if there is no successful backup, full stop
4. run your deletion process; two implementation options (either way, you need to log the files you deleted successfully):
   - always run through fully, and document all errors (this can be returned to the user later on as a homework task list; however, it potentially runs long)
   - run through and stop at the first error
5. if the deletion was successful, all fine/full stop; if not, proceed with rolling back
6. copy back only the previously successfully deleted files from the backup (ONLY THEM!)
7. wipe out your backup
This then is only transaction-safe at the file level. It does NOT handle the case where somebody changes permissions on folders in between steps 5 and 6.
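A minimal sketch of that backup/delete/rollback routine, assuming a flat list of files with unique base names and a writable backup directory (all names here are illustrative):
// sketch: backup, delete, roll back on failure
function transactionalDelete(array $files, $backupDir)
{
    // steps 1-3: back up all files; full stop if any backup fails
    foreach ($files as $f) {
        if (!copy($f, $backupDir . '/' . basename($f)))
            return false;
    }
    // step 4: delete, logging successes, stopping at the first error
    $deleted = array();
    foreach ($files as $f) {
        if (!@unlink($f)) {
            // steps 5-6: roll back only the files we actually deleted
            foreach ($deleted as $d)
                copy($backupDir . '/' . basename($d), $d);
            return false;
        }
        $deleted[] = $f;
    }
    // step 7: success, so wipe the backup copies
    foreach ($files as $f)
        @unlink($backupDir . '/' . basename($f));
    return true;
}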
Or you could just try to rename/move the directory to somewhere like /tmp/; it either succeeds or it doesn't, but the files are not gone. Even if another process has an open handle, the move should be OK. The files will be gone some time later, when the tmp folder is emptied.
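In PHP that could be as small as this sketch ($src and the /tmp target are assumptions; note that rename() across filesystems may fall back to copy+delete):
// sketch: "delete" by atomically moving the directory out of the way
$trash = '/tmp/removed_' . uniqid();
if (rename($src, $trash)) {
    // the tree is out of sight immediately; clean it up later
    // (or let tmp reaping handle it)
} else {
    // nothing was touched: the directory is still fully intact
}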
I have an HTML form and one of the inputs creates a folder. The folder name is chosen by the website visitor. Every visitor creates his own folder on my website, so the names are effectively random. They are created using PHP code.
Now I would like to write PHP code to copy a file into all of the child directories, regardless of how many directories have been generated.
I do not want to keep writing a PHP line for every directory that is created - i.e. inserting the folder name manually (e.g. folder01, xyzfolder, folderabc, etc.) - but rather have it happen automatically.
I Googled but I was unsuccessful. Is this possible? If yes, how can I go about it?
Kindly ignore security, etc... I am testing it internally prior to rolling out on a larger scale.
Thank you
It is sad that I cannot comment yet, so here goes...
// get the new folder name
$newfolder = $_POST['newfoldername'];
// create it if it does not exist
if (!is_dir("./$newfolder")) {
    mkdir("./$newfolder", 0777, true);
}
// list all folders
$dirname = './';
$dir = opendir($dirname);
while (false !== ($file = readdir($dir))) {
    if ($file != '.' && $file != '..' && is_dir($dirname . $file)) {
        // generate a random name
        $str = 'yourmotherisveryniceABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
        $randomname = str_shuffle($str);
        $actualdir = $dirname . $file;
        // copy the uploaded file ($uploadedfile comes from $_FILES)
        copy($uploadedfile['tmp_name'], $actualdir . '/' . $randomname);
    }
}
closedir($dir);
I just want to say the answer is in your own sentence: when I read "I would like to write a PHP code to copy", it's copy plus a list of folders, regardless of how many. So just list the folders and copy into each one!
Maybe you also need to rethink how you use Google: if you search for the full sentence "I would like to write a PHP code to copy a file to all of the child directories regardless the quantity of directories being generated", you will surely never find anything. Search for the individual operations instead.
I'm creating a PHP site where a company will upload a lot of images. I'd like one folder to contain up to 500-1000 files, and PHP to automatically create a new one if the previous folder contains more than 1000 files.
For example, px300 has folder dir1 which stores 500 files; once it fills up, a new folder dir2 will be created.
Are there any existing solutions for this?
This task is simple enough not to require an existing solution. You can make use of scandir to count the number of files in a directory, and then mkdir to make a directory.
// Make sure we don't count . and .. as proper directories
if (count(scandir("dir")) - 2 > 1000) {
mkdir("newdir");
}
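Extending that into the dir1/dir2/... rollover the question describes might look like this sketch (the base path and naming scheme are assumptions):
// sketch: find the newest dirN, create dirN+1 when it holds 1000+ files
$base = './px300';
$n = 1;
while (is_dir("$base/dir$n")) {
    $n++;
}
$current = $n > 1 ? "$base/dir" . ($n - 1) : "$base/dir1";
if (!is_dir($current)) {
    mkdir($current, 0777, true); // very first upload: create dir1
} elseif (count(scandir($current)) - 2 >= 1000) {
    // current folder is full: start the next one
    $current = "$base/dir$n";
    mkdir($current, 0777, true);
}
// move/copy the uploaded image into $current here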
A common approach is to create one-letter directories based on the file name. This works particularly well if you assign random names to files (and random names are good to avoid name conflicts in user uploads):
/files/a/c/acbd18db4cc2f85cedef654fccc4a4d8
/files/3/7/37b51d194a7513e45b56f6524f2d51f2
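A sketch of deriving such a path from the file's hash (the /files/ root and $uploaded_tmp_path are placeholders):
// sketch: nest a file under one-letter directories taken from its hash
$hash = md5_file($uploaded_tmp_path); // e.g. "acbd18db4cc2f85cedef654fccc4a4d8"
$dir = sprintf('/files/%s/%s', $hash[0], $hash[1]);
if (!is_dir($dir)) {
    mkdir($dir, 0777, true); // creates /files/a/c on demand
}
rename($uploaded_tmp_path, $dir . '/' . $hash); // final path: /files/a/c/acbd18db...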
Alternatively, you can count the entries in the current folder by hand:
if ($h = opendir('/dir')) {
    $files = 0;
    while (false !== ($file = readdir($h))) {
        if ($file != '.' && $file != '..')
            $files++;
    }
    closedir($h);
    if ($files > 1000) {
        // create the next dir
        mkdir('/newdir');
    }
}
You could use the glob function. It will return an array matching your pattern, which you can count to get the number of files.
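For example (the directory names are illustrative; glob never returns . or .., so there is nothing to subtract):
// count entries in the current folder, roll over when full
if (count(glob('dir1/*')) >= 1000) {
    mkdir('dir2');
}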
I have a directory with a number of subdirectories that users add files to via FTP. I'm trying to develop a php script (which I will run as a cron job) that will check the directory and its subdirectories for any changes in the files, file sizes or dates modified. I've searched long and hard and have so far only found one script that works, which I've tried to modify - original located here - however it only seems to send the first email notification showing me what is listed in the directories. It also creates a text file of the directory and subdirectory contents, but when the script runs a second time it seems to fall over, and I get an email with no contents.
Anyone out there know a simple way of doing this in php? The script I found is pretty complex and I've tried for hours to debug it with no success.
Thanks in advance!
Here you go:
$log = '/path/to/your/log.js';
$path = '/path/to/your/dir/with/files/';
$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($path), RecursiveIteratorIterator::SELF_FIRST);
$result = array();
foreach ($files as $file)
{
    if (is_file($file = strval($file)) === true)
    {
        $result[$file] = sprintf('%u|%u', filesize($file), filemtime($file));
    }
}
if (is_file($log) !== true)
{
    file_put_contents($log, json_encode($result), LOCK_EX);
}
// are there any differences?
if (count($diff = array_diff($result, json_decode(file_get_contents($log), true))) > 0)
{
    // send email with mail(), SwiftMailer, PHPMailer, ...
    $email = 'The following files have changed:' . "\n" . implode("\n", array_keys($diff));
    // update the log file with the new file info
    file_put_contents($log, json_encode($result), LOCK_EX);
}
I am assuming you know how to send an e-mail. Also, please keep in mind that the $log file should be kept outside the $path you want to monitor, for obvious reasons of course.
After reading your question a second time, I noticed that you mentioned you want to check whether the files change. I'm only doing this check with the size and date of modification; if you really want to check whether the file contents are different, I suggest you use a hash of the file. So this:
$result[$file] = sprintf('%u|%u', filesize($file), filemtime($file));
Becomes this:
$result[$file] = sprintf('%u|%u|%s', filesize($file), filemtime($file), md5_file($file));
// or
$result[$file] = sprintf('%u|%u|%s', filesize($file), filemtime($file), sha1_file($file));
But bear in mind that this will be much more expensive, since the hash functions have to open and read the entire contents of your 1-5 MB CSV files.
I like sfFinder so much that I wrote my own adaptation:
http://www.symfony-project.org/cookbook/1_0/en/finder
https://github.com/homer6/altumo/blob/master/source/php/Utils/Finder.php
Simple to use, works well.
However, for your use, depending on the size of the files, I'd put everything in a git repository. It's easy to track then.
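If you go that route, a sketch of detecting additions via git (assumes the git CLI is installed and the tree has been made a repository):
// sketch: let git report new files in the tree
chdir('/top/of/tree');
exec('git status --porcelain', $lines);
$new_files = array();
foreach ($lines as $line) {
    if (substr($line, 0, 2) === '??') { // '??' marks untracked (newly added) paths
        $new_files[] = substr($line, 3);
    }
}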
HTH