I have a problem with the following PHP code. It is supposed to list the items of an S3 bucket and find and delete the files whose filenames contain a certain string.
The problem is that only one file is deleted; the others remain in the bucket after the script runs.
I can't find where the issue comes from, so I'm asking you :/
$aS3Files = $s3->getBucket($bucketName); // list all elements in the bucket
$query = mysql_query("SELECT filename FROM prizes_media WHERE prize_id=" . $_POST["prizeId"]); // finds all filenames linked to the prize
while($media = mysql_fetch_array($query)){
// Find relevant files
while ( list($cFilename, $rsFileData) = each($aS3Files) ) { // reformat the bucket list into a table and reads through it
if(strpos($cFilename,$media['filename'])) {
$s3->deleteObject($bucketName, $cFilename); // deletes all files that contain $media['filename'] in their filename
}
}
}
// 2. Delete DB entry
mysql_query("DELETE FROM prizes WHERE id=" . $_POST['prizeId'] ); // deletes the entry corresponding to the prize in the DB (deletes media table in cascade)
You may be getting false negatives in your if: strpos() returns 0 when the match is at the very start of the filename, and 0 is falsy. You should be using this:
if(strpos($cFilename,$media['filename']) !== FALSE) { ...
Edit
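Here is a different way to loop over the bucket, based on the structure in your comment. Your inner while(list(...) = each(...)) loop leaves the array's internal pointer at the end after the first $media row, so it does nothing for every later row; foreach always starts from the first element (and each() is deprecated since PHP 7.2 and removed in PHP 8):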
foreach($aS3Files as $filename => $filedata) {
if(strpos($filename, $media['filename']) !== FALSE) {
$s3->deleteObject($bucketName, $filename); // deletes all files that contain $media['filename'] in their filename
}
}
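A minimal sketch that separates the matching from the deleting, assuming (as in your comment) that getBucket() returns an array keyed by object name. `findKeysToDelete` is a hypothetical helper, not part of the S3 class; it takes all the filenames from the query at once, so the bucket only has to be listed once:

```php
<?php
// Hypothetical helper: returns every bucket key that contains at least one of $needles.
// $bucketContents is assumed to be shaped like getBucket()'s result: key => object data.
function findKeysToDelete(array $bucketContents, array $needles): array
{
    $matches = [];
    foreach ($needles as $needle) {
        if ($needle === '') {
            continue; // an empty needle would match every key
        }
        // foreach walks the whole bucket again for each needle
        foreach ($bucketContents as $key => $data) {
            // !== false also catches a match at offset 0
            if (strpos((string) $key, $needle) !== false) {
                $matches[] = (string) $key;
            }
        }
    }
    // a key matched by several needles is listed only once
    return array_values(array_unique($matches));
}

// Usage with the names from the question (assumption, not run here):
// $keys = findKeysToDelete($s3->getBucket($bucketName), $filenamesFromDb);
// foreach ($keys as $key) {
//     $s3->deleteObject($bucketName, $key);
// }
```

Collecting the keys first also lets you print the list and check it before anything is deleted.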
I need to clean up a shop that has been running PrestaShop (currently 1.7) for many years.
With this script I removed all the images in the DB that were not connected to any product.
But there are many files not listed in the DB. For example, I currently have 5 image sizes in the settings, so new products show 6 files in the folder (the 5 above plus the imageID.jpg file), but some old products had up to 18 files. Many of these old products have been deleted, but in the folder I still find all the other formats, like "2026-small-cart.jpg".
So I tried writing a script that loops through the folders, checks the image files in them, and verifies whether that id_image is stored in the DB.
If not, I can delete the file.
It works, but the loop is obviously huge, and it stops working as soon as I change the starting folder.
I've tried to reduce the DB queries by storing some data (so that all the images with the same id need only a single DB query), but it still crashes when I change the starting path.
It only works with two levels of nested folders (really few...).
Here is the code. Any idea for a better way to get the result?
Thanks!
$shop_root = $_SERVER['DOCUMENT_ROOT'].'/';
include('./config/config.inc.php');
include('./init.php');
$image_folder = 'img/p/';
$image_folder = 'img/p/2/0/3/2/'; // TEST, existing product
$image_folder = 'img/p/2/0/2/6/'; // TEST, product deleted from DB but files in folder
//$image_folder = 'img/p/2/0/2/'; // test, not working...
$scan_dir = $shop_root.$image_folder;
// will check only images...
global $imgExt;
$imgExt = array("jpg","png","gif","jpeg");
// to avoid multiple queries for the same image id...
global $lastID;
global $delMode;
echo "<h1>Examined folder: $image_folder</h1>\r\n";
function checkFile($scan_dir,$name) {
global $lastID;
global $delMode;
global $imgExt;
$path = $scan_dir.$name;
$ext = substr($name,strripos($name,".")+1);
// if is an image and file name starts with a number
if (in_array($ext,$imgExt) && (int)$name>0){
// avoid extra queries...
if ($lastID == (int)$name) {
$inDb = $lastID;
} else {
$inDb = (int)Db::getInstance()->getValue('SELECT id_product FROM '._DB_PREFIX_.'image WHERE id_image ='.((int) $name));
$lastID = (int)$name;
$delMode = $inDb;
}
// if haven't found an id_product in the DB for that id_image
if ($delMode<1){
echo "- $path has no related product in the DB I'll DELETE IT<br>\r\n";
//unlink($path);
}
}
}
function checkDir($scan_dir,$name2) {
echo "<h3>Elements found in the folder <i>$scan_dir$name2</i>:</h3>\r\n";
$files = array_values(array_diff(scandir($scan_dir.$name2.'/'), array('..', '.')));
foreach ($files as $key => $name) {
$path = $scan_dir.$name;
if (is_dir($path)) {
// new loop in the subfolder
checkDir($scan_dir,$name);
} else {
// is a file, I'll check if must be deleted
checkFile($scan_dir,$name);
}
}
}
checkDir($scan_dir,'');
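One thing in the code above that may explain why it breaks when the starting folder moves up: checkDir() builds each child path as $scan_dir.$name and recurses with checkDir($scan_dir,$name), so the subfolder currently being scanned ($name2) is dropped and anything below the first level is looked up in the wrong place. A sketch of the same walk that always carries the full path; the function name and the `$hasProduct` callable (standing in for the Db::getInstance() lookup) are hypothetical:

```php
<?php
// Hypothetical sketch: recursive scan that always carries the FULL directory path.
// $hasProduct(int $idImage): bool stands in for the Db::getInstance() lookup.
function findOrphanImages(string $dir, callable $hasProduct, array &$cache = []): array
{
    $imgExt = ['jpg', 'png', 'gif', 'jpeg'];
    $orphans = [];
    foreach (array_diff(scandir($dir) ?: [], ['.', '..']) as $name) {
        $path = rtrim($dir, '/') . '/' . $name; // parent path + entry, at every depth
        if (is_dir($path)) {
            // recurse with the full path, not just the folder name
            $orphans = array_merge($orphans, findOrphanImages($path, $hasProduct, $cache));
            continue;
        }
        $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        $id = (int) $name; // "2026-small-cart.jpg" -> 2026, "index.php" -> 0
        if ($id <= 0 || !in_array($ext, $imgExt, true)) {
            continue;
        }
        if (!array_key_exists($id, $cache)) {
            // one lookup per id_image, shared across all folders
            $cache[$id] = (bool) $hasProduct($id);
        }
        if (!$cache[$id]) {
            $orphans[] = $path;
        }
    }
    return $orphans;
}

// Possible use inside the PrestaShop script (assumption, not run here):
// $orphans = findOrphanImages($shop_root . 'img/p', function (int $id): bool {
//     return (int) Db::getInstance()->getValue(
//         'SELECT id_product FROM ' . _DB_PREFIX_ . 'image WHERE id_image = ' . $id) > 0;
// });
```

Because it only returns a list, you can print it for review before calling unlink() on each entry.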
I would create two files with lists of images.
The first file is the result of a query from your database of every image file referenced in your data.
mysql -BN -e "select distinct id_image from ${DB}.${DB_PREFIX}image" > all_image_ids
(set the shell variables for DB and DB_PREFIX first)
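The second file is every image file currently in your directories. Include only files that start with a digit and have an image extension (find's -name does not expand {jpg,png,...}, so each extension needs its own -name test).
find img/p -type f \( -name '[0-9]*.jpg' -o -name '[0-9]*.png' -o -name '[0-9]*.gif' -o -name '[0-9]*.jpeg' \) > all_image_files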
For each filename, check if it's in the list of image ids. If not, then output the command to delete the file.
cat all_image_files | while IFS= read -r filename ; do
# strip the directory name and keep only the leading digits (2026-small-cart.jpg -> 2026)
b=$(basename "$filename")
image_id=${b%%[!0-9]*}
grep -qx "${image_id}" all_image_ids || echo "rm \"${filename}\""
done > files_to_delete
Read the file files_to_delete to visually check that the list looks right. Then run that file as a shell script:
sh files_to_delete
Note I have not tested this solution, but it should give you something to experiment with.
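The same two-list idea can also stay in PHP: fetch every id_image with a single query, put the ids into an array keyed by id, then walk img/p once with RecursiveDirectoryIterator. A sketch, assuming the ids are passed in as an array; `orphanFilesFromIds` is a hypothetical name, and the Db call in the comment is how I would expect to fetch them in PrestaShop, not something tested here:

```php
<?php
// Hypothetical sketch of the two-list approach: ids loaded once, files checked against a set.
function orphanFilesFromIds(string $root, array $imageIds): array
{
    // id => position; isset() on this is a constant-time lookup,
    // unlike grep or in_array() over the whole id list for every file
    $known = array_flip(array_map('intval', $imageIds));
    $orphans = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // same filter as the find command: name starts with a digit, image extension
        if (!$file->isFile() || !preg_match('/^(\d+).*\.(jpe?g|png|gif)$/i', $file->getFilename(), $m)) {
            continue;
        }
        if (!isset($known[(int) $m[1]])) {
            $orphans[] = $file->getPathname();
        }
    }
    sort($orphans);
    return $orphans;
}

// In PrestaShop the ids could come from one query (assumption, not run here):
// $rows = Db::getInstance()->executeS('SELECT id_image FROM ' . _DB_PREFIX_ . 'image');
// $orphans = orphanFilesFromIds($shop_root . 'img/p', array_column($rows, 'id_image'));
```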
I have a folder /images (with ~95,000 files), and I check whether every file is in the database.
Table: images
Column: hash
The folder contains all my images, named by their SHA-1 hash.
I use shuffle($images); to make the verification random; otherwise it only ever verifies the first 35,000 images.
If I go over 35,000 checks, the script times out and the page stops responding.
Example image name: d0a0bb3149bea2335e8784812fef706ad0a13156.jpg
My script:
I select the images in the database.
I put them in an array.
I shuffle the array (to avoid always checking the same first 35,000 images).
I create an array of the image files in the folder /images.
I check for files missing from the database, using the array created with the opendir() function.
I display the result.
<?php
set_time_limit(0);
$images = [];
$q = $mysqli->query('SELECT hash FROM images');
while($r = $q->fetch_assoc())
{
$images[] = $r['hash'].'.jpg';
}
shuffle($images);
$i_hors_bdd = 0;
$images_existent_hors_bdd = [];
if($dh = opendir($_SERVER['DOCUMENT_ROOT'].'/images'))
{
while(($file = readdir($dh)) !== false)
{
if(!in_array($file, $fichiers_a_exclures))
{
if(!is_sha1($file) OR !in_array($file, $images))
$images_existent_hors_bdd[] = '<p>Name of File: '.$file.'</p>';
}
if($i_hors_bdd > 35000)
{
break;
}
$i_hors_bdd++;
}
}
closedir($dh);
if(count($images_existent_hors_bdd) > 0)
{
echo '<p>Images that exist but are not in the database:</p>';
sort($images_existent_hors_bdd);
foreach($images_existent_hors_bdd as $image_existe_hors_bdd)
echo $image_existe_hors_bdd;
}
else
echo '<p>All images are in the database.</p>';
echo '<p>'.$i_hors_bdd.' images checked.</p>';
So my question is: how can I optimize this script so it checks more images, faster, without blocking? Note that my VPS is not very powerful and doesn't have an SSD.
Here are some things to consider or try:
Concatenate '.jpg' to hash in the SQL (CONCAT(hash, '.jpg')), then use fetch_all() to get all rows at once (array_column() flattens them into a plain list).
Use scandir() to build an array of the files in the directory.
Use array_diff() to remove $fichiers_a_exclures and $images from that array.
Iterate over this much smaller array to do the sha1 test.
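The steps above might look like this. A sketch, assuming $dbFiles already holds the CONCAT(hash, '.jpg') values and $exclude plays the role of $fichiers_a_exclures; since is_sha1() is not shown in the question, `looksLikeSha1Name()` is a hypothetical stand-in that checks for 40 lowercase hex characters plus .jpg:

```php
<?php
// Files in $dir that are not in $dbFiles (both plain lists of filenames).
function filesNotInDb(string $dir, array $dbFiles, array $exclude = []): array
{
    // every directory entry, minus . / .. and anything explicitly excluded
    $onDisk = array_diff(scandir($dir) ?: [], ['.', '..'], $exclude);
    // one array_diff() call instead of in_array() over ~95,000 hashes for every file
    $missing = array_diff($onDisk, $dbFiles);
    sort($missing);
    return $missing;
}

// Hypothetical stand-in for is_sha1(): 40 hex characters followed by ".jpg".
function looksLikeSha1Name(string $file): bool
{
    return preg_match('/^[0-9a-f]{40}\.jpg$/', $file) === 1;
}

// Usage with the names from the question (assumption, not run here):
// $rows = $mysqli->query("SELECT CONCAT(hash, '.jpg') FROM images")->fetch_all(MYSQLI_NUM);
// foreach (filesNotInDb($_SERVER['DOCUMENT_ROOT'] . '/images', array_column($rows, 0), $fichiers_a_exclures) as $file) {
//     echo '<p>Name of file: ' . htmlspecialchars($file) . (looksLikeSha1Name($file) ? '' : ' (not a SHA-1 name)') . '</p>';
// }
```

No shuffle and no 35,000 cap are needed here, because the per-file work is done once on the leftover list rather than on every file.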
I have a database table files that holds details about files in different folders; the field flink holds the path of each file. Now I want to search both the folder and the database and find the files that are not listed in the database. Is this possible using PHP and MySQL? I have written some sample code, but it doesn't seem to work. Please note that the files folder also contains a number of subdirectories.
<?php
include("dbfiles.php");
$directory='files/';
// Query database
$query = 'SELECT `flink` FROM `files`';
$result = mysqli_query($fmysqli, $query);
$db = []; // create empty array
while ($row = mysqli_fetch_row($result))
array_push($db, $row[0]);
// Check files
$files1 = scandir($directory);
if ( $files1 !== false ) {
foreach ($files1 as $i => $value) {
if (in_array($value, $db)) {
// File exists in both
echo ' Exists '.$value;
} else {
// File doesn't exist in database
echo ' Not Exists '.$value;
}
}
} else {
echo 0;
}
?>
The result is unexpected. There is a file inside the BT363 folder; its path is files/BT363/BT363-Metabolic Engineering and Synthetic Biology-Class Slide--Module 4-admin-admin.pptx
But I am getting this output:
Not Exists . Not Exists .. Not Exists BT363
You can list all the files in a directory by doing this:
$files = scandir($path);
Then query your database for the file information you want, loop through the results, and check whether each value exists in $files.
Yes, it is possible.
Due to the extreme lack of specific detail in your question, my response is going to be equally non-specific.
You'll want to compile a list of files from your folder using glob, scandir or similar. Likewise you will want to compile a list of files in the database.
Compare the two to identify those in the folder, but not in the database.
Edit
The . and .. entries in your output appear because filesystems have links to the current (.) and parent (..) directory. Typically you write code to skip these values.
For example, taking your code:
$files1 = scandir($directory);
if ($files1) {
foreach ($files1 as $value) {
if (in_array($value, ['.', '..'])) continue;
// Your other code...
}
}
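Two more reasons the comparison in the question never matches: scandir() is not recursive, so BT363 shows up as a folder entry instead of the files inside it, and flink stores paths like files/BT363/... while scandir() returns bare names. A sketch that walks the tree and builds paths in the same form, assuming flink values are written relative to the script with '/' separators (adjust the prefix to match your data); `filesMissingFromDb` is a hypothetical name:

```php
<?php
// Hypothetical sketch: every file under $directory (recursively), as
// "files/BT363/name.pptx"-style paths, minus the ones already in $dbPaths.
function filesMissingFromDb(string $directory, array $dbPaths): array
{
    $base = rtrim($directory, '/');
    $onDisk = [];
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($base, FilesystemIterator::SKIP_DOTS) // skips . and ..
    );
    foreach ($it as $file) {
        if ($file->isFile()) {
            // same shape as flink: <directory>/<subdir>/<name>
            $onDisk[] = str_replace(DIRECTORY_SEPARATOR, '/', $file->getPathname());
        }
    }
    $missing = array_diff($onDisk, $dbPaths);
    sort($missing);
    return $missing;
}

// Usage with the names from the question (assumption, not run here):
// $db = array_column(mysqli_fetch_all(mysqli_query($fmysqli, 'SELECT flink FROM files')), 0);
// foreach (filesMissingFromDb('files/', $db) as $path) {
//     echo 'Not in DB: ' . $path . "\n";
// }
```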
I have seen several websites where, if you upload an image and an identical image already exists on their servers, they reject the submission. Using PNGs, is there an easy way to check one image against a massive folder of images?
http://www.imagemagick.org/discourse-server/viewtopic.php?t=12618
I did find this with ImageMagick, but I am looking for one vs. many, not one-to-one a million times.
You can hash the file content with SHA-1. That gives you a way to identify two pictures that are strictly identical.
See http://php.net/manual/fr/function.sha1-file.php
Then save the hash (on NFS, or in some kind of database) and test whether the hash already exists.
Details of the images are probably maintained in a database; while the images are stored in the filesystem. And that database probably has a hash column which is used to store an md5 hash of the image file itself, calculated when the image is first uploaded. When a new image is uploaded, it calculates the hash for that image, and then checks to see if any other image detail in the database has a matching hash. If not, it stores the newly uploaded image with that hash; otherwise it can respond with details of the previous upload. If the hash column is indexed in the table, then this check is pretty quick.
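For the one-against-many check, the hashes of the existing images only need to be computed once. A minimal sketch of the idea with sha1_file(), keeping the index in a PHP array for illustration where a real site would use the indexed database column described above; the function names are hypothetical, and only byte-identical files are detected (a re-saved or resized copy gets a different hash):

```php
<?php
// Hypothetical sketch: map content hash => existing file, built once for the whole folder.
function buildHashIndex(string $dir): array
{
    $index = [];
    foreach (glob(rtrim($dir, '/') . '/*.png') ?: [] as $path) {
        $index[sha1_file($path)] = $path; // byte-identical files share a hash
    }
    return $index;
}

// Path of an identical existing image, or null if the upload is new.
function findDuplicate(string $uploadedFile, array $index): ?string
{
    return $index[sha1_file($uploadedFile)] ?? null;
}
```

Each upload then costs one hash of the new file plus one array (or indexed column) lookup, regardless of how many images are already stored.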
If I understood your question correctly, you want to find out whether a specific image exists in a directory with many images, right? If so, take a look at this solution:
<?php
// CREATE A FUNCTION WHICH RETURNS AN ARRAY OF ALL IMAGES IN A SPECIFIC FOLDER
function getAllImagesInFolder($dir_full_path){
$returnable = array();
$files_in_dir = scandir($dir_full_path);
$reg_fx = '#\.(png|jpe?g|bmp|gif)$#i';
foreach($files_in_dir as $key=>$val){
$temp_file_or_dir = $dir_full_path . DIRECTORY_SEPARATOR . $val;
if(is_file($temp_file_or_dir) && preg_match($reg_fx, $val) ){
$regx_dot_wateva = '/\.\w{2,4}$/'; // THE EXTENSION, E.G. ".jpg"
$regx_dot = '/\./';
$regx_array = array($regx_dot_wateva, $regx_dot);
$replace_array = array("", "_");
$return_val = preg_replace($regx_array, $replace_array, $val);
$returnable[$return_val] = $temp_file_or_dir ;
}else if(is_dir($temp_file_or_dir) && !preg_match('/^\..*/', $val) ){
$returnable = array_merge($returnable, getAllImagesInFolder($temp_file_or_dir)); // RECURSE AND KEEP THE RESULTS
}
}
return $returnable;
}
// CREATE ANOTHER FUNCTION TO CHECK IF THE SPECIFIED IMAGE EXISTS IN THE GIVEN DIRECTORY.
// THE FIRST PARAMETER SHOULD BE THE RESULT OF CALLING THE PREVIOUS FUNCTION: getAllImagesInFolder(...)
// THE SECOND PARAMETER IS THE IMAGE YOU WANT TO SEARCH WHETHER IT EXISTS IN THE SAID FOLDER OR NOT
function imageExistsInFolder($arrImagesInFolder, $searchedImage){
if(!is_array($arrImagesInFolder) || count($arrImagesInFolder) < 1){
return false;
}
foreach($arrImagesInFolder as $strKey=>$imgPath){
if(stristr($imgPath, $searchedImage)){
return true;
}
}
return false;
}
// NOW GET ALL THE IMAGES IN A SPECIFIED FOLDER AND ASSIGN THE RESULTING ARRAY TO A VARIABLE: $imgFiles
$imgFolder = "/path/to/directory/where/there/are/images";
$arrImgFiles = getAllImagesInFolder($imgFolder);
$searchedImage = "sandwich.jpg"; //<== OR EVEN WITHOUT THE EXTENSION, JUST "sandwich"
// ASSUMING THE SPECIFIC IMAGE YOU WANT TO MATCH IS CALLED sandwich.jpg
// YOU CAN USE THE imageExistsInFolder(...) FUNCTION TO RETURN A BOOLEAN FLAG OF true OR false
// DEPENDING ON IF IT DOES OR NOT.
var_dump($arrImgFiles);
var_dump( imageExistsInFolder($arrImgFiles, $searchedImage) );
First of all, I would like to explain my situation.
I'm using PHP as my programming language.
I have a table named "Produk" that stores every product. An example value in its id_produk column is "TWC0001".
Every product has its own image, stored in the ./images/Produk/ directory.
The problem is that this project has been running for about a year, and when users delete a product, the product's images are not deleted too. They stay in the ./images/Produk/ directory, which means those files become garbage, right?
Example:
In the "Produk" table, the "id_produk" column has 3 rows:
"TWC0001","TWC0002","TWC0003".
Each of those rows has its own image stored in ./images/Produk/.
The files are named:
"TWC0001.jpg", "TWC0002.jpg", "TWC0003.jpg"
Case: a user logs in and deletes row "TWC0002"; the "TWC0002.jpg" file still exists.
Problem: I want to delete all ".jpg" files that are no longer listed in the "Produk" table.
This is what I have so far:
//listing all the ".jpg" files
$arrayfiles=scandir("../images/Produk/");
//getting all the product list
$sql="select * from produk";
$produk=mysql_query($sql,$conn) or die("Error : ".mysql_error());
foreach($arrayfiles as $key=>$value)
{
while($row=mysql_fetch_array($produk,MYSQL_ASSOC))
{
///here is the part i've been confused of.
}
}
The PHP function to delete a file is unlink().
Can anybody help me with this?
The following code produces an array of all the images that have no corresponding product record. I've left out the unlink() call so you can review the list first.
$sql = "SELECT * FROM Produk";
$result = mysql_query($sql);
$existing_products = array();
while ($row = mysql_fetch_array($result))
$existing_products[] = $row["id_produk"] . ".jpg";
$existing_images = array();
foreach(glob("../images/Produk/*.jpg") as $v)
$existing_images[] = str_replace("../images/Produk/", "", $v);
$images_to_delete = array_diff($existing_images, $existing_products);
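Once $images_to_delete has been reviewed, the deletion itself can be a separate step. A sketch, where `deleteOrphans` and its `$dryRun` switch are hypothetical: with the default dry run it only reports what it would remove, and basename() keeps every path inside the images folder:

```php
<?php
// Hypothetical sketch: delete the reviewed orphan images; $dryRun = true only reports them.
function deleteOrphans(string $dir, array $filenames, bool $dryRun = true): array
{
    $handled = [];
    foreach ($filenames as $name) {
        // basename() drops any directory part, so nothing outside $dir is touched
        $path = rtrim($dir, '/') . '/' . basename($name);
        if (!is_file($path)) {
            continue; // already gone, or not a regular file
        }
        if ($dryRun || unlink($path)) {
            $handled[] = $path;
        }
    }
    return $handled;
}

// Usage after the code above (not run here):
// print_r(deleteOrphans('../images/Produk', $images_to_delete));        // review
// deleteOrphans('../images/Produk', $images_to_delete, false);          // then delete
```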
Try this:
$it = new RecursiveIteratorIterator( new RecursiveDirectoryIterator('../images/Produk/'));
$regx = new RegexIterator( $it, '/^.*\.jpg$/i', // only matched text will be returned
RecursiveRegexIterator::GET_MATCH );
foreach ($regx as $file) {
echo $file[0] , "\n";
unlink($file[0]);
}
This will find all JPG files in the given folder and its subfolders and delete them (every JPG, including images of products that still exist).
I would recommend the following:
Make a directory listing of the "Images" directory with
dir /b > filelist.txt (Windows)
or
ls -1 > filelist.txt (Linux)
You now have a list of existing files, which should be imported into a temporary table in MySQL.
Now write a simple SQL query to select the files without a matching product (don't forget to append the .JPG suffix).
With the list of files to be deleted, you can simply read it with file_get_contents() and unlink each file in a foreach loop.
The reason I recommend this is safety: you can review what will be deleted.
Once you run the script, there is no undo (except restoring from a backup).
foreach(glob('../images/Produk/*.jpg') as $file) {
if(is_file($file)) {
echo $file, "\n"; // review the list first
#unlink($file);
}
}