Remove Prestashop orphan images not stored in DB - php

I need to clean up a shop that has been running PrestaShop (currently 1.7) for many years.
With this script I removed all the images in the DB that were not connected to any product.
But there are many files that are not listed in the DB. For example, I currently have 5 image sizes in the settings, so new products have 6 files in the folder (those 5 plus the imageID.jpg file), but some old products had up to 18 files. Many of these old products have been deleted, but in the folder I still find all the other formats, like "2026-small-cart.jpg".
So I tried writing a script that loops through the folders, checks the image files in them and verifies whether each id_image is stored in the DB.
If not, I can delete the file.
It works, but obviously the loop is huge and it stops working as soon as I change the starting folder.
I've tried to reduce the number of DB queries by caching some data (so that all the images with the same id need a single DB query), but it still crashes when I change the starting path.
It only works with two nested loops (really few...).
Here is the code. Any ideas for a better way to get the result?
Thanks!
$shop_root = $_SERVER['DOCUMENT_ROOT'].'/';
include('./config/config.inc.php');
include('./init.php');
$image_folder = 'img/p/';
$image_folder = 'img/p/2/0/3/2/'; // TEST, existing product
$image_folder = 'img/p/2/0/2/6/'; // TEST, product deleted from DB but files in folder
//$image_folder = 'img/p/2/0/2/'; // test, not working...
$scan_dir = $shop_root.$image_folder;
// will check only images...
global $imgExt;
$imgExt = array("jpg","png","gif","jpeg");
// to avoid multiple queries for the same image id...
global $lastID;
global $delMode;
echo "<h1>Examined folder: $image_folder</h1>\r\n";

function checkFile($scan_dir,$name) {
    global $lastID;
    global $delMode;
    $path = $scan_dir.$name;
    $ext = substr($name,strripos($name,".")+1);
    // if is an image and file name starts with a number
    if (in_array($ext,$imgExt) && (int)$name>0){
        // avoid extra queries...
        if ($lastID == (int)$name) {
            $inDb = $lastID;
        } else {
            $inDb = (int)Db::getInstance()->getValue('SELECT id_product FROM '._DB_PREFIX_.'image WHERE id_image ='.((int) $name));
            $lastID = (int)$name;
            $delMode = $inDb;
        }
        // if haven't found an id_product in the DB for that id_image
        if ($delMode<1){
            echo "- $path has no related product in the DB I'll DELETE IT<br>\r\n";
            //unlink($path);
        }
    }
}

function checkDir($scan_dir,$name2) {
    echo "<h3>Elements found in the folder <i>$scan_dir$name2</i>:</h3>\r\n";
    $files = array_values(array_diff(scandir($scan_dir.$name2.'/'), array('..', '.')));
    foreach ($files as $key => $name) {
        $path = $scan_dir.$name;
        if (is_dir($path)) {
            // new loop in the subfolder
            checkDir($scan_dir,$name);
        } else {
            // is a file, I'll check if must be deleted
            checkFile($scan_dir,$name);
        }
    }
}

checkDir($scan_dir,'');

I would create two files with lists of images.
The first file is the result of a query from your database of every image file referenced in your data.
mysql -BN -e "select distinct id_image from ${DB}.${DB_PREFIX}image" > all_image_ids
(set the shell variables for DB and DB_PREFIX first)
The second file is every image file currently in your directories. Include only files that start with a digit and have an image extension.
find img/p -type f \( -name '[0-9]*.jpg' -o -name '[0-9]*.png' -o -name '[0-9]*.gif' -o -name '[0-9]*.jpeg' \) > all_image_files
For each filename, check if it's in the list of image ids. If not, then output the command to delete the file.
while IFS= read -r filename ; do
    # strip the directory name, then keep only the leading digits of the file name
    b=$(basename "$filename")
    image_id=${b%%[!0-9]*}
    # -x matches the whole line, so id 20 does not match 2026
    grep -qx "${image_id}" all_image_ids || echo "rm '${filename}'"
done < all_image_files > files_to_delete
Read the file files_to_delete to visually check that the list looks right. Then run that file as a shell script:
sh files_to_delete
Note I have not tested this solution, but it should give you something to experiment with.
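The same idea can also be sketched in PHP, closer to the script in the question: load the known ids once, walk the folder tree with an iterator, and report files whose leading number is not a known id. This is a minimal sketch, not a tested PrestaShop tool; the function name findOrphanImages is hypothetical, and how $knownIds is loaded (for example one SELECT id_image query against the image table) is an assumption left to the caller. Letting the iterator build the full path of every file also avoids having to pass the parent folder through each recursive call by hand.

```php
<?php
/**
 * Return image files under $root whose leading numeric id is not in $knownIds.
 * $knownIds is assumed to hold every id_image value, loaded once from the DB.
 */
function findOrphanImages(string $root, array $knownIds): array
{
    // Flip the list so each id becomes a key and isset() can be used for lookups.
    $known = array_flip($knownIds);
    $orphans = [];

    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        if (!$file->isFile()) {
            continue;
        }
        // Only names that start with digits and end in an image extension,
        // e.g. "2026.jpg" or "2026-small-cart.jpg".
        if (!preg_match('/^(\d+).*\.(?:jpe?g|png|gif)$/i', $file->getFilename(), $m)) {
            continue;
        }
        if (!isset($known[(int) $m[1]])) {
            $orphans[] = $file->getPathname();
        }
    }

    sort($orphans);
    return $orphans;
}
```

As with files_to_delete, it is worth printing the returned list and reviewing it before passing anything to unlink().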

Related

Optimising a script to scan files in a folder

I have a folder /images (with ~95,000 files), and I check whether every file is in the database.
Table: images
Column: hash
The folder contains all my images, each named with its SHA-1 hash.
I use shuffle($images); to make sure the verification is random, otherwise it only verifies the first 35,000 images.
If I go over 35,000 checks, the script times out and the page is blocked.
Example name of an image: d0a0bb3149bea2335e8784812fef706ad0a13156.jpg
My script:
I select the images in the database
I put them in an array
I shuffle the array (to avoid always checking the first 35,000 images)
I create an array of the image files in the folder /images
I check for files missing from the database using the array built with the opendir() function
I display the answer
<?php
set_time_limit(0);
$images = [];
$q = $mysqli->query('SELECT hash FROM images');
while($r = $q->fetch_assoc())
{
    $images[] = $r['hash'].'.jpg';
}
shuffle($images);
$i_hors_bdd = 0;
$images_existent_hors_bdd = [];
if($dh = opendir($_SERVER['DOCUMENT_ROOT'].'/images'))
{
    while(($file = readdir($dh)) !== false)
    {
        if(!in_array($file, $fichiers_a_exclures))
        {
            if(!is_sha1($file) OR !in_array($file, $images))
                $images_existent_hors_bdd[] = '<p>Name of File: '.$file.'</p>';
        }
        if($i_hors_bdd > 35000)
        {
            break;
        }
        $i_hors_bdd++;
    }
}
closedir($dh);
if(count($images_existent_hors_bdd) > 0)
{
    echo '<p>Images exist, but are not in the database.</p>';
    sort($images_existent_hors_bdd);
    foreach($images_existent_hors_bdd as $image_existe_hors_bdd)
        echo $image_existe_hors_bdd;
}
else
    echo '<p>All images are in the database.</p>';
echo '<p>'.$i_hors_bdd.' images checked.</p>';
So my question is: how can I optimize this script so that it runs faster and can check more images without timing out? Note that my VPS is not very powerful and has no SSD.
Here are some things to consider or try:
Concatenate '.jpg' to hash in the SQL, then use fetch_all to get a numeric array.
Use scandir to build an array of the files in the directory.
Use array_diff to remove $fichiers_a_exclures and $images from it.
Iterate over this much smaller array to do the SHA-1 test.
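The steps above can be sketched as one function working on plain arrays. This is only an illustration: the name classifyLeftovers is hypothetical, and the three input arrays are assumed to come from scandir(), from a query such as SELECT CONCAT(hash, '.jpg') FROM images read with fetch_all, and from the exclusion list in the question. The point is that array_diff removes the known files in one call, so the loop no longer runs in_array against the whole database list for every file.

```php
<?php
/**
 * Split the files that the database does not know about into two groups:
 * names that are not "<sha1>.jpg" at all, and well-formed names missing from the DB.
 */
function classifyLeftovers(array $filesOnDisk, array $dbFiles, array $excluded): array
{
    // One call drops the excluded entries and everything present in the DB.
    $leftovers = array_diff($filesOnDisk, $excluded, $dbFiles);

    $result = ['not_sha1' => [], 'not_in_db' => []];
    foreach ($leftovers as $file) {
        // A SHA-1 hash is 40 hexadecimal characters.
        $key = preg_match('/^[0-9a-f]{40}\.jpg$/', $file) ? 'not_in_db' : 'not_sha1';
        $result[$key][] = $file;
    }
    return $result;
}
```

Because only the leftovers are iterated, the 35,000 cap and the shuffle() call should no longer be needed, although that depends on how many stray files the folder really holds.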

search and delete unused images in articles with php

I have been working on a project and over time it has filled up with images I uploaded while testing. Now I want to write a script that searches the img tags in the articles and collects the image names (articles are stored in MySQL in a column named 'text'), then scans the folder where the images are stored and deletes those that are not included in any article (unused images). Has anyone done this before, so I could see an example, or can anyone suggest a good approach?
Here's what you'll need to do what you want:
Loop through your directory of files (if they are on the filesystem):
if ($handle = opendir('/path/to/files')) {
    echo "Directory handle: $handle\n";
    echo "Entries:\n";
    /* This is the correct way to loop over the directory. */
    while (false !== ($entry = readdir($handle))) {
        echo "$entry\n";
    }
    /* This is the WRONG way to loop over the directory. */
    while ($entry = readdir($handle)) {
        echo "$entry\n";
    }
    closedir($handle);
}
Ref. http://php.net/readdir
Loop through your files (if they are on the database):
Ref. http://www.php.net/manual/en/mysqli.query.php
Compare file names (obvious once you are looping through your resource).
Delete unused images like so http://www.php.net/unlink
The approach is simple:
Query the database and get a list of all image URLs; add them to an array.
Loop through each folder that contains images and build an array of every image on the site.
Find all items that are in the folder array but not in the database array. array_diff is what you need here (array_intersect would instead give you the images that are in use); there may be a better answer more specific to your case.
With the new array, simply loop through the list and delete the files.
You can search for each of the above steps individually and then string them together.
I would recommend backing everything up before trying!
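A rough sketch of how these steps might fit together for the img-tag case is shown here. The function names usedImageNames and unusedImages are hypothetical, the article bodies are assumed to have been fetched from the 'text' column already, and the regular expression only handles quoted src attributes; a DOM parser would be more robust for messy markup.

```php
<?php
/** Collect the base file names referenced by <img src="..."> in the given HTML fragments. */
function usedImageNames(array $articleBodies): array
{
    $used = [];
    foreach ($articleBodies as $html) {
        // Simple pattern for quoted src attributes inside img tags.
        if (preg_match_all('/<img\b[^>]*\bsrc\s*=\s*["\']([^"\']+)["\']/i', $html, $m)) {
            foreach ($m[1] as $src) {
                // Drop scheme, host and query string, then keep the file name only.
                $path = parse_url($src, PHP_URL_PATH);
                if (is_string($path) && $path !== '') {
                    $used[basename($path)] = true;
                }
            }
        }
    }
    return array_map('strval', array_keys($used));
}

/** Files present in the folder listing that no article refers to. */
function unusedImages(array $filesInFolder, array $articleBodies): array
{
    return array_values(array_diff($filesInFolder, usedImageNames($articleBodies)));
}
```

The result of unusedImages() is only a list of names; printing and reviewing it before calling unlink() on each entry keeps the deletion step reversible up to the last moment.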
I recently came across the same thing: I wanted to remove unused files that users left behind on the web server after changing their profile picture. To fix this I used the following:
$images = scandir("uploads", 1);
foreach ($images as $itemlc)
{
    // scandir() also returns "." and "..", which are not uploads
    if ($itemlc == '.' || $itemlc == '..') continue;
    // note: the mysql_* functions were removed in PHP 7; use mysqli or PDO on current versions
    $res=mysql_query("SELECT * FROM company WHERE c_logo='$itemlc'");
    $count = mysql_num_rows($res);
    $res2=mysql_query("SELECT * FROM users WHERE u_logo='$itemlc'");
    $count2 = mysql_num_rows($res2);
    // > 0 rather than == 1, so a logo shared by several rows still counts as used
    if($count > 0)
    {
        echo $itemlc; echo " exists <br><br>";
    }
    else if ($count2 > 0)
    {
        echo $itemlc; echo " exists <br><br>";
    }
    else{ $file_path = 'uploads/'; $src=$file_path.$itemlc; #unlink($src); }
}
Hope this helps anyone who needs it!

mysql - php - remove first part of url upon INSERT

I'm building an image gallery CMS using a MySQL database and PHP. I'm a newbie.
I'm having a path problem.
Here is my file structure:
This PHP file: root/php/upload_portrait.php
My images are stored here: root/images/portrait_gallery/
So I added ../ to save the images in root/images/portrait_gallery/ and that works fine.
But in the DB the URL is stored with the ../, and that path is incorrect because the images are loaded from the root index file. So no images show up.
How can I remove the ../ when doing the INSERT INTO the database?
I have tried with replace and update but can't figure out how.
Here is my code:
$portrait_url= $_FILES['upload'];
// 2. connect to database:
include 'connect.php';
// 4. handle moving image from temp location to images folder (using the function billedupload)
$billedurl = billedupload($portrait_url);
if($billedurl == false){
    die("Something is wrong");
}
// 5. Insert imageupload in database:
$query = "INSERT INTO portrait (portrait_id, portrait_url) VALUES ('$portrait_id', '$billedurl')";
$result = mysqli_query($dblink, $query) or die( "Query 2 could not be executed: " . mysqli_error($dblink) );
// 6. close connection
mysqli_close($dblink);

function billedupload($filearray){
    if($filearray['type']=='image/jpeg' or $filearray['type']=='image/png'){
        $tmp_navn = $filearray['tmp_name'];
        $filnavn = $filearray['name'];
        $url = '../images/portrait_gallery/' . time() . $filnavn;
        move_uploaded_file($tmp_navn, $url);
        return $url;
    }
    else{
        return false;
    }
}
I believe you have 2 choices.
Alter the $url line of billedupload() and remove the ../ from the value it returns, since you know you won't need it when you go to read it (the file itself still has to be moved to the path that includes ../).
Alter your function that reads from the database and remove the ../ there.
str_replace() is probably the function you need, as mentioned by previous poster.
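For the first choice, one way to avoid string replacement altogether is to build the two paths separately: one for writing the file and one for the database. The helper below is a hypothetical sketch (the name portraitPaths and the array keys are made up for illustration), and the basename() call is an addition that is not in the original code.

```php
<?php
/**
 * Build both paths for one upload: where to write the file from root/php/,
 * and what to store in the database for pages served from the site root.
 */
function portraitPaths(string $originalName, int $timestamp): array
{
    // basename() drops any directory part sent along with the uploaded name.
    $relative = 'images/portrait_gallery/' . $timestamp . basename($originalName);

    return [
        'disk' => '../' . $relative, // relative to root/php/upload_portrait.php
        'db'   => $relative,         // relative to the root index file
    ];
}
```

Inside billedupload() this could be used as $paths = portraitPaths($filnavn, time()); followed by move_uploaded_file($tmp_navn, $paths['disk']); and return $paths['db'];.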
If all you need is to remove '../', then:
str_replace is a PHP function that lets you remove or change parts of a string on the fly, so, in your code,
replace
$billedurl = billedupload($portrait_url);
with
$billedurl = str_replace('../','',billedupload($portrait_url));

Php WHILE loops only find one element

I have a problem with the following PHP code. It is supposed to list the items of an S3 bucket, then find and delete the files whose names contain a certain string.
The problem is that only one file is deleted; the others remain in the bucket after the script has run.
I can't find where the issue comes from, so I'm asking you.
$aS3Files = $s3->getBucket($bucketName); // list all elements in the bucket
$query = mysql_query("SELECT filename FROM prizes_media WHERE prize_id=" . $_POST["prizeId"]); // finds all filenames linked to the prize
while($media = mysql_fetch_array($query)){
    // Find relevant files
    while ( list($cFilename, $rsFileData) = each($aS3Files) ) { // reformat the bucket list into a table and reads through it
        if(strpos($cFilename,$media['filename'])) {
            $s3->deleteObject($bucketName, $cFilename); // deletes all files that contain $media['filename'] in their filename
        }
    }
}
// 2. Delete DB entry
mysql_query("DELETE FROM prizes WHERE id=" . $_POST['prizeId'] ); // deletes the entry correponding to the prize in the DB (deletes media table in cascade)
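You may be getting false negatives on your if: strpos returns 0 when the match is at the very start of the string, and 0 is treated as false. You should be using this:
if(strpos($cFilename,$media['filename']) !== FALSE) { ...
Edit
Here is a different way to loop over the bucket, based on the structure in your comment. Unlike each(), which carries on from wherever the previous inner loop stopped (so only the first filename is ever compared against the bucket), foreach starts from the first element every time:
foreach($aS3Files as $filename => $filedata) {
    if(strpos($filename, $media['filename']) !== FALSE) {
        $s3->deleteObject($bucketName, $filename); // deletes all files that contain $media['filename'] in their filename
    }
}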
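To see both fixes together without touching a real bucket, the matching can be tried on plain arrays first. The function below is a hypothetical sketch: keysToDelete is not part of any S3 library, $bucket only mimics the key => data shape returned by getBucket(), and the caller would still pass each returned key to deleteObject().

```php
<?php
/** Return every bucket key that contains at least one of the given file names. */
function keysToDelete(array $bucket, array $filenames): array
{
    $matches = [];
    foreach ($filenames as $needle) {
        if ($needle === '') {
            continue; // an empty name would match every key
        }
        // foreach walks the whole bucket again for every file name.
        foreach ($bucket as $key => $fileData) {
            // strpos() returns 0 for a match at the start, so compare with false.
            if (strpos((string) $key, $needle) !== false) {
                $matches[$key] = true;
            }
        }
    }
    return array_keys($matches);
}
```

Collecting the keys first also makes it possible to log what is about to be removed before any delete call is made.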

Deleting Files In A Directory Based On A Table

First of all, I would like to explain my current situation.
I'm using PHP as my programming language.
I have a table named "Produk". It stores every product. Example value of its id_produk column: "TWC0001".
Every product has its own image, stored in the ./images/Produk/ directory.
The problem is that this project has been in use for about a year, and when users delete a product, the product's image is not deleted with it. So it stays in the ./images/Produk/ directory, which means the file becomes garbage, right?
Example case:
In the "Produk" table, column "id_produk", I have 3 rows:
"TWC0001","TWC0002","TWC0003".
Of course each of those rows has its own image stored in ./images/Produk/
The files are named:
"TWC0001.jpg", "TWC0002.jpg", "TWC0003.jpg"
Case: a user logs in and deletes row "TWC0002"; of course the "TWC0002.jpg" file still exists.
Problem: I want to delete all ".jpg" files that are no longer listed in the "Produk" table.
This is what I have so far:
//listing all the ".jpg" files
$arrayfiles=scandir("../images/Produk/");
//getting all the product list
$sql="select * from produk";
$produk=mysql_query($sql,$conn) or die("Error : ".mysql_error());
foreach($arrayfiles as $key=>$value)
{
    while($row=mysql_fetch_array($produk,MYSQL_ASSOC))
    {
        ///here is the part i've been confused of.
    }
}
The PHP function to delete a file is unlink().
Can anybody please help me out with this?
The following code will produce an array with all the images that have no corresponding product record. I've left off the unlink command so you can do some reviewing process first.
$sql = "SELECT * FROM Produk";
$result = mysql_query($sql);
$existing_products = array();
while ($row = mysql_fetch_array($result))
    $existing_products[] = $row["id_produk"] . ".jpg";
$existing_images = array();
foreach(glob("../images/Produk/*.jpg") as $v)
    $existing_images[] = str_replace("../images/Produk/", "", $v);
$images_to_delete = array_diff($existing_images, $existing_products);
Try this:
$it = new RecursiveIteratorIterator(new RecursiveDirectoryIterator('../images/Produk/'));
// only the matched text will be returned
$regx = new RegexIterator($it, '/^.*\.jpg$/i', RecursiveRegexIterator::GET_MATCH);
foreach ($regx as $file) {
    echo $file[0], "\n";
    unlink($file[0]);
}
This finds all JPG files in the given folder and its subfolders and deletes them. Note that it does not look at the "Produk" table, so on its own it removes every image, not only the unused ones.
I would recommend the following:
Make a listing of the "Images" directory with
dir /b > filelist.txt (Windows)
or
ls -1 > filelist.txt (Linux)
You now have a list of the existing files, which you can import into a temporary table in MySQL.
Now write a simple SQL query to select the files without a matching product (don't forget to append the .jpg suffix to the product id).
With the list of files to be deleted, you can simply read it with file_get_contents and call unlink in a foreach loop.
The reason I recommend this is safety: you can review what will be deleted.
Once you run the script, there is no undo (other than restoring from a backup).
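The last step could look roughly like the sketch below. It assumes the reviewed list is a text file with one file name per line; the function name deleteListedFiles and the dry-run flag are hypothetical additions, and file() is used here in place of file_get_contents because it already splits the content into lines.

```php
<?php
/**
 * Delete the files named in a reviewed list (one name per line).
 * Returns the paths that were deleted, or that would be deleted in dry-run mode.
 */
function deleteListedFiles(string $listFile, string $dir, bool $dryRun = true): array
{
    $deleted = [];
    $names = file($listFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($names === false) {
        return $deleted; // list could not be read
    }
    foreach ($names as $name) {
        // trim() removes a trailing "\r" from Windows listings;
        // basename() keeps a stray "../" in the list from leaving $dir.
        $path = rtrim($dir, '/') . '/' . basename(trim($name));
        if (is_file($path) && ($dryRun || unlink($path))) {
            $deleted[] = $path;
        }
    }
    return $deleted;
}
```

Running it once with the default dry-run setting and comparing the returned paths with the reviewed list gives a final check before calling it again with false as the third argument.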
foreach(glob('../images/Produk/*.jpg') as $file) {
    if(is_file($file)) {
        #unlink($file);
    }
}
