search and delete unused images in articles with php - php

I have been working on a project, and over time its image folder has filled up with images I uploaded only for testing. I now want to write a script that searches the img tags in my articles for image file names (the articles are stored in MySQL in a column named 'text'), scans the folder where the images are stored, and deletes every image that is not included in any article (the unused images). Has anyone done this before, so I could see an example or a good approach for this case?

Here's what you'll need to do what you want:
Loop through your directory of files (if they are on the filesystem):
if ($handle = opendir('/path/to/files')) {
echo "Directory handle: $handle\n";
echo "Entries:\n";
/* This is the correct way to loop over the directory. */
while (false !== ($entry = readdir($handle))) {
echo "$entry\n";
}
/* This is the WRONG way to loop over the directory. */
while ($entry = readdir($handle)) {
echo "$entry\n";
}
closedir($handle);
}
Ref. http://php.net/readdir
Loop through your files (if they are on the database):
Ref. http://www.php.net/manual/en/mysqli.query.php
Compare file names (obvious once you are looping through your resource).
Delete unused images like so http://www.php.net/unlink

Approach is simple
Query database and get list of all image URLs - add to an array
Loop through each folder that contains images and make an array of every image on the site.
Find all items that are in one array but not in the other (there may be a better answer more specific to you); array_diff is what you need here, since array_intersect returns only the items common to both arrays.
With the new array, simply loop through the list and delete the files.
All of the above you can search individually and then string them together.
I would recommend backing everything up before trying!!!!
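The steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper names referencedImageNames and unusedImages are hypothetical, the article texts and file names are hard-coded stand-ins for the database rows and the directory listing, and the regular expression only handles quoted src attributes (an HTML parser would be more robust).

```php
<?php
// Hypothetical helper: collect the file names referenced by <img src="..."> tags.
function referencedImageNames(array $articleTexts): array
{
    $names = [];
    foreach ($articleTexts as $html) {
        // Simple pattern for src="..." or src='...'.
        if (preg_match_all('/<img\b[^>]*\bsrc\s*=\s*["\']([^"\']+)["\']/i', $html, $m)) {
            foreach ($m[1] as $src) {
                // Drop any query string, then keep only the file name.
                $path = parse_url($src, PHP_URL_PATH);
                if (is_string($path) && $path !== '') {
                    $names[basename($path)] = true;
                }
            }
        }
    }
    return array_map('strval', array_keys($names));
}

// Hypothetical helper: names present on disk but not referenced by any article.
function unusedImages(array $filesOnDisk, array $referenced): array
{
    return array_values(array_diff($filesOnDisk, $referenced));
}

// Stand-ins for the 'text' column values and for the image folder listing.
$articles = [
    '<p>Intro</p><img src="/images/a.jpg" alt="">',
    "<img class='x' src='images/b.png?v=2'>",
];
$onDisk = ['a.jpg', 'b.png', 'old-test.gif'];

$unused = unusedImages($onDisk, referencedImageNames($articles));
print_r($unused);
```

In a real run, each name in $unused would be passed to unlink() together with the folder path, ideally after printing the list once and checking it by hand.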

I recently came across a similar case: I wanted to remove unused files that users had left behind after changing their profile picture, and the files were stored on the web server. To fix this I used the following:
$images = scandir("uploads", 1);
foreach ($images as $itemlc)
{
$res=mysql_query("SELECT * FROM company WHERE c_logo='$itemlc'");
$count = mysql_num_rows($res);
$res2=mysql_query("SELECT * FROM users WHERE u_logo='$itemlc'");
$count2 = mysql_num_rows($res2);
if($count == 1)
{
echo $itemlc; echo " exists <br><br>";
}
else if ($count2 == 1)
{
echo $itemlc; echo " exists <br><br>";
}
else
{
$file_path = 'uploads/';
$src=$file_path.$itemlc;
#unlink($src);
}
}
Hope this helps if there is someone who needs this!

Related

Optimising a script to scan files in a folder

I have a folder /images (with ~95,000 files), and I check whether every file is in the database.
Table: images
Column: hash
The folder contains all my images, each named with its SHA-1 hash.
I use shuffle($images); to make the verification order random; otherwise it only ever verifies the first 35,000 images.
If I go over 35,000 checks, the script times out and the page stops responding.
Example name of an image: d0a0bb3149bea2335e8784812fef706ad0a13156.jpg
My script:
I select the images from the database.
I put them in an array.
I shuffle the array (to avoid always checking the first 35,000 images).
I create an array of the image files in the folder /images.
I check for files missing from the database using the array created with the opendir() function.
I display the result.
<?php
set_time_limit(0);
$images = [];
$q = $mysqli->query('SELECT hash FROM images');
while($r = $q->fetch_assoc())
{
$images[] = $r['hash'].'.jpg';
}
shuffle($images);
$i_hors_bdd = 0;
$images_existent_hors_bdd = [];
if($dh = opendir($_SERVER['DOCUMENT_ROOT'].'/images'))
{
while(($file = readdir($dh)) !== false)
{
if(!in_array($file, $fichiers_a_exclures))
{
if(!is_sha1($file) OR !in_array($file, $images))
$images_existent_hors_bdd[] = '<p>Name of File: '.$file.'</p>';
}
if($i_hors_bdd > 35000)
{
break;
}
$i_hors_bdd++;
}
closedir($dh);
}
if(count($images_existent_hors_bdd) > 0)
{
echo '<p>Images exist, but not in the database.</p>';
sort($images_existent_hors_bdd);
foreach($images_existent_hors_bdd as $image_existe_hors_bdd)
echo $image_existe_hors_bdd;
}
else
echo '<p>All images are in the database.</p>';
echo '<p>'.$i_hors_bdd.' images checked.</p>';
So my question is: how can I optimize this script so that it runs faster and can check more images without timing out? Note that my VPS is not very powerful and has no SSD.
Here are some things to consider or try:
Concatenate '.jpg' to hash in the SQL, then use fetch_all to get a numeric array.
Use scandir to build an array of the files in the directory.
Use array_diff to remove $fichiers_a_exclures and $images from that array.
Iterate over the resulting, much smaller array to do the sha1 test.
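A rough sketch of that suggestion, assuming the two lists have already been loaded: in the real script $dbFiles would come from something like SELECT CONCAT(hash, '.jpg') FROM images and $dirFiles from scandir(). The function names filesMissingFromDb and looksLikeSha1Jpg are hypothetical, and looksLikeSha1Jpg is only a guess at what the question's is_sha1() does.

```php
<?php
// Hypothetical helper: files in the directory that are neither excluded nor in the database.
function filesMissingFromDb(array $dirFiles, array $dbFiles, array $excluded): array
{
    // array_diff accepts several arrays and removes the values of all of them at once.
    return array_values(array_diff($dirFiles, $excluded, $dbFiles));
}

// Assumed stand-in for the question's is_sha1(): 40 hex characters plus ".jpg".
function looksLikeSha1Jpg(string $name): bool
{
    return preg_match('/^[0-9a-f]{40}\.jpg$/', $name) === 1;
}

// Sample data standing in for scandir() output and the database rows.
$dirFiles = [
    '.',
    '..',
    'd0a0bb3149bea2335e8784812fef706ad0a13156.jpg',
    'da39a3ee5e6b4b0d3255bfef95601890afd80709.jpg',
    'notes.txt',
];
$dbFiles  = ['d0a0bb3149bea2335e8784812fef706ad0a13156.jpg'];
$excluded = ['.', '..'];

$missing = filesMissingFromDb($dirFiles, $dbFiles, $excluded);

foreach ($missing as $file) {
    // Only the leftover files need the (comparatively cheap) name check.
    $kind = looksLikeSha1Jpg($file) ? 'hash-named' : 'other';
    echo $file . ' (' . $kind . ")\n";
}
```

The point of this shape is that the comparison happens once over whole arrays instead of calling in_array for every file inside the loop.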

Add icon next to filename

I have the following code which sorts files in its current directory:
<?php
$folders = array_filter(glob('*'), 'is_dir');
foreach ($folders as $foldlist) {
echo "<tr><td><img src=\"/index/RESSOURCES/icon/folder.png\"></td><td>{$foldlist}</td><td><img src=\"/RESSOURCES/icon/info.png\"></td></tr>";
}
$files = glob("*.*");
foreach ($files as $filename) {
$type=substr($filename,strrpos($filename,'.')+1);
echo "<tr><td><img src=\"/index/RESSOURCES/icon/{$type}.png\"></td><td>{$filename}</td><td><img src=\"/RESSOURCES/icon/info.png\"></td></tr>";
}
?>
It works, don't worry about that. There is only a minor problem that I've been troubleshooting for the last few days:
If you run my code, you'll see that before every file name, there is an icon. It fetches the right icon by taking the file's format. Cool, right?
But here is my problem:
Let's say I have two files: dummy.zip and dummy.tar.
The files will fetch "zip.png" and "tar.png" respectively, but the two icons are exactly the same. So basically, I'm making the client load the same icon twice, which makes my page significantly slower. Nothing dramatic? Well, I have over a hundred files right now, pretty much all of them with a different format.
How can I make it so:
if $icon == zip OR tar OR gz LOAD zip.png?
Cheers.
After your line
$type = substr($filename,strrpos($filename,'.')+1);
and before
echo "<tr><td><img src=\"/index/RESSOURCES/icon/{$type}.png\"></td><td>{$filename}</td><td><img src=\"/RESSOURCES/icon/info.png\"></td></tr>";
you may just add the following code
if($type == 'zip' || $type == 'tar' || $type == 'gz') {
$type = 'zip';
}
It will load zip.png in all three cases.
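If the list of equivalent formats grows, a lookup table may be easier to maintain than a chain of || comparisons. The sketch below is one possible variant, not part of the answer above; ICON_ALIASES and iconFor are hypothetical names, and the alias list is only an example.

```php
<?php
// Hypothetical mapping from a file extension to the icon that should represent it.
const ICON_ALIASES = [
    'tar' => 'zip',
    'gz'  => 'zip',
];

// Return the icon base name for a file: the alias if one exists, else the extension itself.
function iconFor(string $filename): string
{
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return ICON_ALIASES[$ext] ?? $ext;
}

foreach (['dummy.zip', 'dummy.tar', 'archive.tar.gz', 'photo.PNG'] as $name) {
    echo $name . ' uses ' . iconFor($name) . ".png\n";
}
```

In the original loop, $type = iconFor($filename); would replace the substr/strrpos line.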

List files not in database

I have a database table, files, that holds details about files in different folders; the field flink holds the path of each file. Now I want to search both the folder and the database and find the files that are not listed in the database. Is this possible using PHP and MySQL? I have written some sample code, but it doesn't seem to work. Please note that the files folder contains a number of subdirectories as well.
<?php
include("dbfiles.php");
$directory='files/';
// Query database
$query = 'SELECT `flink` FROM `files`';
$result = mysqli_query($fmysqli, $query);
$db = []; // create empty array
while ($row = mysqli_fetch_row($result))
array_push($db, $row[0]);
// Check files
$files1 = scandir($directory);
if ( $files1 !== false ) {
foreach ($files1 as $i => $value) {
if (in_array($value, $db)) {
// File exists in both
echo ' Exists '.$value;
} else {
// File doesn't exist in database
echo ' Not Exists '.$value;
}
}
} else {
echo 0;
}
?>
The result is unexpected: there is a file inside the BT363 folder, with the following path: files/BT363/BT363-Metabolic Engineering and Synthetic Biology-Class Slide--Module 4-admin-admin.pptx
But I am getting this output:
Not Exists . Not Exists .. Not Exists BT363
You can list all the files in a directory by doing this:
$files = scandir($path);
Then query your database for the file information you want, loop through the results, and check whether each value is in $files.
Yes, it is possible.
Due to the extreme lack of specific detail in your question, my response is going to be equally non-specific.
You'll want to compile a list of files from your folder using glob, scandir or similar. Likewise you will want to compile a list of files in the database.
Compare the two to identify those in the folder, but not in the database.
Edit
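The . and .. entries in your output appear because filesystems have links to the current (.) and parent (..) directories. Typically you write code to skip these values. Also note that scandir does not descend into subdirectories, which is why you see BT363 itself rather than the file inside it.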
For example, taking your code:
$files1 = scandir($directory);
if ($files1) {
foreach ($files1 as $value) {
if (in_array($value, ['.', '..'])) continue;
// Your other code...
}
}
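Since the files folder contains subdirectories, a recursive listing is probably needed before comparing against the flink values. The following is a sketch under assumptions: listFilesRecursive is a hypothetical helper, the demo builds a throwaway folder in the system temp directory, and $inDatabase stands in for the paths fetched from the database (which must use the same path format as the listing for the comparison to work).

```php
<?php
// Hypothetical helper: list every regular file under $directory, recursively.
function listFilesRecursive(string $directory): array
{
    $paths = [];
    $iterator = new RecursiveIteratorIterator(
        // SKIP_DOTS leaves out the "." and ".." entries.
        new RecursiveDirectoryIterator(rtrim($directory, '/\\'), FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $fileInfo) {
        if ($fileInfo->isFile()) {
            // Normalise separators so paths compare equal on Windows too.
            $paths[] = str_replace('\\', '/', $fileInfo->getPathname());
        }
    }
    sort($paths);
    return $paths;
}

// Throwaway demo folder with one file at the top level and one in a subfolder.
$base = str_replace('\\', '/', sys_get_temp_dir()) . '/files_demo_' . uniqid();
mkdir($base . '/BT363', 0777, true);
file_put_contents($base . '/BT363/slides.pptx', '');
file_put_contents($base . '/readme.txt', '');

// Stand-in for the flink values fetched from the database.
$inDatabase = [$base . '/readme.txt'];

$notInDatabase = array_values(array_diff(listFilesRecursive($base), $inDatabase));
print_r($notInDatabase);
```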

is_dir doesn't work with for loop

I'd like to loop through images and thumbnails from a folder and insert them into a database.
I want to use is_dir to filter out directories.
I have:
$images = scandir('./images/all_comics/');
$thumbs = scandir('./images/thumbnails/');
for($x=0; $x<count($images); $x++)
{
if(!is_dir($images[$x]))
{
//This shows all images WITHOUT directories
echo $images[$x];
//This is STILL adding images AND directories to database
mysql_query("INSERT INTO images (imgpath, thumbpath) VALUES ('$images[$x]', '$thumbs[$x]')");
}
}
I have a check in there directly after !is_dir, echo $images[$x] ,which echos out all images without the directories, as desired.
But when I check the insert in the database, I see that the directories have been added as records. Why is this?
Thank you!
(Deleting old answer, as the issue was a typo)
scandir returns a list of files in a given directory. When you use is_dir, it's looking in the current directory for those files. I think what you need to do is:
if(!is_dir("./images/all_comics/" . $images[$x])) {
....
Your echo is executed inside the if, but the query is not:
for($x=0; $x<count($images); $x++)
{
if(!is_dir($images[$x]))
{
echo $images[$x]; //This shows all images WITHOUT directories
mysql_query("INSERT INTO images (imgpath, thumbpath) VALUES ('$images[$x]', '$thumbs[$x]')");
}
}
Also, drop mysql_* in favour of PDO, and consider glob as a way to browse for files while excluding directories.
You can also use glob
$files = glob('./images/all_comics/*');
foreach ($files as $file) {
if (!is_dir($file)) {
//Do Insert
}
}

PHP readdir results - trying to sort by date created and also get rid of "." and ".."

I have a double question. Part one: I've pulled a nice list of pdf files from a directory and have appended a file called download.php to the "href" link so the pdf files don't try to open as a web page (they do save/save as instead). Trouble is I need to order the pdf files/links by date created. I've tried lots of variations but nothing seems to work! Script below. I'd also like to get rid of the "." and ".." directory dots! Any ideas on how to achieve all of that. Individually, these problems have been solved before, but not with my appended download.php scenario :)
<?php
$dir="../uploads2"; // Directory where files are stored
if ($dir_list = opendir($dir))
{
while(($filename = readdir($dir_list)) !== false)
{
?>
<p><a href="http://www.duncton.org/download.php?file=login/uploads2/<?php echo $filename; ?>"><?php echo $filename;
?></a></p>
<?php
}
closedir($dir_list);
}
?>
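While you can filter them out, the . and .. entries usually come first in the sorted list that scandir() returns, so you could just cut them away. Be aware that names starting with a character that sorts before "." (such as "-" or "!") would precede them, in which case filtering explicitly is safer. Using the simpler scandir() method: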
foreach (array_slice(scandir($dir), 2) as $filename) {
One could also use glob("dir/*"), which skips dotfiles implicitly. Because it returns the full path, sorting by ctime becomes easier as well:
$files = glob("dir/*");
// make filename->ctime mapping
$files = array_combine($files, array_map("filectime", $files));
// sorts filename list
arsort($files);
$files = array_keys($files);
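For the scandir() route, the two ideas can be combined into one helper, sketched below. filesNewestFirst is a hypothetical name, and the sketch sorts by modification time (filemtime) rather than ctime, because on Unix-like systems ctime is the inode change time and not the creation time. The demo folder and timestamps are made up for illustration.

```php
<?php
// Hypothetical helper: file names in $dir (without "." and ".."), newest first.
function filesNewestFirst(string $dir): array
{
    // Remove the dot entries explicitly instead of relying on their position.
    $names = array_diff(scandir($dir), ['.', '..']);
    $times = [];
    foreach ($names as $name) {
        $path = $dir . '/' . $name;
        if (is_file($path)) {
            $times[$name] = filemtime($path);
        }
    }
    arsort($times); // sort by timestamp, descending, keeping the name keys
    return array_map('strval', array_keys($times));
}

// Throwaway demo folder with two files whose modification times are set by hand.
$dir = sys_get_temp_dir() . '/pdf_demo_' . uniqid();
mkdir($dir);
touch($dir . '/older.pdf', 1000000000);
touch($dir . '/newer.pdf', 1500000000);

$sorted = filesNewestFirst($dir);
print_r($sorted);
```

Each name in $sorted can then be echoed into the download.php link exactly as in the question's loop, ideally passed through rawurlencode() for the URL and htmlspecialchars() for the link text.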
