I have a folder /images (about 95,000 files), and I check whether each file is in the database.
Table : images
Column : hash
The folder contains all my images, each named with its SHA-1 hash.
I use shuffle($images); to make sure the verification is random, otherwise it only verifies the first 35,000 images.
If I go over 35,000 checks, the script times out and the page hangs.
Example name of an image : d0a0bb3149bea2335e8784812fef706ad0a13156.jpg
My Script :
I select the images in the database
I put them in an array
I shuffle the array (to avoid always checking the first 35,000 images)
I create an array of the image files in the folder /images
I check for files missing from the database using the list read with opendir()
I display the result
<?php
set_time_limit(0);
$images = [];
$q = $mysqli->query('SELECT hash FROM images');
while($r = $q->fetch_assoc())
{
$images[] = $r['hash'].'.jpg';
}
shuffle($images);
$i_hors_bdd = 0;
$images_existent_hors_bdd = [];
if($dh = opendir($_SERVER['DOCUMENT_ROOT'].'/images'))
{
while(($file = readdir($dh)) !== false)
{
if(!in_array($file, $fichiers_a_exclures))
{
if(!is_sha1($file) OR !in_array($file, $images))
$images_existent_hors_bdd[] = '<p>Name of File: '.$file.'</p>';
}
if($i_hors_bdd > 35000)
{
break;
}
$i_hors_bdd++;
}
}
closedir($dh);
if(count($images_existent_hors_bdd) > 0)
{
echo '<p>Images exist, but not in the database.</p>';
sort($images_existent_hors_bdd);
foreach($images_existent_hors_bdd as $image_existe_hors_bdd)
echo $image_existe_hors_bdd;
}
else
echo '<p>All images are in the database.</p>';
echo '<p>'.$i_hors_bdd.' images checked.</p>';
So my question is: how can I optimize this script so that it can check more images without timing out? Note that my VPS is not very powerful and has no SSD.
Here are some things to consider or try:
Concatenate '.jpg' to hash in the SQL, then use fetch_all to get a numeric array.
Use scandir to build an array of the files in the directory.
Use array_diff to remove $fichiers_a_exclures and $images from that array.
Iterate over the resulting, much smaller array to do the sha1 test.
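The steps above can be sketched roughly as follows. This is only a sketch under assumptions, not the asker's actual script: findFilesNotInDatabase and the sample data are hypothetical, and the database names are passed in as a plain array (for example the result of SELECT CONCAT(hash, '.jpg') FROM images read with fetch_all) so the example runs without a connection. The regex stands in for the asker's is_sha1() helper, whose definition is not shown. The point is to replace one in_array() call per file, each a linear scan over roughly 95,000 entries, with a single array_diff() call.

```php
<?php
/**
 * Return the file names that exist on disk but are unknown to the database.
 *
 * @param string[] $filesOnDisk names as returned by scandir()
 * @param string[] $namesInDb   "<sha1>.jpg" names built from the hash column
 * @param string[] $excluded    entries to ignore (stand-in for $fichiers_a_exclures)
 * @return string[] sorted, re-indexed list of orphan file names
 */
function findFilesNotInDatabase(array $filesOnDisk, array $namesInDb, array $excluded = ['.', '..']): array
{
    // One call removes both the excluded entries and every name the database knows.
    $orphans = array_diff($filesOnDisk, $excluded, $namesInDb);
    sort($orphans);
    return $orphans;
}

// Hypothetical sample data standing in for scandir() and the query result.
$filesOnDisk = [
    '.',
    '..',
    'd0a0bb3149bea2335e8784812fef706ad0a13156.jpg',
    str_repeat('a', 40) . '.jpg',
    'notes.txt',
];
$namesInDb = ['d0a0bb3149bea2335e8784812fef706ad0a13156.jpg'];

$orphans = findFilesNotInDatabase($filesOnDisk, $namesInDb);

// Only the (usually short) orphan list needs the SHA-1 name test.
foreach ($orphans as $name) {
    $looksLikeSha1 = preg_match('/^[0-9a-f]{40}\.jpg$/', $name) === 1;
    echo $name, $looksLikeSha1 ? " (valid name, missing from database)\n" : " (unexpected file name)\n";
}
```

With the real data, $filesOnDisk would come from scandir($_SERVER['DOCUMENT_ROOT'].'/images'), and neither shuffle() nor the 35,000 cap should be necessary, since the whole comparison is done in one pass.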
Related
I have a database table files that holds details about files in different folders; the field flink holds the path of each file. Now I want to search both the folder and the database and find the files that are not listed in the database. Is this possible using PHP and MySQL? I have written some sample code, but it doesn't seem to work. Please note that the files folder contains a number of subdirectories as well.
<?php
include("dbfiles.php");
$directory='files/';
// Query database
$query = 'SELECT `flink` FROM `files`';
$result = mysqli_query($fmysqli, $query);
$db = []; // create empty array
while ($row = mysqli_fetch_row($result))
array_push($db, $row[0]);
// Check files
$files1 = scandir($directory);
if ( $files1 !== false ) {
foreach ($files1 as $i => $value) {
if (in_array($value, $db)) {
// File exists in both
echo ' Exists '.$value;
} else {
// File doesn't exist in database
echo ' Not Exists '.$value;
}
}
} else {
echo 0;
}
?>
The result is unexpected. There is a file inside the BT363 folder, at the path files/BT363/BT363-Metabolic Engineering and Synthetic Biology-Class Slide--Module 4-admin-admin.pptx
But I am getting this output:
Not Exists . Not Exists .. Not Exists BT363
You can list all the files in a directory by doing this:
$files = scandir($path);
Then query your database for the file information you want, loop through the rows, and check whether each value is present in $files.
Yes, it is possible.
Due to the extreme lack of specific detail in your question, my response is going to be equally non-specific.
You'll want to compile a list of files from your folder using glob, scandir or similar. Likewise you will want to compile a list of files in the database.
Compare the two to identify those in the folder, but not in the database.
Edit
The . and .. in your output appear because filesystems have links to the current (.) and parent (..) directories. Typically you write code to skip these values.
For example, taking your code:
$files1 = scandir($directory);
if ($files1) {
foreach ($files1 as $value) {
if (in_array($value, ['.', '..'])) continue;
// Your other code...
}
}
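Since scandir only lists one level, a file inside BT363 never shows up in the loop above. One way to cover subdirectories is a recursive iterator, sketched below. The function name listFilesRecursively and the demo files are hypothetical, and the sketch assumes that flink stores paths relative to the scanned root with forward slashes; if your table stores them differently, the paths would need the same normalisation on both sides before comparing.

```php
<?php
/**
 * Collect every regular file below $root as a path relative to $root.
 * The SKIP_DOTS flag leaves out the '.' and '..' entries.
 */
function listFilesRecursively(string $root): array
{
    $root = rtrim($root, '/\\');
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    $paths = [];
    foreach ($iterator as $fileInfo) {
        if ($fileInfo->isFile()) {
            // Strip the root prefix and normalise separators to '/'.
            $relative = substr($fileInfo->getPathname(), strlen($root) + 1);
            $paths[] = str_replace('\\', '/', $relative);
        }
    }
    sort($paths);
    return $paths;
}

// Demo with a throwaway directory tree.
$root = sys_get_temp_dir() . '/files_demo_' . uniqid();
mkdir($root . '/BT363', 0777, true);
touch($root . '/BT363/slides.pptx');
touch($root . '/readme.txt');

$inDatabase = ['readme.txt']; // stand-in for the rows of SELECT flink FROM files
$notInDatabase = array_values(array_diff(listFilesRecursively($root), $inDatabase));
print_r($notInDatabase);

// Remove the throwaway tree again.
unlink($root . '/BT363/slides.pptx');
unlink($root . '/readme.txt');
rmdir($root . '/BT363');
rmdir($root);
```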
I have seen several websites where, if you upload an image and an identical image already exists on their servers, they will reject the submission. Using PNGs, is there an easy way to check one image against a massive folder of images?
http://www.imagemagick.org/discourse-server/viewtopic.php?t=12618
I did find this with ImageMagick, but I am looking for a one-versus-many comparison, not a one-to-one comparison repeated a million times.
You can hash the file content with SHA-1. That gives you a way to identify two pictures that are strictly identical.
See http://php.net/manual/fr/function.sha1-file.php
After that, you save the file to NFS, or use some kind of database, and test whether the hash already exists.
Details of the images are probably maintained in a database; while the images are stored in the filesystem. And that database probably has a hash column which is used to store an md5 hash of the image file itself, calculated when the image is first uploaded. When a new image is uploaded, it calculates the hash for that image, and then checks to see if any other image detail in the database has a matching hash. If not, it stores the newly uploaded image with that hash; otherwise it can respond with details of the previous upload. If the hash column is indexed in the table, then this check is pretty quick.
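A minimal sketch of that hash check, under assumptions: findDuplicate is a hypothetical helper, and the $knownHashes array stands in for an indexed hash column in a database table. It uses sha1_file as suggested above; md5_file would work the same way. Note that this only detects byte-identical files; the same picture re-saved or re-encoded produces a different hash.

```php
<?php
/**
 * Decide whether an uploaded file duplicates one that is already stored.
 * $knownHashes maps sha1 => stored path and stands in for a lookup on an
 * indexed hash column (an assumption made for this sketch).
 *
 * @return string|null path of the existing copy, or null if the file is new
 */
function findDuplicate(string $uploadedPath, array $knownHashes): ?string
{
    $hash = sha1_file($uploadedPath);
    if ($hash === false) {
        throw new RuntimeException("Cannot read $uploadedPath");
    }
    return $knownHashes[$hash] ?? null;
}

// Demo with temporary files instead of real uploads.
$dir = sys_get_temp_dir();
$first = tempnam($dir, 'img');
$copy = tempnam($dir, 'img');
$other = tempnam($dir, 'img');
file_put_contents($first, 'same bytes');
file_put_contents($copy, 'same bytes');
file_put_contents($other, 'different bytes');

// The first file is "already uploaded".
$knownHashes = [sha1_file($first) => $first];

var_dump(findDuplicate($copy, $knownHashes) === $first); // identical content is detected
var_dump(findDuplicate($other, $knownHashes));           // different content has no match
```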
If I understood your question correctly, you want to find out whether a specific image exists in a directory containing many images, right? If so, take a look at this solution:
<?php
// CREATE A FUNCTION WHICH RETURNS AN ARRAY OF ALL IMAGES IN A SPECIFIC FOLDER
function getAllImagesInFolder($dir_full_path){
$returnable = array();
$files_in_dir = scandir($dir_full_path);
// Match the extension at the end of the name only, ignoring case
$reg_fx = '#\.(png|jpe?g|bmp|gif)$#i';
foreach($files_in_dir as $key=>$val){
$temp_file_or_dir = $dir_full_path . DIRECTORY_SEPARATOR . $val;
if(is_file($temp_file_or_dir) && preg_match($reg_fx, $val) ){
$regx_dot_wateva = '/\.{2,4}$/';
$regx_dot = '/\./';
$regx_array = array($regx_dot_wateva, $regx_dot);
$replace_array = array("", "_");
$return_val = preg_replace($regx_array, $replace_array, $val);
$returnable[$return_val] = $temp_file_or_dir ;
}else if(is_dir($temp_file_or_dir) && !preg_match('/^\..*/', $val) ){
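// Recurse into the subdirectory and keep what it finds
$returnable = array_merge($returnable, getAllImagesInFolder($temp_file_or_dir));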
}
}
return $returnable;
}
// CREATE ANOTHER FUNCTION TO CHECK IF THE SPECIFIED IMAGE EXISTS IN THE GIVEN DIRECTORY.
// THE FIRST PARAMETER SHOULD BE THE RESULT OF CALLING THE PREVIOUS FUNCTION: getAllImagesInFolder(...)
// THE SECOND PARAMETER IS THE IMAGE YOU WANT TO SEARCH WHETHER IT EXISTS IN THE SAID FOLDER OR NOT
function imageExistsInFolder($arrImagesInFolder, $searchedImage){
if(!is_array($arrImagesInFolder) || count($arrImagesInFolder) < 1){
return false;
}
foreach($arrImagesInFolder as $strKey=>$imgPath){
if(stristr($imgPath, $searchedImage)){
return true;
}
}
return false;
}
// NOW GET ALL THE IMAGES IN A SPECIFIED FOLDER AND ASSIGN THE RESULTING ARRAY TO A VARIABLE: $arrImgFiles
$imgFolder = "/path/to/directory/where/there/are/images";
$arrImgFiles = getAllImagesInFolder($imgFolder);
$searchedImage = "sandwich.jpg"; //<== OR EVEN WITHOUT THE EXTENSION, JUST "sandwich"
// ASSUMING THE SPECIFIC IMAGE YOU WANT TO MATCH IS CALLED sandwich.jpg
// YOU CAN USE THE imageExistsInFolder(...) FUNCTION TO RETURN A BOOLEAN FLAG OF true OR false
// DEPENDING ON IF IT DOES OR NOT.
var_dump($arrImgFiles);
var_dump( imageExistsInFolder($arrImgFiles, $searchedImage) );
I have been working on a project, and over time its image folder got cluttered with images I uploaded while testing. Now I want to write a script that searches the img tags in my articles for image names (articles are stored in MySQL in a column named 'text'), scans the folder where the images are stored, and deletes every image that is not included in any article (the unused images). Has anyone done this before, so I could see an example or a good approach for this case?
Here's what you'll need to do what you want:
Loop through your directory of files (if they are on the filesystem):
if ($handle = opendir('/path/to/files')) {
echo "Directory handle: $handle\n";
echo "Entries:\n";
/* This is the correct way to loop over the directory. */
while (false !== ($entry = readdir($handle))) {
echo "$entry\n";
}
/* This is the WRONG way to loop over the directory. */
while ($entry = readdir($handle)) {
echo "$entry\n";
}
closedir($handle);
}
Ref. http://php.net/readdir
Loop through your files (if they are on the database):
Ref. http://www.php.net/manual/en/mysqli.query.php
Compare file names (obvious once you are looping through your resource).
Delete unused images like so http://www.php.net/unlink
The approach is simple:
Query the database and get a list of all image URLs; add them to an array.
Loop through each folder that contains images and build an array of every image on the site.
Find all items that are in the folder array but not in the database array. array_diff is what you need here (array_intersect would instead return the items present in both).
With the new array, simply loop through the list and delete the files.
You can search for each of the steps above individually and then string them together.
I would recommend backing everything up before trying!!!!
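Strung together, the steps above might look like the sketch below. Everything in it is illustrative: imageNamesInHtml is a hypothetical helper, the $articles array stands in for the rows of the 'text' column, and $onDisk stands in for a scandir() result. The regex is a simplification that only handles quoted src attributes; a real HTML parser such as DOMDocument would be more robust. The sketch only prints the unused names, so the list can be reviewed before anything is passed to unlink().

```php
<?php
/**
 * Extract the base file names referenced by <img src="..."> in a block of HTML.
 * Simplified: only quoted src attributes are recognised.
 */
function imageNamesInHtml(string $html): array
{
    preg_match_all('/<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1/is', $html, $matches);
    $names = [];
    foreach ($matches[2] as $src) {
        // Drop any scheme, host and query string, then keep the file name only.
        $path = parse_url($src, PHP_URL_PATH);
        if (is_string($path) && $path !== '') {
            $names[] = basename($path);
        }
    }
    return array_values(array_unique($names));
}

// Hypothetical article bodies and folder listing.
$articles = [
    '<p>Intro <img src="uploads/a.jpg" alt=""></p>',
    "<img class='x' src='/img/b.png?v=2'>",
];
$onDisk = ['a.jpg', 'b.png', 'old-test.jpg'];

// Collect every image name used by at least one article.
$used = [];
foreach ($articles as $text) {
    $used = array_merge($used, imageNamesInHtml($text));
}

// Files on disk that no article references.
$unused = array_values(array_diff($onDisk, $used));
print_r($unused);
```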
I recently came across a case like this: I wanted to remove unused files that users left behind when they changed their profile picture, which were still stored on the web server. To fix this I used:
$images = scandir("uploads", 1);
foreach ($images as $itemlc)
{
$res=mysql_query("SELECT * FROM company WHERE c_logo='$itemlc'");
$count = mysql_num_rows($res);
$res2=mysql_query("SELECT * FROM users WHERE u_logo='$itemlc'");
$count2 = mysql_num_rows($res2);
if($count == 1)
{
echo $itemlc; echo " exists <br><br>";
}
else if ($count2 == 1)
{
echo $itemlc; echo " exists <br><br>";
}
else{ $file_path = 'uploads/'; $src=$file_path.$itemlc; @unlink($src); }
}
Hope this helps if there is someone who needs this!
I'd like to loop through images and thumbnails from a folder and insert them into a database.
I want to use is_dir to filter out directories.
I have:
$images = scandir('./images/all_comics/');
$thumbs = scandir('./images/thumbnails/');
for($x=0; $x<count($images); $x++)
{
if(!is_dir($images[$x]))
{
//This shows all images WITHOUT directories
echo $images[$x];
//This is STILL adding images AND directories to database
mysql_query("INSERT INTO images (imgpath, thumbpath) VALUES ('$images[$x]', '$thumbs[$x]')");
}
}
I have a check in there directly after !is_dir, echo $images[$x] ,which echos out all images without the directories, as desired.
But when I check the insert in the database, I see that the directories have been added as records. Why is this?
Thank you!
(Deleting old answer, as the issue was a typo)
scandir returns a list of files in a given directory. When you use is_dir, it's looking in the current directory for those files. I think what you need to do is:
if(!is_dir("./images/all_comics/" . $images[$x])) {
....
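To make the point about relative paths concrete, here is a small sketch. The helper name filesOnly and the demo directory are hypothetical. scandir() returns bare names, so each name is joined to the directory before it is tested; otherwise the test runs against the current working directory.

```php
<?php
/**
 * Return only the regular files in $dir.
 * Each bare name from scandir() is prefixed with $dir before testing it.
 */
function filesOnly(string $dir): array
{
    $names = scandir($dir);
    if ($names === false) {
        return [];
    }
    $files = [];
    foreach ($names as $name) {
        // '.', '..' and subdirectories fail this test and are skipped.
        if (is_file($dir . DIRECTORY_SEPARATOR . $name)) {
            $files[] = $name;
        }
    }
    return $files;
}

// Demo: one image and one subdirectory in a throwaway folder.
$dir = sys_get_temp_dir() . '/comics_demo_' . uniqid();
mkdir($dir . '/archive', 0777, true);
touch($dir . '/page1.png');

print_r(filesOnly($dir));
```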
Your echo is executed inside the if, but the query is not:
for($x=0; $x<count($images); $x++)
{
if(!is_dir($images[$x]))
{
echo $images[$x]; //This shows all images WITHOUT directories
mysql_query("INSERT INTO images (imgpath, thumbpath) VALUES ('$images[$x]', '$thumbs[$x]')");
}
}
Also, get rid of mysql_* for PDO, and consider glob as a way to browse for files excluding directories.
You can also use glob
$files = glob('./images/all_comics/*');
foreach ($files as $file) {
if (!is_dir($file)) {
//Do Insert
}
}
I have a directory containing sub directories which each contain a series of files. I'm looking for a script that will look inside the sub directories and randomly return a specified number of files.
There are a few scripts that can search a single directory (not subfolders), and other scripts that can search subfolders but only return one file.
To put a little context on the situation, the returned files will be included as li elements in a rotating banner.
Thanks in advance for any help, hopefully this is possible.
I think I've got there. It's not exactly what I set out to achieve, but it works well enough, arguably better for the purpose. I'm using the following function:
<?php function RandomFile($folder='', $extensions='.*'){
// fix path:
$folder = trim($folder);
$folder = ($folder == '') ? './' : $folder;
// check folder:
if (!is_dir($folder)){ die('invalid folder given!'); }
// create files array
$files = array();
// open directory
if ($dir = @opendir($folder)){
// go through all files (compare against false so a file named "0" does not end the loop):
while(false !== ($file = readdir($dir))){
if (!preg_match('/^\.+$/', $file) and
preg_match('/\.('.$extensions.')$/', $file)){
// feed the array:
$files[] = $file;
}
}
// close directory
closedir($dir);
}
else {
die('Could not open the folder "'.$folder.'"');
}
if (count($files) == 0){
die('No files were found :-(');
}
// seed random function:
mt_srand((double)microtime()*1000000);
// get an random index:
$rand = mt_rand(0, count($files)-1);
// check again:
if (!isset($files[$rand])){
die('Array index was not found! very strange!');
}
// return the random file:
return $folder . "/" . $files[$rand];
}
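$random1 = RandomFile('project-banners/website-design');
// initialise the other two so the loops below do not read undefined variables
$random2 = $random3 = null;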
while (!$random2 || $random2 == $random1) {
$random2 = RandomFile('project-banners/logo-design');
}
while (!$random3 || $random3 == $random1 || $random3 == $random2) {
$random3 = RandomFile('project-banners/design-for-print');
}
?>
And echoing the results into the container (in this case the ul):
<?php include($random1) ;?>
<?php include($random2) ;?>
<?php include($random3) ;?>
Thanks to quickshiftin for his help, however it was a little above my skill level.
For info, the original script, which I changed, can be found at:
http://randaclay.com/tips-tools/multiple-random-image-php-script/
Scanning the filesystem every single time to randomly select a file to display will be really slow. You should index the directory structure ahead of time. You can do this in many ways: try a simple find command or, if you really want to use PHP, my favorite choice would be RecursiveDirectoryIterator plus RecursiveIteratorIterator.
Put all the results into one file and just read from there when you select a file to display. You can use the line numbers as an index, and the rand function to pick a line and thus a file to display. You might want to consider something more evenly distributed than rand, though, to keep the advertisers happy :)
EDIT:
Adding a simple real-world example:
// define the location of the portfolio directory
define('PORTFOLIO_ROOT', '/Users/quickshiftin/junk-php');
// and a place where we'll store the index
define('FILE_INDEX', '/tmp/porfolio-map.txt');
// if the index doesn't exist, build it
// (this doesn't take into account changes to the portfolio files)
if(!file_exists(FILE_INDEX))
shell_exec('find ' . escapeshellarg(PORTFOLIO_ROOT) . ' -type f > ' . escapeshellarg(FILE_INDEX)); // -type f keeps directories out of the index
// read the index into memory (very slow but easy way to do this)
$aIndex = file(FILE_INDEX);
// randomly select an index
$iIndex = rand(0, count($aIndex) - 1);
// spit out the filename
var_dump(trim($aIndex[$iIndex]));
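For a PHP-only variant that also returns a specified number of files, something like the following sketch could work. It is not the answerer's code: randomFiles and the demo folder layout are hypothetical, and for brevity it walks the tree on every call, whereas the advice above is to cache the file list (for example in an index file) and pick from that instead.

```php
<?php
/**
 * Pick up to $count distinct random files from $root and all its subdirectories.
 */
function randomFiles(string $root, int $count): array
{
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    $files = [];
    foreach ($iterator as $fileInfo) {
        if ($fileInfo->isFile()) {
            $files[] = $fileInfo->getPathname();
        }
    }
    if ($files === [] || $count < 1) {
        return [];
    }
    // array_rand() returns a single key when asked for one and an array of
    // distinct keys otherwise; the cast makes both cases an array.
    $keys = (array) array_rand($files, min($count, count($files)));
    $picked = [];
    foreach ($keys as $key) {
        $picked[] = $files[$key];
    }
    return $picked;
}

// Demo: three subfolders with one file each, pick two at random.
$root = sys_get_temp_dir() . '/banners_demo_' . uniqid();
foreach (['website-design', 'logo-design', 'design-for-print'] as $sub) {
    mkdir("$root/$sub", 0777, true);
    touch("$root/$sub/banner.html");
}

$picked = randomFiles($root, 2);
print_r($picked);
```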