PHP find duplicate files by comparing hash, and delete any duplicate files - php

I'm looping through files in a directory and saving name and hash in array like so:
$localfileslist = [];
$localfiles = glob($_GET['name'].'/*');
foreach($localfiles as $localfile){
if(is_file($localfile)){
$localfilehash= hash_file('sha256', $localfile);
array_push($localfileslist,$localfile, $localfilehash);
}
}
$uniques = array_unique($localfileslist);
$dupes = array_diff_assoc($localfileslist, $uniques);
print_r($dupes);
Now I'm stumped on how to proceed with finding the dupes and deleting them. Any help is appreciated.

First, I store each file's name and hash together. Then I group that array by hash, so every name with the same hash ends up in the same group. Finally, I filter the grouped array to keep only the groups with more than one name; those are the duplicates.
$localfileslist = [];
$localfiles = glob(DEFINE_PATH . '/*');
foreach ($localfiles as $localfile) {
if (is_file($localfile)) {
$localfilehash = hash_file('sha256', $localfile);
array_push($localfileslist, ['name' => $localfile, 'hash' => $localfilehash]);
}
}
$grouped = array_reduce($localfileslist, function ($agg, $item) {
if (!isset($agg[$item['hash']])) {
$agg[$item['hash']] = [];
}
$agg[$item['hash']][] = $item["name"];
return $agg;
}, []);
$duplicates = array_filter($grouped, function ($item) {
return count($item) > 1;
});
print_r($duplicates);
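The question also asks about deleting the duplicates, which the snippet above stops short of. A minimal sketch of that last step, assuming it is acceptable to keep the first file of each group (in sorted name order) and remove the rest; the function name deleteDuplicateFiles is hypothetical and not part of the original answer:

```php
<?php
// Hypothetical helper: groups the files in $dir by SHA-256 hash and deletes
// every file in a group except the first one. Returns the deleted paths.
function deleteDuplicateFiles(string $dir): array
{
    $files = glob(rtrim($dir, '/') . '/*') ?: [];
    sort($files); // deterministic order, so the kept copy is predictable

    $grouped = [];
    foreach ($files as $file) {
        if (is_file($file)) {
            $grouped[hash_file('sha256', $file)][] = $file;
        }
    }

    $deleted = [];
    foreach ($grouped as $names) {
        // array_slice skips the first name, which is the copy we keep
        foreach (array_slice($names, 1) as $duplicate) {
            if (unlink($duplicate)) {
                $deleted[] = $duplicate;
            }
        }
    }
    return $deleted;
}
```

Since unlink() is irreversible, it may be worth printing the list of candidates first and only deleting once the output looks right.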

Related

Collect duplicate data in array using php

This is my sample array data from a biometrics device.
I just want to collect the data that has the same bio_id and date.
temp:[
0:{
bio_id:"1"
date:"2017-10-05"
date_time:"2017-10-05 08:00:22"
device_name:"biometrics"
time:"08:00:22"
}
1:{
bio_id:"1"
date:"2017-10-05"
date_time:"2017-10-05 08:00:23"
device_name:"biometrics"
time:"08:00:23"
}
2:{
bio_id:"2"
date:"2017-10-05"
date_time:"2017-10-05 08:06:29"
device_name:"biometrics"
time:"08:06:29"
}
3:{
bio_id:"1"
date:"2017-10-05"
date_time:"2017-10-05 15:06:47"
device_name:"biometrics"
time:"15:06:47"
}
4:{
bio_id:"2"
date:"2017-10-05"
date_time:"2017-10-05 16:01:50"
device_name:"biometrics"
time:"16:01:50"
}
]
I have been stuck with this code that I wrote. I don't know how I should manipulate the data or how to store it properly. I have tried some array functions, but they give a different result from what I expect.
$len = count($temp);
for ($i=0; $i <$len ; $i++) {
$id = $temp[$i]['bio_id'];
$date = $temp[$i]['date'];
for ($x=0; $x < $len; $x++) {
if ($id == $temp[$x]['bio_id'] && $date == $temp[$x]['date']) {
$data[] = $temp[$x];
$int[] = $x;
}
}
}
This code collects the duplicates in the array on the basis of bio_id and date:
$newTemp = array();
foreach ($temp as $value) {
    // Entries that share the same bio_id and date end up under the same key
    $newTemp[$value['bio_id'] . '_' . $value['date']][] = $value;
}
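$newTemp = array();
foreach ($temp as $value) {
    $key = $value['bio_id'] . " " . $value['date'];
    if (isset($newTemp[$key])) {
        // Key already seen: append this entry to the existing group
        $newTemp[$key][] = $value;
    } else {
        // First entry for this bio_id/date combination
        $newTemp[$key] = array($value);
    }
}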
I just want to collect data that has the same bio_id and date
The easiest way is to iterate over the input array and aggregate the data into a new array, indexed by a key generated from the bio_id and date fields. This way, a duplicate entry can be easily identified because its key already exists in the output array.
$input = array(/* the big input array here */);
// Build the output here
$output = array();
foreach ($input as $item) {
$key = $item['bio_id'].':'.$item['date'];
if (array_key_exists($key, $output)) {
// This is a duplicate
// Ignore the data, update only the count
$output[$key]['count'] ++;
} else {
// This is the first time this combination is processed
// Copy the input
$output[$key] = $item;
// Keep in 'count' how many times this combination exists in the input
$output[$key]['count'] = 1;
}
}
Each entry of $output is the first entry of $input that has the same combination of bio_id and date. Additionally, count holds the number of entries of $input that share that pair of bio_id and date.
Adapt this example if you need to aggregate the data differently (for example, to keep all the duplicates instead of just counting them).
Another example that keeps the duplicates:
// Build the output here
$output = array();
foreach ($input as $item) {
$key = $item['bio_id'].':'.$item['date'];
if (array_key_exists($key, $output)) {
// This is a duplicate; add it to the list
$output[$key]['all'][] = $item;
} else {
// This is the first time this combination is processed
// Copy important values (bio_id and date) from the input
$output[$key] = array(
'bio_id' => $item['bio_id'],
'date' => $item['date'],
// Store the entire $item into a list
'all' => array($item),
);
}
}
Read about PHP arrays: how to access their elements using the square-bracket syntax, and how to create or modify their values.
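If only the combinations that actually occur more than once are of interest, the grouping can be followed by a filter. A minimal sketch under that assumption; the function name groupDuplicates and the three sample rows are made up for illustration and are not the asker's data:

```php
<?php
// Hypothetical helper: group rows by bio_id and date, then keep only the
// groups that contain more than one row.
function groupDuplicates(array $rows): array
{
    $groups = [];
    foreach ($rows as $row) {
        $key = $row['bio_id'] . ':' . $row['date'];
        $groups[$key][] = $row;
    }
    return array_filter($groups, function (array $group): bool {
        return count($group) > 1;
    });
}

// Made-up sample rows
$sample = [
    ['bio_id' => '1', 'date' => '2017-10-05', 'time' => '08:00:22'],
    ['bio_id' => '1', 'date' => '2017-10-05', 'time' => '08:00:23'],
    ['bio_id' => '2', 'date' => '2017-10-06', 'time' => '08:06:29'],
];
$duplicates = groupDuplicates($sample);
// $duplicates has a single key, "1:2017-10-05", holding the first two rows
```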

Incrementing the value in multidimensional array in php

I couldn't understand the multidimensional array in PHP properly. I have a CSV file having two columns as shown below:
I am trying to create an array of arrays in which each key is a category and each value is itself an array. In that inner array, each key is a company and the value is the count of products. See the code below:
<?php
//array contains value
function contains_value($my_array, $value_search){
foreach ($my_array as $key => $value) {
if ($value === $value_search)
return true;
}
return false;
}
//array contains key
function contains_key($my_array, $key_search){
foreach ($my_array as $key => $value) {
if ($key === $key_search)
return true;
}
return false;
}
$handle = fopen("product_list.csv", "r");
$products = array();
if ($handle) {
while (($line = fgets($handle)) !== false) {
$product = explode(",", $line);
$category = $product[0];
$company = $product[1];
if (contains_key($products, $category)) {
if (contains_value($products, $company)) {
//increase the count of category by 1
$products[$category][$company] = $products[$category][$company] + 1;
} else {
//append new company with count 1
array_push($products[$category], array(
$company,
1
));
}
} else {
//initialize new company with count 1
$products[$category] = array(
$company,
1
);
}
}
fclose($handle);
}
var_dump($products);
?>
I noticed that var_dump($products) does not show the correct information. I am expecting the following kind of result:
I don't have enough reputation to comment, but I think he needs counts.
To complete Alive to Die's answer, something more like this:
if (!array_key_exists($category, $products)) {
    $products[$category] = [];
}
if (!array_key_exists($company, $products[$category])) {
    $products[$category][$company] = 0;
}
// Increment the counter for this category/company pair
++$products[$category][$company];
But cleaner ;)
Edit:
If I remember well, his first idea was this:
$products[$category][] = $company;
The code is shorter. Maybe you can combine the two ideas.
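For reference, the counting logic can be condensed into one loop. This is a sketch rather than the original answerer's code: the function name countCompaniesPerCategory and the sample rows are hypothetical, since the actual CSV contents are not shown in the question. Note that fgets() keeps the trailing newline, so each line is trimmed before parsing; otherwise "Acme" and "Acme\n" would be counted as different companies.

```php
<?php
// Hypothetical helper: $lines are raw CSV lines of the form "category,company".
function countCompaniesPerCategory(array $lines): array
{
    $products = [];
    foreach ($lines as $line) {
        $fields = str_getcsv(trim($line), ',', '"', '\\');
        if (count($fields) < 2) {
            continue; // skip blank or malformed lines
        }
        [$category, $company] = array_map('trim', $fields);
        // The null-coalescing default replaces both array_key_exists checks
        $products[$category][$company] = ($products[$category][$company] ?? 0) + 1;
    }
    return $products;
}

// Made-up sample rows; the real CSV is not shown in the question
$counts = countCompaniesPerCategory([
    "Phones,Acme\n",
    "Phones,Acme\n",
    "Phones,Globex\n",
    "Laptops,Acme\n",
]);
```

With a real file, the lines could come from file('product_list.csv') or from the fgets() loop in the question.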

How to find similarity between IDs string and filenames?

I have a list of IDs
$Ids="1201,1240,1511,1631,1663,1666,1716,2067,2095";
and in the /imgs/ folder there are many jpg filenames related to these IDs. But there are a lot of IDs that do not have any image.
For example, /imgs/ contains:
1201_73.jpg
1201_2897.jpg
1240-9834.jpg
1240-24.jpg
1511-dsc984.jpg
1511-dsc34.jpg
What I want to achieve is to find which of the IDs have images in the img folder.
Thank you
Updated
$array = array();
$foo = explode('.jpg', $images);
foreach($foo as $id) {
$digi = substr(trim($id), 0,4);
if(!in_array($digi, $array)) {
array_push($array, $digi);
echo $id . ".jpg <br/>";
$where .= "id='$digi' or ";
}
}
First, turn your string of IDs into an array.
$idsArray = explode(',', $Ids);
Now iterate through the directory, checking each file to see if it starts with the ID.
$hasImages = array();
foreach (new DirectoryIterator(__DIR__ . '/imgs') as $fileInfo) {
if ($fileInfo->isDot() || $fileInfo->isDir()) {
continue;
}
foreach ($idsArray as $id) {
if (0 === strpos($fileInfo->getBasename(), $id)) {
$hasImages[] = $id;
break;
}
}
}
$hasImages = array_unique($hasImages);
$hasImages will contain an array of IDs which have an image.
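Something like this should work:
$files = glob('/imgPath/*.jpg');
// Take the part of each file name before the first "-" or "_" as its ID
$hasImage = array_unique(array_map(function ($file) {
    return preg_split('/[-_]/', basename($file))[0];
}, $files));
// IDs from the list that have at least one image
$withImages = array_intersect(explode(',', $Ids), $hasImage);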

How to reference cells with specific condition in php multidimensional array

I got an array like this:
$array[0]['name'] = "Axel";
$array[0]['car'] = "Pratzner";
$array[0]['color'] = "black";
$array[1]['name'] = "John";
$array[1]['car'] = "BMW";
$array[1]['color'] = "black";
$array[2]['name'] = "Peggy";
$array[2]['car'] = "VW";
$array[2]['color'] = "white";
I would like to do something like "get all names WHERE car = bmw AND color = white"
Could anyone give advice on how the PHP spell would look like?
function getWhiteBMWs($array) {
    $result = array();
    foreach ($array as $entry) {
        // Compare the car case-insensitively: the sample data stores "BMW"
        if (strcasecmp($entry['car'], 'bmw') === 0 && $entry['color'] == 'white') {
            $result[] = $entry;
        }
    }
    return $result;
}
Edited: This is a more general solution:
// Filter an array using the given filter array
function multiFilter($array, $filters) {
    // Keep only the entries that match every key/value pair in $filters.
    // array_filter is used instead of unsetting elements inside array_walk,
    // because changing an array's structure while walking it is not reliable.
    $result = array_filter($array, function ($entry) use ($filters) {
        foreach ($filters as $key => $value) {
            if (!isset($entry[$key]) || $entry[$key] != $value) {
                return false;
            }
        }
        return true;
    });
    // Reindex so the result is a plain list
    return array_values($result);
}
// Build a filter array - an entry passes this filter if every
// key in this array corresponds to the same value in the entry.
$filter = array('car' => 'BMW', 'color' => 'white');
// multiFilter searches $array, returning a result array that contains
// only the entries that pass the filter. In this case, only entries
// where $entry['car'] = 'BMW' AND $entry['color'] = 'white' will be
// returned.
$whiteBMWs = multiFilter($array, $filter);
Doing this in code more or less emulates what an RDBMS is perfect for. Something like this would work:
function getNamesByCarAndColor($array, $color, $car) {
    $matches = array();
    foreach ($array as $entry) {
        if ($entry["color"] == $color && $entry["car"] == $car) {
            $matches[] = $entry["name"];
        }
    }
    return $matches;
}
This code works well for smaller arrays, but as they grow larger it becomes obvious that a linear scan isn't a great solution, and an indexed solution would be much cleaner.
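One way to read "an indexed solution" without leaving PHP is to build a lookup table once and query it many times. The sketch below is an assumption about what such an index could look like, not the answerer's code; buildCarColorIndex and namesByCarAndColor are hypothetical names, and values are lower-cased so that "bmw" matches "BMW":

```php
<?php
// Hypothetical helper: builds a lookup table keyed by "car|color" so that
// repeated queries do not have to rescan the whole array.
function buildCarColorIndex(array $rows): array
{
    $index = [];
    foreach ($rows as $row) {
        $key = strtolower($row['car']) . '|' . strtolower($row['color']);
        $index[$key][] = $row['name'];
    }
    return $index;
}

// Hypothetical helper: looks up the names for one car/color pair.
function namesByCarAndColor(array $index, string $car, string $color): array
{
    return $index[strtolower($car) . '|' . strtolower($color)] ?? [];
}

$array = [
    ['name' => 'Axel',  'car' => 'Pratzner', 'color' => 'black'],
    ['name' => 'John',  'car' => 'BMW',      'color' => 'black'],
    ['name' => 'Peggy', 'car' => 'VW',       'color' => 'white'],
];
$index = buildCarColorIndex($array);
$blackBmwOwners = namesByCarAndColor($index, 'bmw', 'black'); // ['John']
```

Building the index costs one pass over the data; each lookup afterwards is a single array access. For data that already lives in a database, a WHERE clause on indexed columns does the same job.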

PHP Can't get the right format for array

I got stuck somehow on the following problem:
What I want to achieve is to merge the following arrays based on key :
{"Entities":{"submenu_id":"Parents","submenu_label":"parents"}}
{"Entities":{"submenu_id":"Insurers","submenu_label":"insurers"}}
{"Users":{"submenu_id":"New roles","submenu_label":"newrole"}}
{"Users":{"submenu_id":"User - roles","submenu_label":"user_roles"}}
{"Users":{"submenu_id":"Roles - permissions","submenu_label":"roles_permissions"}}
{"Accounting":{"submenu_id":"Input accounting data","submenu_label":"new_accounting"}}
Which needs to output like this:
[{"item_header":"Entities"},
{"list_items" :
[{"submenu_id":"Parents","submenu_label":"parents"},
{"submenu_id":"Insurers","submenu_label":"insurers"}]
}]
[{"item_header":"Users"},
{"list_items" :
[{"submenu_id":"New roles","submenu_label":"newrole"},
{"submenu_id":"User - roles","submenu_label":"user_roles"},
{"submenu_id":"Roles - permissions","submenu_label":"roles_permissions"}]
}]
[{"item_header":"Accounting"},
{"list_items" :
[{"submenu_id":"Input accounting data","submenu_label":"new_accounting"}]
}]
I have been trying all kinds of things for the last two hours, but each attempt returned a format different from the one required and thus failed miserably. Somehow, I couldn't figure it out.
Do you have a construction in mind to get this job done?
I would be very interested to hear your approach on the matter.
Thanks.
$input = array(
'{"Entities":{"submenu_id":"Parents","submenu_label":"parents"}}',
'{"Entities":{"submenu_id":"Insurers","submenu_label":"insurers"}}',
'{"Users":{"submenu_id":"New roles","submenu_label":"newrole"}}',
'{"Users":{"submenu_id":"User - roles","submenu_label":"user_roles"}}',
'{"Users":{"submenu_id":"Roles - permissions","submenu_label":"roles_permissions"}}',
'{"Accounting":{"submenu_id":"Input accounting data","submenu_label":"new_accounting"}}',
);
$input = array_map(function ($e) { return json_decode($e, true); }, $input);
$result = array();
$indexMap = array();
foreach ($input as $index => $values) {
foreach ($values as $k => $value) {
$index = isset($indexMap[$k]) ? $indexMap[$k] : $index;
if (!isset($result[$index]['item_header'])) {
$result[$index]['item_header'] = $k;
$indexMap[$k] = $index;
}
$result[$index]['list_items'][] = $value;
}
}
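// The merged entries keep gaps in their numeric keys, so reindex them first;
// otherwise json_encode emits an object instead of a list
echo json_encode(array_values($result));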
Here you are!
In this case, I first put all the arrays into one array for processing.
At first I thought they were already in the same array, but now I realize they aren't.
In that case, just create an empty $array = [] and add each one to it: $array[] = $a1, $array[] = $a2, and so on.
$array = '[{"Entities":{"submenu_id":"Parents","submenu_label":"parents"}},
{"Entities":{"submenu_id":"Insurers","submenu_label":"insurers"}},
{"Users":{"submenu_id":"New roles","submenu_label":"newrole"}},
{"Users":{"submenu_id":"User - roles","submenu_label":"user_roles"}},
{"Users":{"submenu_id":"Roles - permissions","submenu_label":"roles_permissions"}},
{"Accounting":{"submenu_id":"Input accounting data","submenu_label":"new_accounting"}}]';
$array = json_decode($array, true);
$intermediate = []; // 1st step
foreach($array as $a)
{
$keys = array_keys($a);
$key = $keys[0]; // say, "Entities" or "Users"
$intermediate[$key] []= $a[$key];
}
$result = []; // 2nd step
foreach($intermediate as $key=>$a)
{
$entry = ["item_header" => $key, "list_items" => [] ];
foreach($a as $item) $entry["list_items"] []= $item;
$result []= $entry;
}
print_r($result);
I would prefer an OO approach for that.
First an object for the list_item:
{"submenu_id":"Parents","submenu_label":"parents"}
Second an object for the item_header:
{"item_header":"Entities", "list_items" : <array of list_item> }
Last an object or an array for all:
{ "Menus": <array of item_header> }
Plus the corresponding getters/setters, etc.
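A rough sketch of what that object model might look like in PHP. The class names ListItem and ItemHeader and the function buildMenus are hypothetical, and public properties are used instead of getters/setters purely so that json_encode() can serialize the objects directly:

```php
<?php
// Hypothetical classes; names and properties are illustrative only.
class ListItem
{
    public $submenu_id;
    public $submenu_label;

    public function __construct(string $id, string $label)
    {
        $this->submenu_id = $id;
        $this->submenu_label = $label;
    }
}

class ItemHeader
{
    public $item_header;
    public $list_items = [];

    public function __construct(string $header)
    {
        $this->item_header = $header;
    }

    public function addItem(ListItem $item): void
    {
        $this->list_items[] = $item;
    }
}

// Builds the menu from decoded rows such as
// ["Entities" => ["submenu_id" => "Parents", "submenu_label" => "parents"]]
function buildMenus(array $rows): array
{
    $menus = [];
    foreach ($rows as $row) {
        foreach ($row as $header => $fields) {
            if (!isset($menus[$header])) {
                $menus[$header] = new ItemHeader($header);
            }
            $menus[$header]->addItem(
                new ListItem($fields['submenu_id'], $fields['submenu_label'])
            );
        }
    }
    return array_values($menus); // plain list, so it encodes as a JSON array
}
```

Passing the result of buildMenus() to json_encode() yields one object per header, each with its item_header and list_items.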
The following code will give you the required array, over which you can iterate to get the desired output.
$final_array = array();
// Assumes the original JSON strings are stored inside another array, $array.
// The loop could just as well iterate over input read from a file.
foreach ($array as $value) {
    $decoded = json_decode($value, true); // e.g. ["Entities" => [...]]
    $key = key($decoded);                 // extract the key from the string
    if (!array_key_exists($key, $final_array)) {
        $final_array[$key] = array();
    }
    // Extract the value from the string and append it to the list for this key
    $final_array[$key][] = $decoded[$key];
}
