How do I detect groups of common strings in filenames

I'm trying to figure out a way to detect groups of files. For instance:
If a given directory has the following files:
Birthday001.jpg
Birthday002.jpg
Birthday003.jpg
Picknic1.jpg
Picknic2.jpg
Afternoon.jpg.
I would like to condense the listing to something like
Birthday ( 3 pictures )
Picknic ( 2 pictures )
Afternoon ( 1 picture )
How should I go about detecting the groups?

Here's one way you can solve this, which is more efficient than a brute-force method.
Load all the names into an associative array whose key is the name and whose value is the name with the digits stripped (preg_replace('/\d/', '', $key)).
You will have something like $arr1 = [Birthday001 => Birthday, Birthday002 => Birthday ...]
Now make another associative array whose keys are the values from the first array and whose values are counts. Increment the count each time you see a key you have already seen.
In the end the second array contains the names and counts, just as you wanted. Something like $arr2 = [Birthday => 3, ...]
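The steps above can be sketched roughly as follows. This is a minimal sketch, not the answerer's own code: the $files list is a hypothetical stand-in for the directory listing, pathinfo() is used here to drop the extension first, and array_count_values() stands in for the hand-written counting loop.

```php
<?php
// Hypothetical input; in practice this might come from scandir() or glob().
$files = array('Birthday001.jpg', 'Birthday002.jpg', 'Birthday003.jpg',
               'Picknic1.jpg', 'Picknic2.jpg', 'Afternoon.jpg');

// First array: base name (extension removed) => base name with digits stripped.
$arr1 = array();
foreach ($files as $file) {
    $base = pathinfo($file, PATHINFO_FILENAME);
    $arr1[$base] = preg_replace('/\d/', '', $base);
}

// Second array: stripped name => number of files that share it.
$arr2 = array_count_values($arr1);

print_r($arr2);
```

Both passes are linear in the number of files, which is where the saving over comparing every name with every other name comes from.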

Simply build a histogram whose keys are modified by a regex:
<?php
# input
$filenames = array("Birthday001.jpg", "Birthday002.jpg", "Birthday003.jpg", "Picknic1.jpg", "Picknic2.jpg", "Afternoon.jpg");
# create histogram
$histogram = array();
foreach ($filenames as $filename) {
    # strip optional trailing digits plus the extension;
    # \d* (not \d+) so that names without a number, like Afternoon.jpg, also lose the extension
    $name = preg_replace('/\d*\.[^.]*$/', '', $filename);
    if (isset($histogram[$name])) {
        $histogram[$name]++;
    } else {
        $histogram[$name] = 1;
    }
}
# output
foreach ($histogram as $name => $count) {
    if ($count == 1) {
        echo "$name ($count picture)\n";
    } else {
        echo "$name ($count pictures)\n";
    }
}
?>

Generate an array of words like "my" (developing this array will be very important; "my" is the only one in your example given) and strip these out of all the file names. Strip out all numbers and punctuation as well; the extensions should be long gone by this point. Once this is done, put all of the unique results into an array. You can then use this as a fairly reliable source of keywords to search for any stragglers that the other processing didn't catch.
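A minimal sketch of that clean-up, under stated assumptions: the $stopWords list and the file names below are invented for illustration, and lower-casing the names is an extra step the answer does not mention.

```php
<?php
// Hypothetical stop-word list and file names, for illustration only.
$stopWords = array('my', 'the');
$files = array('My Birthday 001.jpg', 'my_birthday-002.jpg', 'Picknic1.jpg');

$keywords = array();
foreach ($files as $file) {
    $name = pathinfo($file, PATHINFO_FILENAME);       // drop the extension
    $name = preg_replace('/[^a-z]+/i', ' ', $name);   // digits and punctuation become spaces
    $words = preg_split('/\s+/', strtolower($name), -1, PREG_SPLIT_NO_EMPTY);
    foreach (array_diff($words, $stopWords) as $word) {
        $keywords[$word] = true;                       // array keys keep the list unique
    }
}
$keywords = array_keys($keywords);

print_r($keywords);
```

The resulting $keywords array could then be matched against any file names the digit-stripping pass left ungrouped.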

Related

Find index of value in associative array in php?

If you have an array $p that you populated in a loop like so:
$p[] = array( "id"=>$id, "Name"=>$name);
What's the fastest way to search for John in the Name key, and if found, return the $p index? Is there a way other than looping through $p?
I have up to 5000 names to find in $p, and $p can also potentially contain 5000 rows. Currently I loop through $p looking for each name, and if found, parse it (and add it to another array), splice the row out of $p, and break 1, ready to start searching for the next of the 5000 names.
I was wondering if there is a faster way to get the index than looping through $p, e.g. an isset-type lookup?
Thanks for taking a look guys.

Okay so as I see this problem, you have unique ids, but the names may not be unique.
You could initialize the array as:
array($id=>$name);
And your searches can be like:
array_search($name,$arr);
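This should work well, because the native needle-in-a-haystack search is likely to be better optimized than a loop you write yourself. Note that array_search() still scans the array linearly and returns the key of the first match, or false if nothing matches.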
e.g.
$id = 2;
$name= 'Sunny';
$arr = array($id=>$name);
echo array_search($name,$arr);
Echoes 2
The major advantage in this method would be code readability.

If you know that you are going to need to perform many of these types of search within the same request then you can create an index array from them. This will loop through the array once per index you need to create.
$piName = array();
foreach ($p as $k=>$v)
{
    $piName[$v['Name']] = $k;
}
If you only need to perform one or two searches per page then consider moving the array into an external database, and creating the index there.
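To show how such an index array might be used for the lookups themselves, here is a small sketch. The sample rows in $p and the $wanted list are hypothetical; only the $piName loop comes from the answer above.

```php
<?php
// Hypothetical sample data in the shape described in the question.
$p = array(
    array('id' => 1, 'Name' => 'Alice'),
    array('id' => 2, 'Name' => 'John'),
    array('id' => 3, 'Name' => 'Maria'),
);

// Build the name => index map once (a single pass over $p).
$piName = array();
foreach ($p as $k => $v) {
    $piName[$v['Name']] = $k;
}

// Each lookup is now a hash access instead of a scan of $p.
$wanted = array('John', 'Zed');   // hypothetical names to find
$found = array();
foreach ($wanted as $name) {
    if (isset($piName[$name])) {
        $found[$name] = $piName[$name];
    }
}

print_r($found);
```

One caveat of this layout: if the same name occurs in several rows, the map keeps only the last index. Storing an array of indices per name would avoid that.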

$index = 0;
$search_for = 'John';
// Pass array() as the initial value so $result is an empty array, not null, when nothing matches.
$result = array_reduce($p, function($r, $v) use (&$index, $search_for) {
    if($v['Name'] == $search_for) {
        $r[] = $index;
    }
    ++$index;
    return $r;
}, array());
$result will contain all the indices of elements in $p where the element with key Name had the value John. (This of course only works for an array that is indexed numerically beginning with 0 and has no “holes” in the index.)
Edit: It is possibly even easier to just use array_filter. That will not return only the indices but all array elements where Name equals John; the indices are preserved, though:
$result2 = array_filter($p, function($elem) {
    return $elem["Name"] == "John";
});
var_dump($result2);
Which one suits your needs better, and which one is faster, is for you to figure out.

PHP search array using wildcard?

Assume that I have the following array:
Array (
[0] => 099/3274-6974
[1] => 099/12-365898
[2] => 001/323-9139
[3] => 002/3274-6974
[4] => 000/3623-8888
[5] => 001/323-9139
[6] => www.somesite.com
)
Where:
Values that start with 000/, 002/ and 001/ represent mobile (cell) phone numbers
Values that start with 099/ represent telephone (fixed) numbers
Values that start with www. represent web sites
I need to convert the given array into 3 new arrays, each containing the proper information: arrayTelephone, arrayMobile and arraySite.
The in_array function works only if I know the whole value I am looking for in the given array, which is not my case.

Create the three empty arrays, loop through the source array with foreach, inspect each value (regexp is nice for this) and add the items to their respective arrays.
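A minimal sketch of the regular-expression variant, assuming the prefixes listed in the question are the only ones that occur. The patterns are illustrative choices, not taken from the answer, and values matching none of them are silently skipped.

```php
<?php
// Sample data copied from the question.
$data = array('099/3274-6974', '099/12-365898', '001/323-9139',
              '002/3274-6974', '000/3623-8888', '001/323-9139', 'www.somesite.com');

$arrayTelephone = array();
$arrayMobile = array();
$arraySite = array();

foreach ($data as $item) {
    if (preg_match('#^00[0-2]/#', $item)) {        // 000/, 001/ or 002/
        $arrayMobile[] = $item;
    } elseif (preg_match('#^099/#', $item)) {      // fixed line
        $arrayTelephone[] = $item;
    } elseif (preg_match('/^www\./', $item)) {     // web site
        $arraySite[] = $item;
    }
}

print_r($arrayMobile);
print_r($arrayTelephone);
print_r($arraySite);
```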

Loop through all the items and sort them into the appropriate arrays based on the first 4 characters.
$arrayTelephone = array();
$arrayMobile = array();
$arraySite = array();
foreach($data as $item) {
    switch(substr($item, 0, 4)) {
        case '000/':
        case '001/':
        case '002/':
            $arrayMobile[] = $item;
            break;
        case '099/':
            $arrayTelephone[] = $item;
            break;
        case 'www.':
            $arraySite[] = $item;
            break;
    }
}

You can loop over the array and push the value to the correct new array based on your criteria. Example:
<?php
$fixed_array = array();
foreach ($data_array as $data) {
    if (strpos($data, '099') === 0) {
        $fixed_array[] = $data;
    }
    // if ....
}

Yes, I actually wrote the full code with preg_match, but after reading some comments I accept that it's better to show the way.
Create three different arrays named arrayTelephone, arrayMobile and arraySite.
Then search through your first array with a foreach or for loop. Compare the current loop value with your criteria and push it onto the matching new array (arrayTelephone, arrayMobile or arraySite). After pushing, move on to the next item with a "continue" statement.
You can find the solution by looking at the Perfect PHP Guide

Parsing only specific values from an array?

I'm parsing some information using Xpath and it returns me a simple array.
$values = array();
Array
(
[0] => http://www.aaa.com/19364328526/
[1] => http://www.bbb.com/207341152011/
[2] => http://www.ccc.co.jp/1246623/
)
Is there any way I can parse through the array and only take certain URLs based on URL weighting? For example. If aaa.com exists, take only aaa.com. If not, check for ccc.co.jp, if that exists, take that only, etc.
I only know how to select from arrays when I know what is there $values[0]/[1]/etc, unfortunately the order of links in this array change and/or aren't present sometimes.
Any help would be much appreciated!
Thanks!
Tre

You can use in_array() to check whether an exact value exists, but since you only know part of each URL, this example uses strpos() to look for the domain instead. I don't know exactly what you are trying to do. Do you know all the possible values that you might get back?
//List domains in priority order; each entry must match the host as it appears in the URLs
$weighted = array('aaa.com','bbb.com','ccc.co.jp');
$selected_url = '';
foreach($weighted as $check) { //start with highest priority
    foreach($values as $url) { //loop through all URLs
        if(strpos($url,$check) !== false) {
            //A URL matches this priority, so keep it and exit both loops
            $selected_url = $url;
            break 2;
        }
    }
}
$selected_url should hold the highest-priority URL, or it will be empty if none of the domains were found.

Compare PHP arrays

I know there are a lot of these, but I'm looking for something slightly different.
A straight diff won't work for me.
I have a list (array) of allowed tags i.e. ["engine","chassis","brakes","suspension"]
Which I want to check with the list the user has entered. Diff won't work, because the user may not enter all the options i.e. ["engine"] but I still want this to pass. What I want to happen is fail if they put something like "banana" in the list.

You can use array_intersect() and compare the size of the resulting array with the size of the input array. If the result is smaller, the input contains one or more items that are not in the 'allowed' array. If the sizes are equal, every item in the user's input is allowed, so you can use the array to do whatever you want.
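The size comparison can be sketched like this. The helper name allTagsAllowed is hypothetical and exists only for this illustration; the allowed list is the one from the question.

```php
<?php
$allowed = array('engine', 'chassis', 'brakes', 'suspension');

// Hypothetical helper: true when every entry of $input also appears in $allowed.
function allTagsAllowed(array $input, array $allowed)
{
    // array_intersect() keeps the entries of $input that are present in $allowed,
    // so a smaller result means at least one entry was rejected.
    return count(array_intersect($input, $allowed)) === count($input);
}

var_dump(allTagsAllowed(array('engine'), $allowed));            // bool(true)
var_dump(allTagsAllowed(array('engine', 'banana'), $allowed));  // bool(false)
```

A partial list such as just "engine" passes, while any entry outside the allowed list makes the check fail.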

Use array_diff();
$allowed=array("engine","chassis","brakes","suspension");
$user=array("engine","brakes","banana");
$unallowed=array_diff($user, $allowed);
print_r($unallowed);
This prints an array containing only banana, as it is in $user but not in $allowed.

array_diff(): http://nl.php.net/array_diff
Returns an array containing all the entries from array1 that are not present in any of the other arrays.
if ( array_diff( $input, $allowed ) ) {
// error
}

$allowed_tags = array("engine","chassis","brakes","suspension");
$user_input = array("engine", "suspension", "banana");
foreach($user_input as $ui) {
    if(!in_array($ui, $allowed_tags)) {
        //user entered an invalid tag
    }
}

PHP multi-dimensional array find duplicates in specific dimensions

I have the following array:
$masterlist=[$companies][$fieldsofcompany][0][$number]
The third dimension only exists if the field selected from $fieldsofcompany = position 2 which contains the numbers array. Other positions contain regular variables. The 3rd dimension is always 0 (the numbers array) or Null. Position 4 contains numbers.
I want to cycle through all companies and remove from the $masterlist all companies which contain duplicate numbers.
My current implementation is this code:
for($i=0;$i<count($masterlist);$i++)
{
if($masterlist[$i][2][0][0] != null)
$id = $masterlist[$i][0];
for($j=0;$j<count($masterlist[$i][2][0]);$j++)
{
$number = $masterlist[$i][2][0][$j];
$query = "INSERT INTO numbers VALUES('$id','$number')";
mysql_query($query);
}
}
Which inserts numbers and associated IDs into a table. I then select unique numbers like so:
SELECT ID,number
FROM numbers
GROUP BY number
HAVING (COUNT(number)=1)
This strikes me as incredibly brain-dead. My question is what is the best way to do this? I'm not looking for code per se, but approaches to the problem. For those of you who have read this far, thank you.

For starters, you should prune the data before sticking it into the database.
Keep a look up table that keeps track of the 'number'.
If the number is not in the lookup table, use it and mark it; if it is already in the lookup table, you can ignore it.
Using an array for the look up table and with keys being the 'number' you can use the isset function to test if the number has appeared before or not.
Example pseudo code:
if(!isset($lookupTable[$number])){
$lookupTable[$number]=1;
//...Insert into database...
}

Now that I think I understand what you really want, you might want to stick with your two-pass approach but skip the MySQL detour.
In the first pass, gather numbers and duplicate companies:
$duplicate_companies = array();
$number_map = array();
foreach ($masterlist as $index => $company)
{
    // Skip companies without a numbers array (isset() is false for null too).
    if (!isset($company[2][0][0]))
        continue;
    foreach ($company[2][0] as $number)
    {
        if (!isset($number_map[$number]))
        {
            // We have not seen this number before, associate it
            // with the first company index.
            $number_map[$number] = $index;
        }
        else
        {
            // Both the current company and the one with the index stored
            // in $number_map[$number] are duplicates.
            $duplicate_companies[] = $index;
            $duplicate_companies[] = $number_map[$number];
        }
    }
}
In the second pass, remove the duplicates we have found from the master list:
foreach (array_unique($duplicate_companies) as $index)
{
    unset($masterlist[$index]);
}
