PHP RecursiveIteratorIterator through tarball keeping original file order

In a tarball, files are stored in the order they were added. In the following example they are not sorted alphabetically, but in push order:
# tar tvf file.tgz
-rw-r--r-- 0/0 962 2023-01-17 13:40:17 6fe3b8b5-a4dc-4976-bea9-227434c11cda
-rw-r--r-- 0/0 962 2023-01-17 13:40:17 febe009e-8ce0-4027-abda-e27f5f949dde
-rw-r--r-- 0/0 962 2023-01-17 13:40:17 4fea07a6-a90c-4cb4-9969-4cac08422ccf
-rw-r--r-- 0/0 962 2023-01-17 13:40:17 01610297-c577-4e6c-9ce9-7825a40e6d0c
-rw-r--r-- 0/0 962 2023-01-17 13:40:26 01b498ca-3d87-426b-b01d-d4e75921cb00
-rw-r--r-- 0/0 962 2023-01-17 13:40:26 45821111-f69d-4331-87d8-f3afd9918c91
Surprisingly, when reading the above tarball with PHP 8.1's RecursiveIteratorIterator, the files are iterated alphabetically.
$p = new PharData($path);
foreach (new RecursiveIteratorIterator($p) as $file) {
    echo $file . "\n";
}
The above code results in the output below:
phar:///path/to/file/01610297-c577-4e6c-9ce9-7825a40e6d0c
phar:///path/to/file/01b498ca-3d87-426b-b01d-d4e75921cb00
phar:///path/to/file/45821111-f69d-4331-87d8-f3afd9918c91
phar:///path/to/file/4fea07a6-a90c-4cb4-9969-4cac08422ccf
phar:///path/to/file/6fe3b8b5-a4dc-4976-bea9-227434c11cda
phar:///path/to/file/febe009e-8ce0-4027-abda-e27f5f949dde
Is there a way in PHP to iterate over tarball content in the original order?

I believe this is caused by the PharData class, not RecursiveIteratorIterator. You can force your own sort by converting to an array first, then using usort() on whatever field you want. (Note that several files in the example share the same mtime, so an mtime sort alone may not fully recover push order.)
$p = new PharData(...);
$files = iterator_to_array($p);
// sort by modification time, oldest first
usort($files, fn($a, $b) => $a->getMTime() <=> $b->getMTime());
foreach ($files as $file) {
    printf("%s %s\n", $file->getPathname(), $file->getMTime());
}

Thanks to Alex Howansky's answer I found a working solution based on exec(), which basically takes advantage of tar listing entries in their original order:
$sortedFileList = [];
// tar lists entries in archive order; use that as the sort rank.
// escapeshellarg() protects paths containing spaces or shell metacharacters
exec('tar tf ' . escapeshellarg($path), $sortedFileList);
$sortedFileList = array_flip($sortedFileList);
$p = new PharData($path);
$files = iterator_to_array($p);
usort($files,
    fn($a, $b) => $sortedFileList[$a->getFilename()] <=> $sortedFileList[$b->getFilename()]
);
foreach ($files as $file) {
    printf("%s %s\n", $file->getPathname(), $file->getMTime());
}
I'm not a big fan of using exec(), but the only alternative to achieve my goal would be to add an external library to the project. Since this work should really be done by the plain PharData iterator, I hope that in the future it will allow iterating tarball files in raw order.
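If you'd rather avoid exec() entirely, another alternative is to read the raw tar headers yourself. Below is a minimal sketch, assuming a gzip-compressed ustar archive like the file.tgz above with no GNU long-name extensions; tarEntryNames() is a hypothetical helper, not part of PharData:
<?php
// Hypothetical helper: list entry names in archive order by reading
// the raw 512-byte tar headers from a gzip-compressed archive.
function tarEntryNames(string $path): array
{
    $names = [];
    $fh = gzopen($path, 'rb');
    while (($header = gzread($fh, 512)) !== false && strlen($header) === 512) {
        // the entry name occupies the first 100 bytes, NUL-padded
        $name = rtrim(substr($header, 0, 100), "\0");
        if ($name === '') {
            break; // a zero-filled block marks the end of the archive
        }
        $names[] = $name;
        // the entry size is stored as an octal string at offset 124
        $size = octdec(trim(substr($header, 124, 12)));
        // skip the file data, which is padded to a multiple of 512 bytes
        gzseek($fh, intdiv($size + 511, 512) * 512, SEEK_CUR);
    }
    gzclose($fh);
    return $names;
}
The resulting array can then replace the exec() output as the rank map for the usort() above.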

Related

How to group files into groups of some size in PHP

Imagine you have 10 files which need to be sent, for example:
$files of type string[]
1.jpg - 3.2MB
2.jpg - 2.8MB
3.jpg - 3.5MB
4.jpg - 2.1MB
5.jpg - 0.9MB
6.jpg - 2.9MB
7.jpg - 2.4MB
8.jpg - 2.1MB
9.jpg - 1.1MB
10.jpg - 1.9MB
I also have, for example, a 10MB limit per e-mail: $limit
I need to create groups of files (arrays of filenames) where each group contains files whose combined size is lower than $limit.
Example:
['1.jpg', '2.jpg', '3.jpg', '4.jpg'] - WRONG (11.6MB > $limit)
['1.jpg', '2.jpg', '3.jpg'] - OK (9.5MB < $limit)
Even better would be to have as few groups as possible (for example, you could make 10 groups with one file each, which would technically complete the task, but it would be better to have fewer groups with more files, as close to $limit as possible).
I need to do this with pure PHP or with the help of Laravel if needed.
Thanks
UPDATE:
I know how to get file size, I need to group those files
Finding as few groups as possible is a combinatorial NP-hard problem (bin packing), but with an algorithm such as First Fit Decreasing we get results that should be good enough. Here's an example implementation:
$files = ['1.jpg', '2.jpg', '3.jpg', '4.jpg', '5.jpg', '6.jpg', '7.jpg', '8.jpg', '9.jpg', '10.jpg'];
$sizes = [3.2, 2.8, 3.5, 2.1, 0.9, 2.9, 2.4, 2.1, 1.1, 1.9];
$limit = 10;
$files = array_combine($files, $sizes);

# algorithm "First Fit Decreasing"
arsort($files); # sort the files by descending size
$groups = [];
foreach ($files as $file => $size) {
    # insert the files one by one, so that each is placed
    # in the first group that still has enough space
    foreach ($groups as &$group) {
        if (array_sum($group) + $size <= $limit) {
            $group[$file] = $size;
            continue 2;
        }
    }
    # if there is not enough space in any of the groups
    # that are already open, open a new one
    $groups[][$file] = $size;
}
unset($group);

# the filenames are the keys within each group
foreach ($groups as $group) {
    print_r(array_keys($group));
}
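In practice you'd build the $files map from real sizes on disk rather than hard-coded numbers. A quick sketch, with the directory path as a placeholder:
// Placeholder path; sizes come from filesize() in bytes,
// so the limit must be expressed in bytes as well.
$files = [];
foreach (glob('/path/to/attachments/*.jpg') as $path) {
    $files[basename($path)] = filesize($path);
}
$limit = 10 * 1024 * 1024; // 10 MB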
Here you can find methods for file type input: https://laravel.com/api/5.8/Illuminate/Http/Testing/File.html
To know the size of a file you can use this method:
$img_size = $request->file('file')->getSize();
You already have the most important thing, now it will depend on you how to program the solution ;)
I hope I've helped
PS: sorry for my English

Transposing csv values in php

I need to transpose some values in some csv files that we get sent on a regular basis so that they can be imported into a website, and I'm not sure of the best way to go about doing it.
The data arrives in a csv file with a header row containing the column names, and the first column values are product ID's. The data looks like this…
ID F F-VF VF VF-XF XF
1 840 960 1080 1248 1944
2 1137.5 1300 1462.5 1690 2632.5
3 1225 1400 1575 1820 2835
What I'm looking to do is change this around so the column name and its value are put into a new line for each value for the same ID, like so…
ID COND VALUE
1 F 840
1 F-VF 960
1 VF 1080
1 VF-XF 1248
1 XF 1944
2 F 1137.5
2 F-VF 1300
2 VF 1462.5
2 VF-XF 1690
2 XF 2632.5
I may also need to add some strings into the cells - is that easy to do?
Thanks a lot
Not necessarily the most elegant version, just to get an idea.
Something like this would work in case it's an existing csv, which gets read and overwritten with the transposed version.
// assuming a dataset like
// ----
// fruit, apple, pear, peach
// animal, eagle, bear, wolf
// car, renault, fiat, nio
$f = fopen($savePath, 'r');
$header = [];
$data = [];
while ($row = fgetcsv($f, 0, ",")) {
    // the first column of each row becomes the new header row
    $header[] = $row[0];
    for ($i = 1; $i < count($row); $i++) {
        $data[$i][$row[0]] = $row[$i];
    }
}
fclose($f);

$f = fopen($savePath, 'w');
fputcsv($f, $header);
foreach ($data as $recordColDataSet) {
    fputcsv($f, array_values($recordColDataSet));
}
fclose($f);
Transposing arrays could also be something to look at eg in this question here:
Transposing multidimensional arrays in PHP
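Note that the snippet above performs a full row/column transpose, while the question as stated is a wide-to-long reshape (one output row per ID/condition pair). A minimal sketch of that, with input.csv and output.csv as placeholder filenames:
$in = fopen('input.csv', 'r');
$out = fopen('output.csv', 'w');

$header = fgetcsv($in); // e.g. ID, F, F-VF, VF, VF-XF, XF
fputcsv($out, ['ID', 'COND', 'VALUE']);

// emit one row per (ID, condition) pair
while (($row = fgetcsv($in)) !== false) {
    for ($i = 1; $i < count($row); $i++) {
        fputcsv($out, [$row[0], $header[$i], $row[$i]]);
    }
}
fclose($in);
fclose($out);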
Have you tried any of the standard PHP methods like: str_getcsv() or fgetcsv()?
A quick search of "php csv" provides a TON of possible solutions. I suggest trying a few, then reporting back here if you have a specific problem.

How can I match files by file size and rename accordingly?

I have two directories of images with mismatching names, but mostly matching images.
Dir 1 Size | Dir 2 Size
---------------------------------------------------
img1.jpg 508960 | a_image_name.jpg 1038644
img2.jpg 811430 | another_image_name.jpg 396240
... ... | ... ...
img1000.jpg 602583 | image_name.jpg 811430
... ... |
img2000.jpg 396240 |
The first directory has more images, but is misnamed. The second directory has the correct names, but not corresponding in order to the first directory.
I'd like to rename files in Dir 1 by comparing file size (or some other way) to Dir 2. In the above example img2.jpg would be renamed to image_name.jpg because both have the same file size.
Can you point me in the right direction?
Preferably by way of app (Mac), shell, or php.
Maybe it would be wiser to use hashes of the files instead of using the filesize?
In short: using glob(), get a list of files in dir1, iterate, create md5-hash (md5() + file_get_contents()), store in an array, using the hash as key and the filename as value.
Do the same for dir2.
Then iterate array1; if an entry with the same hash exists in array2, rename the file.
Code will be something like this: (untested, unoptimized)
$dir1 = array();
$dir2 = array();
// get hashes for dir1
foreach (glob('/path/to/dir1/*.jpg') as $file) {
    $hash = md5(file_get_contents($file));
    $dir1[$hash] = $file;
}
// repeat for dir2 ...
foreach ($dir1 as $hash => $file1) {
    if (array_key_exists($hash, $dir2)) {
        // keep the file in dir1, but give it the matching name from dir2
        rename($file1, dirname($file1) . '/' . basename($dir2[$hash]));
    }
}
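For completeness, a size-based variant of the same idea in PHP (paths are placeholders). Note that two files with the same byte size will silently collide, which is exactly why the hash approach above is safer:
// index dir2 by byte size (later duplicates overwrite earlier ones)
$bySize = [];
foreach (glob('/path/to/dir2/*.jpg') as $file) {
    $bySize[filesize($file)] = basename($file);
}
// rename files in dir1 that have a size match in dir2
foreach (glob('/path/to/dir1/*.jpg') as $file) {
    $size = filesize($file);
    if (isset($bySize[$size])) {
        rename($file, dirname($file) . '/' . $bySize[$size]);
    }
}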
Here is my solution, which renames files in dir1 based on file size.
Contents of dir1:
-rw-r--r-- 1 haiv staff 10 Aug 16 13:18 file1.txt
-rw-r--r-- 1 haiv staff 20 Aug 16 13:18 file2.txt
-rw-r--r-- 1 haiv staff 30 Aug 16 13:18 file3.txt
-rw-r--r-- 1 haiv staff 205 Aug 16 13:18 file4.txt
(Note the fifth column stores the file sizes.) And the contents of dir2:
-rw-r--r-- 1 haiv staff 30 Aug 16 13:18 doc.txt
-rw-r--r-- 1 haiv staff 205 Aug 16 13:18 dopey.txt
-rw-r--r-- 1 haiv staff 20 Aug 16 13:18 grumpy.txt
-rw-r--r-- 1 haiv staff 10 Aug 16 13:18 happy.txt
Create a file called ~/rename.awk (yes, in the home directory, to avoid polluting either dir1 or dir2):
/^total/ {next} # skip the first line of ls -l output (the total)
{
if (name[$5] == "") {
name[$5] = $NF
print "# File of size", $5, "should be named", $NF
} else {
printf "mv '%s' '%s'\n", $NF, name[$5]
}
}
Now, cd into dir1 (if you want to rename files in dir1), and issue the following command:
$ awk -f ~/rename.awk <(ls -l ../dir2) <(ls -l)
Output:
# File of size 30 should be named doc.txt
# File of size 205 should be named dopey.txt
# File of size 20 should be named grumpy.txt
# File of size 10 should be named happy.txt
mv 'file1.txt' 'happy.txt'
mv 'file2.txt' 'grumpy.txt'
mv 'file3.txt' 'doc.txt'
mv 'file4.txt' 'dopey.txt'
Once you are happy with the result, pipe the above command to sh to execute the changes:
$ awk -f ~/rename.awk <(ls -l ../dir2) <(ls -l) | sh
Notes:
No safeguard against files with the same size. For that, the MD5 solution which wonk0 offered works better.
Please examine the output before you commit. Changes are permanent.

Which is faster: glob() or opendir()

Which is faster between glob() and opendir(), for reading around 1-2K file(s)?
http://code2design.com/forums/glob_vs_opendir
Obviously opendir() should be (and is) quicker, as it just opens a directory handle and lets you iterate. Because glob() has to parse its pattern argument, it takes some more time (and if the pattern spans subdirectories, glob() will scan those as well, adding to the execution time).
glob and opendir do different things. glob finds pathnames matching a pattern and returns these in an array, while opendir returns a directory handle only. To get the same results as with glob you have to call additional functions, which you have to take into account when benchmarking, especially if this includes pattern matching.
Bill Karwin has written an article about this recently. See:
http://www.phparch.com/2010/04/28/putting-glob-to-the-test/
Not sure whether this is a perfect comparison, but glob() also has to process shell-like patterns, whereas opendir() goes straight to the directory, which makes it faster.
Another question that can be answered with a bit of testing. I had a convenient folder with 412 things in it, but the results shouldn't vary much, I imagine:
igor47@whisker ~/test $ ls /media/music | wc -l
412
igor47@whisker ~/test $ time php opendir.php
414 files total
real 0m0.023s
user 0m0.000s
sys 0m0.020s
igor47@whisker ~/test $ time php glob.php
411 files total
real 0m0.023s
user 0m0.010s
sys 0m0.010s
Okay,
Long story short:
if you want full filenames+paths, sorted, glob is practically unbeatable.
if you want full filenames+paths unsorted, use glob with GLOB_NOSORT.
if you want only the names, and no sorting, use opendir + loop.
That's it.
Some more thoughts:
You can run tests that produce the exact same result with different methods, only to find they have approximately the same time cost; merely for fetching the information you'll have no real winner. However, consider these:
Dealing with a huge file list, glob will sort faster - it uses the filesystem's sort method which will always be superior. (It knows what it sorts while PHP doesn't, PHP sorts a hashed array of arbitrary strings, it's simply not fair to compare them.)
You'll probably want to filter your list by some extensions or filename masks for which glob is really efficient. You have fnmatch() of course, but calling it every time will never be faster than a system-level filter trained for this very job.
On the other hand, glob returns a significantly bigger amount of text (each name with full path) so with a lot of files you may run into memory allocation limits. For a zillion files, glob is not your friend.
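To make the comparison apples-to-apples, here's a minimal timing sketch over a single directory (the path is a placeholder); both branches produce the same unsorted list of *.jpg names:
<?php
// Minimal benchmark sketch; '/path/to/dir' is a placeholder.
$dir = '/path/to/dir';

$t0 = microtime(true);
$viaGlob = array_map('basename', glob($dir . '/*.jpg', GLOB_NOSORT));
$t1 = microtime(true);

$viaOpendir = [];
$dh = opendir($dir);
while (($entry = readdir($dh)) !== false) {
    if (fnmatch('*.jpg', $entry)) {
        $viaOpendir[] = $entry;
    }
}
closedir($dh);
$t2 = microtime(true);

printf("glob: %.6fs, opendir+fnmatch: %.6fs\n", $t1 - $t0, $t2 - $t1);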
In my test below, opendir() came out faster:
<?php
$path = "/var/Upload/gallery/TEST/";
$filenm = "IMG20200706075415";
function microtime_float()
{
list($usec, $sec) = explode(" ", microtime());
return ((float)$usec + (float)$sec);
}
echo "<br> <i>T1:</i>".$t1 = microtime_float();
echo "<br><br> <b><i>Glob :</i></b>";
foreach( glob($path.$filenm.".*") as $file )
{
echo "<br>".$file;
}
echo "<br> <i>T2:</i> ".$t2 = microtime_float();
echo "<br><br> <b><i>OpenDir :</b></i>";
function resolve($name)
{
// read information about the path
$info = pathinfo($name);
if (!empty($info['extension']))
{
// if the file already contains an extension returns it
return $name;
}
$filename = $info['filename'];
$len = strlen($filename);
// open the folder
$dh = opendir($info['dirname']);
if (!$dh)
{
return false;
}
// scan each file in the folder
while (($file = readdir($dh)) !== false)
{
if (strncmp($file, $filename, $len) === 0)
{
if (strlen($name) > $len)
{
// if name contains a directory part
$name = substr($name, 0, strlen($name) - $len) . $file;
}
else
{
// if the name is at the path root
$name = $file;
}
closedir($dh);
return $name;
}
}
// file not found
closedir($dh);
return false;
}
$file = resolve($path.$filenm);
echo "<br>".$file;
echo "<br> <i>T3:</i> ".$t3 = microtime_float();
echo "<br><br>  <b>glob time:</b> ". $gt= ($t2 - $t1) ."<br><b>opendir time:</b>". $ot = ($t3 - $t2) ;
echo "<u>". (( $ot < $gt ) ? "<br><br>OpenDir is ".($gt-$ot)." more Faster" : "<br><br>Glob is ".($ot-$gt)." moreFaster ") . "</u>";
?>
Output:
T1:1620133029.7558
Glob :
/var/Upload/gallery/TEST/IMG20200706075415.jpg
T2: 1620133029.7929
OpenDir :
/var/Upload/gallery/TEST/IMG20200706075415.jpg
T3: 1620133029.793
  glob time:0.037137985229492
opendir time:5.9843063354492E-5
OpenDir is 0.037078142166138 more Faster

Rename many files automatically in one go - ddmmyyyy.file to yyyymmdd.file

I have 1500 files that are named with an incorrect date format. I would like to rename them. Is there a tool that can do that? Otherwise, a piece of PHP code.
File names are:
ddmmyyyy.xls (e.g. 15012010 for 15th Jan 2010)
and I would like:
yyyymmdd.xls (e.g. 20100115.xls)
Any clue on how this can be done for 1500 files in one go?
BR. Anders
UPDATE:
Also tried Mp3tag, which is suggested in one of the answers. It is a free tool and also did the job, though it took a while to figure out how to use it. If you want to try, do this:
add xls (or another format) to the list of editable files in the configuration
choose the folder to load files AND mark the files you want to edit in the pane
I clicked the "Convert - Quick" button. It is also possible to save schemes for future use, but I could not figure out how.
after clicking "Convert - Quick" choose "using regex" (the only regex option)
And then you just add the info to process the renaming. In my case:
field: _FILENAME
from: ([0-9]{2})([0-9]{2})([0-9]{4})
to: $3-$2-$1
Now all files named 15012010.xls (ddmmyyyy.xls) will be named 2010-01-15.xls
Here's a start (untested, but you should get the idea).
$files = glob('your/folder/*.xls');
foreach ($files as $file) {
    // preg_match() returns 1 on a match and fills $matches with the groups
    if (!preg_match('/^(\d{2})(\d{2})(\d{4})\.xls$/', basename($file), $matches)) {
        continue;
    }
    $day = $matches[1];
    $month = $matches[2];
    $year = $matches[3];
    $newFilename = $year . $month . $day . '.xls';
    rename($file, dirname($file) . '/' . $newFilename);
}
If you have a Linux machine with the files... you can use bash to do:
for f in *.xls; do
  mv "$f" "$(echo $f | cut -c5-8)$(echo $f | cut -c3-4)$(echo $f | cut -c1-2).xls"
done
A tool that can perform filename pattern conversion is Mp3tag.
Choose convert and then filename - filename.
I'm sure there are other tools out there too!
(This answer isn't really in the StackOverflow spirit but I think the OP isn't necessarily looking for an automated solution...)
Based on alex's function, but this one correctly adds the .xls extension.
foreach (glob('/path/to/your/*.xls') as $file)
{
rename($file, dirname($file) . '/' . substr(basename($file), 4, 4) . substr(basename($file), 2, 2) . substr(basename($file), 0, 2) . '.xls');
}
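If you also want to guard against digit strings that aren't real dates, here's a sketch using DateTime::createFromFormat (the folder path is a placeholder):
foreach (glob('/path/to/your/*.xls') as $file) {
    $stem = basename($file, '.xls');
    // createFromFormat rolls over out-of-range values (e.g. day 32),
    // so round-trip the format to make sure the digits form a real date
    $date = DateTime::createFromFormat('dmY', $stem);
    if ($date === false || $date->format('dmY') !== $stem) {
        continue;
    }
    rename($file, dirname($file) . '/' . $date->format('Ymd') . '.xls');
}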
If you have bash:
#!/bin/bash
shopt -s nullglob
for xls in [0-9]*.xls
do
  day=${xls:0:2}
  mth=${xls:2:2}
  yr=${xls:4:4}
  # remove the echo to actually perform the rename
  echo mv "$xls" "${yr}${mth}${day}.xls"
done
No external tools needed.
File names are: ddmmyyyy.xls (e.g. 15012010 for 15th Jan 2010)
and I would like: yyyymmdd.xls (e.g. 20100115.xls)
Use this script.
# Script RenameYYYYMMDD.txt
var str dir, list, file, newname, dd, mm
lf -r -n "*.xls" $dir > $list
while ($list <> "")
do
lex "1" $list > $file ; stex -p "^/^l[" $file > $newname ; chex "2]" $newname > $dd
chex "2]" $newname > $mm ; sin "^.^l" ($mm+$dd) $newname > null
system rename ("\""+$file+"\"") $newname
done
This script is in biterscripting ( http://www.biterscripting.com ). Test the script first in a test folder.
To test, save the script code in file "C:/Scripts/RenameYYYYMMDD.txt", and enter the following command.
script "C:/Scripts/RenameYYYYMMDD.txt" dir("C:/path/to/test folder")
This command will rename all files ddmmyyyy.xls under directory "C:/path/to/test folder" to yyyymmdd.xls.
