scandir - sort numeric filenames - php

Done some searching, but can't seem to find the exact answer I'm looking for.
I'd like to pull in files with numbered filenames using 'scandir($dir)', but have them sort properly. For example, file names are:
1-something.ext
2-something-else.ext
3-a-third-name.ext
.
.
.
10-another-thing.ext
11-more-names.ext
The problem I'm having is that 10-another-thing.ext will show before 2-something-else.ext. I'd like to find a better way of solving this than adding a leading '0' to all the file names.
Any thoughts? Thanks.

natsort does exactly what you need.
sort with SORT_NUMERIC will also work for filenames that start with numbers, but it will break if there are also names that have no numbers in front (all non-number-prefixed names will be sorted before number-prefixed names, and their order relative to one another will be arbitrary instead of alphabetic).

You can use sort like this:
sort($arr, SORT_NUMERIC); // assuming $arr is your array

If you want to reassign keys (which natsort does not do), use usort() combined with strnatcmp() or strnatcasecmp():
usort($arr, 'strnatcmp'); // Or 'strnatcasecmp' for case insensitive
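To make the difference between the two approaches concrete, here is a minimal sketch using the file names from the question. The hard-coded $files array is an assumption that stands in for the result of scandir($dir), so the snippet runs without a real directory; with real scandir() output you would also want to drop the '.' and '..' entries first.

```php
<?php
// Stand-in for scandir($dir) output (hypothetical data taken from the question).
// A plain string sort is what puts "10-..." before "2-...".
$files = [
    '1-something.ext',
    '10-another-thing.ext',
    '11-more-names.ext',
    '2-something-else.ext',
    '3-a-third-name.ext',
];

// natsort() sorts in place but keeps each value's original key.
$natural = $files;
natsort($natural);

// usort() with strnatcmp() produces the same order and reindexes from 0.
$reindexed = $files;
usort($reindexed, 'strnatcmp');

print_r($natural);
print_r($reindexed);
```

If you prefer natsort() but need sequential keys, wrapping the result in array_values() should give the same list as the usort() variant.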

Related

Naturally sorting results of a scandir?

I'm trying to sort an array containing the results of a scandir function. I've tried using the natsort() php function, but it doesn't appear to be working as I need it to for my directories.
The contents of the scanned directory assume a HH(Z)DDMonYYYY naming convention. After using the natsort function on the array, the result is this:
$dates = ["21Z23Oct2017", "20Z23Oct2017", "19Z23Oct2017",
"19Z18Oct2017", "19Z17Oct2017", "19Z16Oct2017",
"18Z23Oct2017", "18Z18Oct2017", "17Z23Oct2017", ...]
As you can see, the function is evaluating the first two digits and using them to sort, but it ignores the days (23, 18, 17, 16) in each name.
I would like for the resulting array to look like this:
$dates = ["21Z23Oct2017", "20Z23Oct2017", "19Z23Oct2017",
"18Z23Oct2017", "17Z23Oct2017", ...,
"19Z18Oct2017", "18Z18Oct2017", "19Z17Oct2017", "19Z16Oct2017"]
Since the directories are created sequentially, I realize that I could sort by directory creation or modification time and be just fine 99% of the time. However, on rare occasions the directories' modification times will not be in perfect order, and I'd like to avoid problems when that is the case.
Is there a way to accomplish my goal in php without having to use modification or creation times?
Thanks to all in advance!
EDIT: for context, I'm using a Python script to write to and perform a simple operation on each of these directories. Python includes a package called "natsorted" for those unaware, which sorts the directories in the example array above without any trouble. Just wondering if there was a simple php solution as well before I start adding complexity.
All that natsort() does is try to address the sorting of strings with arbitrary-length digit sequences; it does not magically interpret bizarre date formats. Even the PHP functions that do try to figure out dates would not be able to handle this one, as they just use a predefined list of common formats, and even then they are problematic.
IMHO you should always use something like DateTime::createFromFormat() and an explicit format string.
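<?php
$dates = ["21Z23Oct2017", "20Z23Oct2017", "19Z23Oct2017",
          "19Z18Oct2017", "19Z17Oct2017", "19Z16Oct2017",
          "18Z23Oct2017", "18Z18Oct2017", "17Z23Oct2017"];
// Format: H = two-digit hour, \Z = literal "Z", d = day, M = short month name, Y = four-digit year.
usort(
    $dates,
    function ($a, $b) {
        // Parse both names with the explicit format and compare the resulting DateTime objects.
        return DateTime::createFromFormat('H\ZdMY', $a) <=> DateTime::createFromFormat('H\ZdMY', $b);
    }
);
echo json_encode($dates, JSON_PRETTY_PRINT);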
Output:
[
"19Z16Oct2017",
"19Z17Oct2017",
"18Z18Oct2017",
"19Z18Oct2017",
"17Z23Oct2017",
"18Z23Oct2017",
"19Z23Oct2017",
"20Z23Oct2017",
"21Z23Oct2017"
]
This will work adequately for small sets of dates and/or when called infrequently. However if you're sorting a large number of dates, or sorting them frequently, you're going to want to pre-create the DateTime objects beforehand. As it stands both DateTimes in the comparison are recreated for each comparison.
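One possible way to pre-create the parsed values, sketched under a few assumptions: the names are unique (they are used as array keys), every name matches the format, and a leading "!" is added to the format so that any field not present in the name is reset rather than taken from the current time. Each name is parsed once, and the sort then compares plain integer timestamps.

```php
<?php
$dates = ["21Z23Oct2017", "20Z23Oct2017", "19Z23Oct2017",
          "19Z18Oct2017", "19Z17Oct2017", "19Z16Oct2017",
          "18Z23Oct2017", "18Z18Oct2017", "17Z23Oct2017"];

// Parse every name exactly once and remember its Unix timestamp.
$timestamps = [];
foreach ($dates as $name) {
    $parsed = DateTimeImmutable::createFromFormat('!H\ZdMY', $name);
    if ($parsed === false) {
        // createFromFormat() returns false when the name does not match the format.
        throw new InvalidArgumentException("Unexpected directory name: $name");
    }
    $timestamps[$name] = $parsed->getTimestamp();
}

// Sort by timestamp, oldest first, keeping the names as keys.
asort($timestamps);
$sorted = array_keys($timestamps);

echo json_encode($sorted, JSON_PRETTY_PRINT);
```

Swapping asort() for arsort() should give the newest-first order asked for in the question.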
Going forward, you should always format dates in a readable and sortable fashion, e.g. YYYY-MM-DD hh:mm:ss, ideally ISO 8601.

How to sort an array of numeric strings which also contain numbers. (natural ordering) in PHP

How do you sort an array of strings which also contain numbers?
For example I have used the glob() function to get a list of file names.
By default the array lists the files in ascending order, but it compares each digit individually rather than reading the whole number.
Default output
"C://path/to/file/file.tpl"
"C://path/to/file/file1.tpl"
"C://path/to/file/file11.tpl"
"C://path/to/file/file12.tpl"
....
....
"C://path/to/file/file2.tpl"
Required output
"C://path/to/file/file.tpl"
"C://path/to/file/file1.tpl"
"C://path/to/file/file2.tpl"
...
...
"C://path/to/file/file11.tpl"
"C://path/to/file/file12.tpl"
Is there a PHP function that performs this?
Many thanks
Use natsort
This function implements a sort algorithm that orders alphanumeric strings in the way a human being would while maintaining key/value associations. This is described as a "natural ordering".
sort($array, SORT_NATURAL);
or
natsort($array);
Natural sorting.
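A short sketch of the SORT_NATURAL variant, using a subset of the paths from the question as hypothetical stand-ins for real glob() output:

```php
<?php
// Hypothetical stand-in for glob() output, in the default (plain string) order.
$paths = [
    'C://path/to/file/file.tpl',
    'C://path/to/file/file1.tpl',
    'C://path/to/file/file11.tpl',
    'C://path/to/file/file12.tpl',
    'C://path/to/file/file2.tpl',
];

// sort() with SORT_NATURAL compares digit runs as whole numbers and
// reindexes the array, unlike natsort(), which keeps the original keys.
sort($paths, SORT_NATURAL);

print_r($paths);
```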

alphabetically sort array based on first few strings

I would like to sort an array which typically includes names and email addresses. The problem is that the email addresses appear last even though they may start with 'a'
e.g.
$myarray = array("Alex Mayfeild", "David Beckham", "Oliver Twist", "ant.stev@wherever.com", "peter.pan@neverland.com", ........); // and so on
Upon sorting the array using PHP's sort function, "ant.stev@wherever.com" will appear close to the end, even though the behaviour I would like is for it to appear right after Alex.
The natcasesort and natsort functions, which are based on natural ordering, seem to fail. Correction: natcasesort works; it returns true when it succeeds, as stated in the docs. Thanks @meagar
Is there any way to achieve the requested functionality? Thanks for any help guys. It is very much appreciated.
sort() is case sensitive, as it sorts based on the letters' ASCII values.
Try natcasesort() if you want to "sort an array using a case insensitive 'natural order' algorithm".
Seems to me that sort($myarray, SORT_STRING|SORT_FLAG_CASE); should sort the array the way you want.
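A minimal sketch of that last suggestion, assuming PHP 5.4 or later (where SORT_FLAG_CASE is available) and using sample data modelled on the question:

```php
<?php
// Sample data modelled on the question: capitalised names mixed with lowercase addresses.
$myarray = [
    'Alex Mayfeild',
    'David Beckham',
    'Oliver Twist',
    'ant.stev@wherever.com',
    'peter.pan@neverland.com',
];

// A plain sort() compares byte values, so every uppercase letter sorts
// before every lowercase one and the addresses end up last.
$caseSensitive = $myarray;
sort($caseSensitive);

// SORT_STRING | SORT_FLAG_CASE compares the strings case-insensitively.
$caseInsensitive = $myarray;
sort($caseInsensitive, SORT_STRING | SORT_FLAG_CASE);

print_r($caseInsensitive);
```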

Best way to replace strings

I have an array where I'm storing the bad and good string pairs.
Ex.:
array(
"Man. United"=>"Manchester United",
"Bay. Munchen"=>"Bayern Munchen",
"Bay. Munich"=>"Bayern Munchen",
...
)
So in this case I'm using strtr to replace the given string, but this way I always have to add entries to or remove entries from the array. Is there any way to store just the good names in one array and replace anything that is very similar? For me it is much easier to build up the array with only the good names.
You could use similar_text, or one of the other functions mentioned in the "See Also" section of its manual page, to try to correct them automatically, but it won't be as accurate as listing the spelling mistakes yourself.
*edit: levenshtein may also be a good one to try...
The Levenshtein distance is defined as the minimal number of
characters you have to replace, insert or delete to transform str1
into str2.
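As a rough sketch of how that could look, the hypothetical helper closest_name() below (not a built-in) picks the entry from a list of good names with the smallest Levenshtein distance to the input. The two-entry $goodNames list is illustrative only.

```php
<?php
// Hypothetical helper: return the entry of $goodNames with the smallest
// Levenshtein distance to $input (ties go to the earlier entry),
// or null if the list is empty.
function closest_name(string $input, array $goodNames): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($goodNames as $candidate) {
        $distance = levenshtein($input, $candidate);
        if ($distance < $bestDistance) {
            $best = $candidate;
            $bestDistance = $distance;
        }
    }
    return $best;
}

$goodNames = ['Manchester United', 'Bayern Munchen'];
echo closest_name('Man. United', $goodNames), "\n";
echo closest_name('Bay. Munich', $goodNames), "\n";
```

Note that this always returns some name, even for input that resembles none of them, so in practice you would probably want to reject matches above a distance threshold of your choosing.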

php natural comparison sort that ignores (but does not get rid of) non-numeric data

Basically, I have a directory with a bunch of file names that I have loaded into an array. The file names tell me something about the text they represent (i.e. Prologue, chapterone, chaptertwo), but in the file name I also include a sequential number to keep them ordered. So 'prollecture1.xml', 'prollecture2.xml', 'prollecture3.xml', . . . 'prollecture12.xml', 'chapteronelecture13.xml', 'chapteronelecture14.xml'. . . 'conclusionlecture18.xml', etc.
I want to sort this so that the array lists them in numerical order. Using a "natural comparison sort" gets me close, but the sort begins with the first character of the file name, and thus 'chapteronelecture13.xml' is listed before 'prollecture1.xml' because 'c' comes before 'p'. If I had known I wanted to do this from the beginning I would have put the numbers first. But to change all the file names now would be a lot of work.
My question: is there a way to get the "natural string comparison" to ignore the first part of the file name and begin at "lecture##"? Or even better, can the sort ignore (but not remove) all non-numeric data and sort the array solely by the numbers embedded in the file name?
Thanks for your help.
I think there's no built-in function that does this, but using usort you may accomplish that:
function just_numerical_sort($a, $b)
{
    // Strip every non-digit character, then compare what is left as integers.
    return (int) preg_replace('/[^0-9]/', '', $a) <=> (int) preg_replace('/[^0-9]/', '', $b);
}
usort($array, 'just_numerical_sort');
preg_replace returns a copy of $a or $b with all non-numeric characters removed; the remaining digits are cast to integers and compared (I haven't tested it, but I think it works).
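For the "begin at lecture##" variant the question asks about, one possible sketch is to extract only the number that follows the word "lecture" and compare on that. The helper name lecture_number() and the sample $files array are hypothetical; names without a "lecture" number are treated as 0 here, which is an arbitrary choice.

```php
<?php
// Hypothetical helper: return the number that follows "lecture" in the
// file name, or 0 if the name has no such number.
function lecture_number(string $name): int
{
    return preg_match('/lecture(\d+)/', $name, $m) === 1 ? (int) $m[1] : 0;
}

// Sample names taken from the question, deliberately out of order.
$files = [
    'chapteronelecture13.xml',
    'prollecture2.xml',
    'conclusionlecture18.xml',
    'prollecture1.xml',
    'prollecture12.xml',
    'chapteronelecture14.xml',
];

// Sort only by the extracted number; the file names themselves are left untouched.
usort($files, function ($a, $b) {
    return lecture_number($a) <=> lecture_number($b);
});

print_r($files);
```

Unlike stripping every non-digit, this does not glue unrelated digits together if a prefix ever contains a number of its own.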
You should write a script that renames all of the files for you. Don't write some hack to overcome the wrong naming of the files. This will most likely cause more headaches in the future.
It shouldn't be too hard to write a script that renames the files with a leading 0 for numbers less than 10, or even two leading zeros for numbers less than 10 and one leading zero for numbers between 10 and 99.
filename001.xml
filename002.xml
Then your sort will work perfectly.
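The name-computing part of such a script might look like the sketch below. The helper pad_lecture_number() is hypothetical, and it assumes the sequential number sits directly before the file extension, as in the question's examples. It only builds the new name; the actual renaming would be a rename() call per file, which is left out here so the snippet has no side effects.

```php
<?php
// Hypothetical helper: left-pad the number that sits directly before the
// file extension with zeros, up to $width digits.
function pad_lecture_number(string $name, int $width = 3): string
{
    return preg_replace_callback(
        '/(\d+)(?=\.[^.]+$)/',
        function ($m) use ($width) {
            return str_pad($m[1], $width, '0', STR_PAD_LEFT);
        },
        $name
    );
}

echo pad_lecture_number('prollecture1.xml'), "\n";
echo pad_lecture_number('chapteronelecture13.xml'), "\n";
```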
