Naturally sorting results of a scandir? - php

I'm trying to sort an array containing the results of a scandir function. I've tried using the natsort() php function, but it doesn't appear to be working as I need it to for my directories.
The contents of the scanned directory assume a HH(Z)DDMonYYYY naming convention. After using the natsort function on the array, the result is this:
$dates = ["21Z23Oct2017", "20Z23Oct2017", "19Z23Oct2017",
"19Z18Oct2017", "19Z17Oct2017", "19Z16Oct2017",
"18Z23Oct2017", "18Z18Oct2017", "17Z23Oct2017", ...]
As you can see, the function is evaluating the first two digits and using them to sort, but it ignores the days (23, 18, 17, 16) in each name.
I would like for the resulting array to look like this:
$dates = ["21Z23Oct2017", "20Z23Oct2017", "19Z23Oct2017",
"18Z23Oct2017", "17Z23Oct2017", ...,
"19Z18Oct2017", "18Z18Oct2017", "19Z17Oct2017", "19Z16Oct2017"]
Since the directories are created sequentially, I realize that I could sort by directory creation or modification time and be just fine 99% of the time. However, on rare occasions the directories' modification times will not be in perfect order, and I'd like to avoid problems when that is the case.
Is there a way to accomplish my goal in php without having to use modification or creation times?
Thanks to all in advance!
EDIT: for context, I'm using a Python script to write to and perform a simple operation on each of these directories. For those unaware, Python has a package called "natsort" (with a natsorted() function), which sorts the directories in the example array above without any trouble. Just wondering if there was a simple PHP solution as well before I start adding complexity.

All that natsort() does is try to address the sorting of strings with arbitrary-length digit sequences; it does not magically interpret bizarre date formats. Even the PHP functions that do try to figure out dates would not be able to figure this one out, as they just use a predefined list of common formats and even then are problematic.
IMHO you should always use something like DateTime::createFromFormat() and an explicit format string.
<?php
$dates = ["21Z23Oct2017", "20Z23Oct2017", "19Z23Oct2017",
    "19Z18Oct2017", "19Z17Oct2017", "19Z16Oct2017",
    "18Z23Oct2017", "18Z18Oct2017", "17Z23Oct2017"];
// The trailing "|" resets unparsed fields (minutes, seconds) instead of filling them with the current time.
$format = 'H\ZdMY|';
usort(
    $dates,
    function ($a, $b) use ($format) {
        return DateTime::createFromFormat($format, $a) <=> DateTime::createFromFormat($format, $b);
    }
);
echo json_encode($dates, JSON_PRETTY_PRINT);
Output:
[
"19Z16Oct2017",
"19Z17Oct2017",
"18Z18Oct2017",
"19Z18Oct2017",
"17Z23Oct2017",
"18Z23Oct2017",
"19Z23Oct2017",
"20Z23Oct2017",
"21Z23Oct2017"
]
This will work adequately for small sets of dates and/or when called infrequently. However if you're sorting a large number of dates, or sorting them frequently, you're going to want to pre-create the DateTime objects beforehand. As it stands both DateTimes in the comparison are recreated for each comparison.
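A rough sketch of that pre-creation idea, in case it helps: each name is parsed exactly once into a timestamp, and the sort then only compares integers. The sample names are a subset of the ones in the question; treating the hour as UTC (because of the "Z") and sorting newest first are assumptions on my part.

```php
<?php
$dates = ["21Z23Oct2017", "20Z23Oct2017", "19Z18Oct2017", "19Z16Oct2017", "18Z23Oct2017"];

$keyed = [];
foreach ($dates as $name) {
    // "|" resets the fields the format does not mention (minutes, seconds).
    // UTC is an assumption based on the "Z" in the names.
    $parsed = DateTime::createFromFormat('H\ZdMY|', $name, new DateTimeZone('UTC'));
    if ($parsed === false) {
        continue; // skip entries that do not match the convention, e.g. "." and ".."
    }
    $keyed[$name] = $parsed->getTimestamp();
}

arsort($keyed);               // sort by timestamp, newest first, keeping the names as keys
$sorted = array_keys($keyed); // back to a plain list of directory names

echo json_encode($sorted, JSON_PRETTY_PRINT);
```

Use asort() instead of arsort() if you want the oldest directory first.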
Going forward, you should always format dates in a readable and sortable fashion, e.g. YYYY-MM-DD hh:mm:ss, ideally ISO 8601.

Related

PHP Sort Complex Array

Instead of working with MySQL data, I have created a CSV file that I plan to use as the source of data for content, etc.
I have successfully been able to parse the CSV and store it in a complex array that has the following header row, i.e. the keys for the arrays.
"Title","Year","Rated","Released","Runtime","Genre","Director","Writer","Actors","Plot","Language","Country","Awards","Poster","Metascore","imdbRating","imdbVotes","imdbID","Type","Response"
My current stage is to allow dynamic ajax sorting of the arrays.
I have two fields, that I am allowing sorting at the beginning, "Year" and "Title".
So I pass different URL parameters, such as "yearasc" or "yeardesc" or "titleasc" or "titledesc".
Then I try to sort accordingly.
After reading a different SO post, this is what I did.
First I create new arrays that only store the key fields, I need for sorting.
Then based on what sort type, do a different array_multisort.
array_multisort($year, SORT_ASC, $all_rows);
But what I get is results with multiple duplicated rows.
I wonder if having the header row as the first row, which is required by the function I pass the data to after any sorting, is causing issues with array sorting.
For simple array sorting, existing functions makes sense and work fine.
But for complicated ones, it is just complex to even understand how to approach solving this problem.
Any suggestions, thoughts or ideas are appreciated and thanked.
Thank you!
I don't have actual code that is going to help you, but I do have a suggestion for how you can tackle this and make it work.
Keep it simple. First, create your own CSV with just 1 header (Year or Title ) that you want to sort on.
Write your code to sort on that.
Then, add the other one ( Title if you used Year before, or Year if you used Title before ) and sort on whichever you want.
Then, add one more header (say, Rated) that you don't want to sort on.
You should then be able to work with the original CSV.
I'd try to write simple methods and keep your processing to a minimum in each one.
I hope that helps. I realize it's more of a philosophical answer, so it is hit or miss whether it helps you get the job done. Just realize that this approach will, indeed, take a little more time to write - but the point behind it is that you're taking out all of the "noise" that's getting in your way first. It helps you look at only your problem first and solve that.
You can sort the array with a custom comparison function. Use uasort() instead of usort() if you need to keep the original array keys.
<?php
$sortfields = array('year', 'bleh');
function cmp($a, $b) {
    global $sortfields;
    foreach ($sortfields as $sortfield) {
        $cmp = strcmp($a[$sortfield], $b[$sortfield]);
        // if desc, invert sign of $cmp
        if ($cmp !== 0)
            return $cmp;
    }
    return 0;
}
usort($all_rows, "cmp");
The function usort() calls a user-defined comparison function, which follows the same convention as strcmp(): 0 if equal, < 0 if $a is less than $b and > 0 if $a is greater than $b.
This function compares each field listed in $sortfields, in the order given; as soon as it finds a field where the two rows differ, it returns that difference.
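On the header-row worry from the question, here is a minimal sketch of one way to handle it, assuming $all_rows is a list of numerically indexed rows whose first row holds the column names. The sample rows and the sort_rows() helper are made up for illustration: the header is taken out before sorting and put back afterwards, and the direction flag covers the "yearasc"/"yeardesc" style parameters.

```php
<?php
// Hypothetical sample data; the first row is the CSV header and must stay on top.
$all_rows = [
    ['Title', 'Year'],
    ['Heat', '1995'],
    ['Alien', '1979'],
    ['Brazil', '1985'],
];

// Hypothetical helper: sorts the data rows by one column, keeping the header first.
function sort_rows(array $rows, string $field, bool $descending = false): array
{
    if ($rows === []) {
        return $rows;
    }
    $header = array_shift($rows);                  // take the header out before sorting
    $index = array_search($field, $header, true);  // column position of the sort field
    if ($index !== false) {
        usort($rows, function ($a, $b) use ($index, $descending) {
            $cmp = strnatcasecmp($a[$index], $b[$index]);
            return $descending ? -$cmp : $cmp;     // invert the sign for descending order
        });
    }
    array_unshift($rows, $header);                 // put the header back on top
    return $rows;
}

$by_year_desc = sort_rows($all_rows, 'Year', true);
$by_title_asc = sort_rows($all_rows, 'Title');
```

Because the function works on its own copy of the rows, $all_rows itself is left in its original order.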

Why would one want to pass primitive-type parameters by reference in PHP?

One thing that's always bugged me (and everyone else, ever) about PHP is its inconsistency in function naming and parameters. Another more recent annoyance is its tendency to ask for function parameters by reference rather than by value.
I did a quick browse through the PHP manual, and found the function sort() as an example. If I was implementing that function I'd take an array by value, sort it into a new array, and return the new value. In PHP, sort() returns a boolean, and modifies the existing array.
How I'd like to call sort():
$array = array('c','a','b');
$sorted_array = sort($array);
How PHP wants me to call sort():
$array = array('c','a','b');
sort($array);
$sorted_array = $array;
And additionally, the following throws a fatal error: Fatal error: Only variables can be passed by reference
sort(array('c','a','b'));
I'd imagine that part of this could be a legacy of PHP's old days, but there must have been a reason things were done this way. I can see the value in passing an object by reference ID like PHP 5+ does (which I guess is sort of in between pass by reference and pass by value), but not in the case of strings, arrays, integers and such.
I'm not an expert in the field of Computer Science, so as you can probably gather I'm trying to grasp some of these concepts still, and I'm curious as to whether there's a reason things are set up this way, or whether it's just a leftover.
The main reason is that PHP was developed by C programmers, and this is very much a C-programming paradigm. In C, it makes sense to pass a pointer to a data structure you want changed. In PHP, not so much (Among other things, because references are not the same as a pointer).
I believe this is done for speed reasons.
Most of the time you need the array you are working on to be sorted, not a copy.
If sort() returned a new copy of the array, then every time you called it the PHP engine would have to copy the array into a new one (lowering speed and increasing memory use), and you would have no way to control this behaviour.
If you also need the array in its original order (and this doesn't happen so often), keep a copy before sorting:
$copy = $yourArray;
sort($yourArray);
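If you prefer the calling style from the question, a thin wrapper gets you there. This is only a sketch; the name sorted() is made up. It relies on the fact that PHP passes arrays by value unless the parameter is declared with &, so sorting inside the function does not touch the caller's array.

```php
<?php
// Hypothetical helper: returns a sorted copy and leaves the argument alone.
function sorted(array $values): array
{
    sort($values);   // sorts the function's own copy in place
    return $values;
}

$array = ['c', 'a', 'b'];
$sorted_array = sorted($array);          // $array keeps its original order
$from_literal = sorted(['c', 'a', 'b']); // literals are fine here, no reference involved
```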

scandir - sort numeric filenames

Done some searching, but can't seem to find the exact answer I'm looking for.
I'd like to pull in files with numbered filenames using 'scandir($dir)', but have them sort properly. For example, file names are:
1-something.ext
2-something-else.ext
3-a-third-name.ext
.
.
.
10-another-thing.ext
11-more-names.ext
The problem I'm having is that 10-another-thing.ext will show before 2-something-else.ext. I'd like to find a better way of solving this issue than introducing a leading '0' in front of all file names.
Any thoughts? Thanks.
natsort does exactly what you need.
sort with SORT_NUMERIC will also work for filenames that start with numbers, but it will break if there are also names that have no numbers in front (all non-number-prefixed names will be sorted before number-prefixed names, and their order relative to one another will be random instead of alphabetic).
You can use sort like this:
sort($arr, SORT_NUMERIC); // assuming $arr is your array
If you want to reassign keys (which natsort does not do), use usort() combined with strnatcmp() or strnatcasecmp():
usort($arr, 'strnatcmp'); // Or 'strnatcasecmp' for case insensitive
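To make the difference in keys concrete, here is a small sketch using names from the question (hard-coded here instead of coming from scandir()):

```php
<?php
$files = ['10-another-thing.ext', '1-something.ext', '2-something-else.ext', '11-more-names.ext'];

$kept = $files;
natsort($kept);                 // natural order, but each value keeps its original key

$reindexed = $files;
usort($reindexed, 'strnatcmp'); // same order, keys renumbered from 0

print_r(array_keys($kept));
print_r($reindexed);
```

With natsort() a foreach loop walks the files in natural order, but $kept[0] is still '10-another-thing.ext'; wrap the result in array_values() if you need index 0 to be the first file.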

php natural comparison sort that ignores (but does not get rid of) non-numeric data

Basically, I have a directory with a bunch of file names that I have loaded into an array. The file names tell me something about the text they represent (i.e. Prologue, chapterone, chaptertwo), but in the file name I also include a sequential number to keep them ordered. So 'prollecture1.xml', 'prollecture2.xml', 'prollecture3.xml', . . . 'prollecture12.xml', 'chapteronelecture13.xml', 'chapteronelecture14.xml'. . . 'conclusionlecture18.xml', etc.
I want to sort this so that the array lists them in numerical order. Using a "natural comparison sort" gets me close, but the sort begins with the first character of the file name, and thus 'chapteronelecture13.xml' is listed before 'prollecture1.xml' because 'c' comes before 'p'. If I had known I wanted to do this from the beginning I would have put the numbers first. But to change all the file names now would be a lot of work.
My question: is there a way to get the "natural string comparison" to ignore the first part of the file name and begin at "lecture##"? Or even better, can the sort ignore (but not remove) all non-numeric data and sort the array solely by the numbers embedded in the file name?
Thanks for your help.
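I think there's no built-in function that does this, but you can accomplish it with usort:
function just_numerical_sort($a, $b)
{
    // Strip every non-digit, then compare what is left as integers.
    return (int) preg_replace('/[^0-9]/', '', $a) <=> (int) preg_replace('/[^0-9]/', '', $b);
}
usort($array, 'just_numerical_sort');
The preg_replace call returns a copy of $a or $b with all non-numerical characters removed; the (int) cast makes a name without any digits count as 0. The original file names in $array are not changed, only their order (I haven't tested it, but I think it works).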
You should write a script that renames all of the files for you. Don't write some hack to overcome the wrong naming of the files. This will most likely cause more headaches in the future.
It shouldn't be too hard to write a script that renames the files with a leading 0 for numbers less than 10, or even two leading zeros for numbers less than 10 and one leading zero for numbers between 10 and 99.
filename001.xml
filename002.xml
Then your sort will work perfectly.
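The renaming step could be sketched roughly like this, assuming each file name contains exactly one number that should be padded. The helper name pad_number() and the width of 3 are my own choices, and the actual rename() call is left commented out so nothing gets moved by accident.

```php
<?php
// Hypothetical helper: zero-pads the first run of digits in a file name.
function pad_number(string $name, int $width = 3): string
{
    return preg_replace_callback(
        '/\d+/',
        function ($m) use ($width) {
            return sprintf('%0' . $width . 'd', $m[0]); // e.g. "1" becomes "001"
        },
        $name,
        1 // only touch the first number in the name
    );
}

$names = ['filename1.xml', 'filename2.xml', 'filename12.xml'];
foreach ($names as $old) {
    $new = pad_number($old);
    echo "$old -> $new\n";
    // rename($dir . '/' . $old, $dir . '/' . $new); // $dir is assumed to hold the directory path
}
```

Note that padding only fixes the ordering among files that share the same prefix.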

Best method of passing/return values

The reason I am asking this question is because I have landed my first real (yes, a paid office job - no more volunteering!) Web Development job about two months ago. I have a couple of associates in computer information systems (web development and programming). But as many of you know, what you learn in college and what you need in the job site can be very different and much more. I am definitely learning from my job - I recreated the entire framework we use from scratch in a MVC architecture - first time doing anything related to design patterns.
I was wondering what you would recommend as the best way to pass/return values around in OO PHP? Right now I have not implemented any sort of standard, but I would like to create one before the size of the framework increases any more. I return arrays when more than one value needs to be returned, and sometimes pass arrays or have multiple parameters. Are arrays the best way or is there a more efficient method, such as JSON? I like the idea of arrays in that to pass more values or less, you just need to change the array and not the function definition itself.
Thank you all, just trying to become a better developer.
EDIT: I'm sorry all, I thought I had accepted an answer for this question. My bad, very, very bad.
How often do you run across a situation where you actually need multiple return values? I can't imagine it's that often.
And I don't mean a scenario where you are returning something that's expected to be an enumerable data collection of some sort (i.e., a query result), but where the returned array has no other meaning than to just hold two or more values.
One technique the PHP library itself uses is a reference parameter, such as with preg_match(). The function itself returns a single value (1 or 0, or false on error), but optionally uses the supplied 3rd parameter to store the matched data. This is, in essence, a "second return value".
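For illustration, a minimal sketch of that pattern; the pattern and the input string are arbitrary examples:

```php
<?php
$matches = [];
// The return value says whether the pattern matched; the details arrive via $matches.
$found = preg_match('/(\d+)Z/', '21Z23Oct2017', $matches);

if ($found === 1) {
    echo "Full match: {$matches[0]}, first group: {$matches[1]}\n";
}
```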
Definitely don't use a data interchange format like JSON. The purpose of these formats is to move data between disparate systems in an expected, parseable way. In a single PHP execution you don't need that.
You can return anything you want: a single value, an array or a reference (depending on the function needs). Just be consistent.
But please don't use JSON internally. It just produces unnecessary overhead.
I also use arrays for returning multiple values, but in practice it doesn't happen very often. If it does, it's generally a sensible grouping of data, such as returning array('x'=>10,'y'=>10) from a function called getCoordinates(). If you find yourself doing lots of processing and returning wads of data in arrays from a lot of functions, there's probably some refactoring that can be done to put the work into smaller units.
That being said, you mentioned:
I like the idea of arrays in that to pass more values or less, you just need to change the array and not the function definition itself.
In that regard, another technique you might be interested in is using functions with variable numbers of arguments. It is perfectly acceptable to declare a function with no parameters:
function stuff() {
//do some stuff
}
but call it with all the parameters you care to give it:
$x = stuff($var1, $var2, $var3, $var4);
By using func_get_args(), func_get_arg() (singular) and func_num_args() you can easily find/loop all the parameters that were passed. This works very well if you don't have specific parameters in mind, say for instance a sum() function:
function sum()
{
    $out = 0;
    // Read the argument count once instead of on every iteration.
    for ($i = 0, $c = func_num_args(); $i < $c; $i++) {
        $out += func_get_arg($i);
    }
    return $out;
}
//echoes 35
echo sum(10,10,15);
Food for thought, maybe you'll find it useful.
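Since PHP 5.6 the same idea can also be written with the variadic "..." syntax, which puts the variable arguments in the signature instead of hiding them behind func_get_args(). A small sketch (the name sum_all() is made up, and the type declarations assume PHP 7 or later):

```php
<?php
// All arguments are collected into the $numbers array.
function sum_all(int ...$numbers): int
{
    return array_sum($numbers);
}

echo sum_all(10, 10, 15), "\n";

// An existing array can be unpacked into the call with the same operator.
$values = [1, 2, 3];
echo sum_all(...$values), "\n";
```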
The only thing I'm careful to avoid is passing/returning arrays where the keys have "special" meaning. Example:
<?php
// Bad. Don't pass around arrays with 'special' keys
$personArray = array("eyeColor"=>"blue", "height"=>198, "weight"=>103, ...);
?>
Code that uses an array like this is harder to refactor and debug. This type of structure is better represented as an object.
<?php
interface Person {
    /**
     * @return string Color Name
     */
    public function getEyeColor();
    ...
}
?>
This interface provides a contract that the consuming code can rely on.
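As a rough sketch of what consuming code might look like: every name below except getEyeColor() is hypothetical, and the interface is cut down to a single method so the example runs on its own.

```php
<?php
// Hypothetical, trimmed-down contract; only getEyeColor() comes from the answer above.
interface HasEyeColor
{
    public function getEyeColor(): string;
}

// Hypothetical implementation replacing the array with "special" keys.
final class BasicPerson implements HasEyeColor
{
    private $eyeColor;
    private $heightCm;

    public function __construct(string $eyeColor, int $heightCm)
    {
        $this->eyeColor = $eyeColor;
        $this->heightCm = $heightCm;
    }

    public function getEyeColor(): string
    {
        return $this->eyeColor;
    }

    public function getHeightCm(): int
    {
        return $this->heightCm;
    }
}

// The consumer depends on the interface, not on array keys that might be misspelled.
function describe(HasEyeColor $person): string
{
    return 'Eye color: ' . $person->getEyeColor();
}

$person = new BasicPerson('blue', 198);
echo describe($person), "\n";
```

A misspelled method name fails loudly here, whereas a misspelled array key only produces a notice or warning and a null value.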
Other than that I can't think of any reason to limit yourself.
Note: to be clear, associative arrays are great for list data, like:
<?php
// Good array
$usStates = array("AL"=>"ALABAMA", "AK"=>"ALASKA", ... );
?>
