UCA + Natural Sorting - php

I recently learnt that PHP already supports the Unicode Collation Algorithm via the intl extension:
$array = array
(
    'al', 'be',
    'Alpha', 'Beta',
    'Álpha', 'Àlpha', 'Älpha',
    'かたかな',
    'img10.png', 'img12.png',
    'img1.png', 'img2.png',
);

if (extension_loaded('intl') === true)
{
    collator_asort(collator_create('root'), $array);
}
Array
(
[0] => al
[2] => Alpha
[4] => Álpha
[5] => Àlpha
[6] => Älpha
[1] => be
[3] => Beta
[11] => img1.png
[9] => img10.png
[8] => img12.png
[10] => img2.png
[7] => かたかな
)
As you can see this seems to work perfectly, even with mixed case strings! The only drawback I've encountered so far is that there is no support for natural sorting and I'm wondering what would be the best way to work around that, so that I can merge the best of the two worlds.
I've tried to specify the Collator::SORT_NUMERIC sort flag but the result is way messier:
collator_asort(collator_create('root'), $array, Collator::SORT_NUMERIC);
Array
(
[8] => img12.png
[7] => かたかな
[9] => img10.png
[10] => img2.png
[11] => img1.png
[6] => Älpha
[5] => Àlpha
[1] => be
[2] => Alpha
[3] => Beta
[4] => Álpha
[0] => al
)
However, if I run the same test with only the img*.png values I get the ideal output:
Array
(
[3] => img1.png
[2] => img2.png
[1] => img10.png
[0] => img12.png
)
Can anyone think of a way to preserve the Unicode sorting while adding natural sorting capabilities?

After digging a little more in the documentation I've found the solution:
if (extension_loaded('intl') === true)
{
    if (is_object($collator = collator_create('root')) === true)
    {
        $collator->setAttribute(Collator::NUMERIC_COLLATION, Collator::ON);
        $collator->asort($array);
    }
}
Output:
Array
(
[0] => al
[3] => Alpha
[5] => Álpha
[6] => Àlpha
[7] => Älpha
[1] => be
[4] => Beta
[10] => img1.png
[11] => img2.png
[8] => img10.png
[9] => img12.png
[2] => かたかな
)

This is trivially done. You simply preprocess the list to zero-pad numbers. For example, using my ucsort script, which supports the UCA, on this list of filenames:
% cat /tmp/numfiles
img4.png
img1.png
img2.png
img12.png
img21.png
img10.png
img20.png
img3.png
img22.png
will produce the desired output by using the Unicode::Collate module’s --preprocess hook to transform runs of digits into zero-padded ones:
% ucsort --preprocess='s/(\d+)/sprintf "%020d", $1/ge' /tmp/numfiles
img1.png
img2.png
img3.png
img4.png
img10.png
img12.png
img20.png
img21.png
img22.png
Looking at the PHP documentation you cite, it does not appear that the PHP library supports the full UCA tailoring possibilities that the Perl Unicode::Collate module supports. In fact, it looks more like Perl’s Unicode::Collate::Locale module, except that the PHP library code does not seem to support the inherited collation options that the Perl code does.
I suppose that if all else fails, you could call Perl code to do what needs to be done.

Based on the answer of @tchrist I've come up with this:
function sortIntl($array, $natural = true)
{
    $data = $array;

    if ($natural === true)
    {
        // Zero-pad every run of digits so that string order matches numeric order.
        $data = preg_replace_callback('~([0-9]+)~', 'natsortIntl', $data);
    }

    collator_asort(collator_create('root'), $data);

    return array_intersect_key($array, $data);
}

function natsortIntl($matches)
{
    // preg_replace_callback() passes an array of matches; $matches[1] is the digit run.
    return sprintf('%020d', $matches[1]);
}
Output:
Array
(
[0] => 1
[1] => 100
[2] => al
[3] => be
[4] => Alpha
[5] => Beta
[6] => Álpha
[7] => Àlpha
[8] => Älpha
[9] => かたかな
[10] => img1.png
[11] => img2.png
[12] => img10.png
[13] => img20.png
)
Still hoping for a better solution though.
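One caveat with sortIntl() above: array_intersect_key() returns its entries in the order of its first argument, so the sorted order of $data is lost on the way out. Below is a minimal sketch of one way around that, using array_replace() to map the sorted keys back onto the original values. The function name sortIntlNatural, the str_pad() padding (swapped in for sprintf so that very long digit runs cannot overflow an integer) and the plain asort() fallback when intl is missing are my assumptions, not part of the answer above:

```php
<?php
// Sketch only: sortIntlNatural is a hypothetical name, not an intl function.
function sortIntlNatural(array $array): array
{
    // Build sort keys: pad each digit run to 20 characters so that
    // comparing the keys as strings also compares the numbers by value.
    $keys = preg_replace_callback('~[0-9]+~', function (array $m) {
        return str_pad($m[0], 20, '0', STR_PAD_LEFT);
    }, $array);

    if (class_exists('Collator')) {
        // UCA order via intl; asort() keeps the original array keys.
        (new Collator('root'))->asort($keys);
    } else {
        // Assumption: fall back to byte order when intl is unavailable.
        asort($keys, SORT_STRING);
    }

    // array_replace() keeps the key order of $keys (the sorted order)
    // and takes the values from $array (the unpadded originals).
    return array_replace($keys, $array);
}
```

With intl loaded, setting Collator::NUMERIC_COLLATION as in the accepted solution remains the simpler route; this sketch only shows how the padding approach can return the sorted order.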

Related

How to split 1 array into 2 arrays, remove certain items, and combine them again into 1 array in PHP?

I want to build something with arrays. I have one array that I need to split into two arrays. After that, I need to search both arrays for specific items and remove them, then combine the two arrays back into one.
How do I do that?
I have already tried to use unset, but I am confused about how to use it for a specific key, since my array data is formatted like 16/2/1/1 and 16/2/1/5. I need to remove the entries that have a 1 in the third position.
My array looks like this:
Array
(
[1] => Array
(
[0] => 16/2/1/1 --> remove this one: it has a 1 after the 2
[1] => 16/2/0/2
[2] => 16/2/0/3
[3] => 16/2/0/4
[4] => 16/2/0/5
[5] => 16/2/0/6
[6] => 16/2/0/7
[7] => 16/2/0/8
[8] => 16/2/0/9
[9] => 16/2/0/10
[10] => 16/2/0/11
[11] => 16/2/0/12
[12] => 16/2/0/13
[13] => 16/2/0/14
[14] => 16/2/0/15
[15] => 16/2/0/16
)
[2] => Array
(
[0] => 16/2/0/1
[1] => 16/2/0/2
[2] => 16/2/0/3
[3] => 16/2/0/4
[4] => 16/2/1/5 --> and this one: it has a 1 after the 2, before the 5
[5] => 16/2/0/6
[6] => 16/2/0/7
[7] => 16/2/0/8
[8] => 16/2/0/9
[9] => 16/2/0/10
[10] => 16/2/0/11
[11] => 16/2/0/12
[12] => 16/2/0/13
[13] => 16/2/0/14
[14] => 16/2/0/15
[15] => 16/2/0/16
)
)
I expect the output to look something like this (after combining):
Array
(
[0] => 16/2/0/2
[1] => 16/2/0/3
[2] => 16/2/0/4
[3] => 16/2/0/6
[4] => 16/2/0/7
[5] => 16/2/0/8
[6] => 16/2/0/9
[7] => 16/2/0/10
[8] => 16/2/0/11
[9] => 16/2/0/12
[10] => 16/2/0/13
[11] => 16/2/0/14
[12] => 16/2/0/15
[13] => 16/2/0/16
)
Thanks for taking the time to help me.
Make the array unique and then extract items that are digits/digits/NOT 1/digits:
$array = preg_grep('#^\d+/\d+/[^1]/\d+#', array_unique($array));
I would use preg_grep which allows you to search an array using a Regular expression.
$array =[
'16/2/0/13',
'16/2/0/16',
'16/2/1/5'
];
$array = preg_grep('~^16/2/0/\d+$~', $array);
print_r($array);
Output
Array
(
[0] => 16/2/0/13
[1] => 16/2/0/16
)
The Regex
^ match start of string
16/2/0/ - match literally (at the start of string, see above)
\d+ any digit one or more
$ match end of string
So regular expressions are a way to do pattern matching; in this case the pattern is 16/2/0/{n}, where {n} is any number. By doing this we keep only the items that match that pattern.
Then, if you have duplicates, you can apply array_unique() to remove them easily.
There are many ways to do this, such as array_filter with a custom callback, but this is the most straightforward one (if you know regex).
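The expected output in the question also drops 16/2/0/1 and 16/2/0/5, which suggests that an entry flagged with 1 in one sub-array should remove the same last number everywhere. That reading is my assumption, not something the question states. A sketch under that assumption follows; mergeWithoutFlagged is a hypothetical name, and every item is assumed to have exactly four slash-separated parts:

```php
<?php
// Sketch only: merges the sub-arrays, then drops every item whose last
// segment was flagged (third segment === '1') in any sub-array.
function mergeWithoutFlagged(array $groups): array
{
    // Flatten the sub-arrays into one list and drop exact duplicates.
    $items = array_unique(array_merge([], ...array_values($groups)));

    // Collect the last segment of every flagged item, e.g. 16/2/1/5 -> 5.
    $flagged = [];
    foreach ($items as $item) {
        $parts = explode('/', $item);
        if ($parts[2] === '1') {
            $flagged[$parts[3]] = true;
        }
    }

    // Keep items that are neither flagged themselves nor share a flagged id.
    $kept = array_filter($items, function ($item) use ($flagged) {
        $parts = explode('/', $item);
        return $parts[2] !== '1' && !isset($flagged[$parts[3]]);
    });

    return array_values($kept); // reindex from 0
}
```

Called with the two sub-arrays from the question, this should leave the fourteen entries 16/2/0/2 to 16/2/0/16 without 16/2/0/5.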

find non duplicated item in an array

I have two arrays built from different directories that contain file names stripped of extensions. I want to find the ones that don't make a pair thus I merged the array to obtain the array below. How can I find the only non duplicate item in an array?
Array
(
[0] => dbbackup_2014.09.03_07_06_27
[1] => dbbackup_2014.09.03_07_07_08
[2] => dbbackup_2014.09.03_07_13_33
[3] => dbbackup_2014.09.03_07_15_24
[4] => dbbackup_2014.09.03_07_21_57
[5] => dbbackup_2014.09.03_07_22_11
[6] => dbbackup_2014.09.03_08_40_35
[7] => dbbackup_2014.09.03_08_41_36
[8] => dbbackup_2014.09.03_08_43_38
[9] => dbbackup_2014.09.04_04_59_08
[10] => dbbackup_2014.09.03_07_06_27
[11] => dbbackup_2014.09.03_07_07_08
[12] => dbbackup_2014.09.03_07_13_33
[13] => dbbackup_2014.09.03_07_15_24
[14] => dbbackup_2014.09.03_07_21_57
[15] => dbbackup_2014.09.03_07_22_11
[16] => dbbackup_2014.09.03_08_40_35
[17] => dbbackup_2014.09.03_08_41_36
[18] => dbbackup_2014.09.03_08_43_38
)
Note: it is [9]
$a = array_flip(array_filter(array_count_values($a), function ($item) {
    return $item == 1;
}));
print_r($a);
Output
Array
(
[1] => dbbackup_2014.09.04_04_59_08
)
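$output = array();
foreach ($array as $data)
{
    // File names look like dbbackup_<date>_<hh>_<mm>_<ss>; index 1 is the date.
    $values = explode("_", $data);
    $date = $values[1];
    $output[$date] = isset($output[$date]) ? $output[$date] + 1 : 1;
}
foreach ($output as $date => $number)
{
    if ($number == 1)
        echo $date;
}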
Output:
2014.09.04
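If more than one item can be unpaired, array_flip() would collapse them onto a single key (the count 1), and counting by date would miss files that share a date. A small sketch that returns every value occurring exactly once in the merged array; unpairedItems is a hypothetical name, and the values are assumed to be strings or integers, which is what array_count_values() accepts:

```php
<?php
// Sketch only: returns every value that appears exactly once.
function unpairedItems(array $items): array
{
    // Map each distinct value to the number of times it occurs.
    $counts = array_count_values($items);

    // Keep only the values that were seen once.
    $once = array_filter($counts, function ($count) {
        return $count === 1;
    });

    // The values are now the keys; numeric-looking keys come back as
    // integers, so cast everything back to strings.
    return array_map('strval', array_keys($once));
}
```

For the merged array in the question this should return a one-element array holding dbbackup_2014.09.04_04_59_08.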

Re-order PHP array by middle key as start (Circular Sorting)

Very basic question; however, I have had some trouble finding the answer on PHP.NET.
I have the following array:
Array (
[1] => Array
(
[1] => 4
[2] => 1
[3] => 5
[4] => 3
)
[2] => Array
(
[5] => 2
[6] => 8
[7] => 7
[8] => 6
)
[3] => Array
(
[9] => 10
[10] => 9
[11] => 12
[12] => 11
)
[4] => Array
(
[13] => 15
[14] => 16
[15] => 14
[16] => 13
)
)
I want the array to be re-ordered so that the key number 3 in the first series of the array becomes the first, then the rest to be re-ordered from there to eventually get the result of:
Array (
[3] => Array
(
[9] => 10
[10] => 9
[11] => 12
[12] => 11
)
[4] => Array
(
[13] => 15
[14] => 16
[15] => 14
[16] => 13
)
[1] => Array
(
[1] => 4
[2] => 1
[3] => 5
[4] => 3
)
[2] => Array
(
[5] => 2
[6] => 8
[7] => 7
[8] => 6
)
)
I am looking for a way to do this so I can define the array, then the first level key I need to sort by, and then it will return the array in this way.
The standard PHP functions didn't seem to offer something like this, so it would be good to have a separate function, such as $newArray = reorder_array($array, $key);
I don't require any sorting of the second level, only of the initial 4 main / first level array sections.
Your help is greatly appreciated, as I have been sitting on this one for a while without a clear and simple solution.
Your re-ordering can be implemented with a single foreach loop, like:
function reorderArray($array, $key)
{
    $found = false;
    foreach ($array as $k => $v)
    {
        $found = $found || $k === $key;
        if (!$found)
        {
            // Move every element that precedes $key to the end of the array.
            unset($array[$k]);
            $array[$k] = $v;
        }
        // else: a break can be added here for performance
    }
    return $array;
}
with usage
$array=[1=>'foo', 4=>'bar', 9=>'baz', 'test'=>51];
var_dump(reorderArray($array, 9));
var_dump(reorderArray($array, 'test'));
var_dump(reorderArray($array, 'no_such_key'));//original array in result
If the keys are consecutive integers, however, this can easily be implemented with array_slice() calls.
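A sketch of the array_slice() idea mentioned above. rotateToKey is a hypothetical name. Looking the key up through array_keys() gives its position, so this version does not actually need consecutive keys; the strict comparison means an integer key must be passed as an integer:

```php
<?php
// Sketch only: rotate $array so that $key comes first, keeping all keys.
function rotateToKey(array $array, $key): array
{
    // Position of $key within the array, or false if it is absent.
    $offset = array_search($key, array_keys($array), true);
    if ($offset === false) {
        return $array; // unknown key: return the array unchanged
    }

    // Tail (from $key onwards) followed by head (everything before it).
    // preserve_keys = true keeps the numeric keys; the + operator appends
    // the head because its keys do not occur in the tail.
    return array_slice($array, $offset, null, true)
         + array_slice($array, 0, $offset, true);
}
```

With the array from the question, rotateToKey($array, 3) yields the sub-arrays in key order 3, 4, 1, 2.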

escaping user input while keeping smarty variables

I have an array of sample user input strings that may or may not contain Smarty variables. I'd like to wrap everything that is not a Smarty variable in {literal}{/literal} tags.
Array
(
[0] => {$PLEASE}
[1] => {PLEASE}
[2] => {{PLEASE}}
[3] => {{{PLEASE}}}
[4] => {a{PLEASE}}
[5] => {a{$PLEASE}}
[6] => {{$PLEASE}a}
[7] => {{PLEASE}a}
[8] => {{{$PLEASE}}}
[9] => {{{{PLEASE}}}}
)
Here is what I hope to achieve.
Array
(
[0] => {$PLEASE}
[1] => {literal}{PLEASE}{/literal}
[2] => {literal}{{PLEASE}}{/literal}
[3] => {literal}{{{PLEASE}}}{/literal}
[4] => {literal}{a{PLEASE}{/literal}
[5] => {literal}{a{/literal}{$PLEASE}{literal}}{/literal}
[6] => {literal}{{/literal}{$PLEASE}{literal}a}{/literal}
[7] => {literal}{PLEASE}a}{/literal}
[8] => {literal}{{{/literal}{$PLEASE}{literal}}}{/literal}
[9] => {literal}{{{{PLEASE}}}}{/literal}
)
Right now I have this
$data = preg_replace('/{+([^\$])([a-z0-9]*)}+/si', '{literal}{\1\2}{/literal}', $data);
Which gives me
Array
(
[0] => {$PLEASE}
[1] => {literal}{PLEASE}{/literal}
[2] => {literal}{PLEASE}{/literal}
[3] => {literal}{PLEASE}{/literal}
[4] => {a{literal}{PLEASE}{/literal}
[5] => {a{$PLEASE}}
[6] => {{$PLEASE}a}
[7] => {literal}{PLEASE}{/literal}a}
[8] => {{{$PLEASE}}}
[9] => {literal}{PLEASE}{/literal}
)
I've been stuck on this for quite some time now and was wondering if anyone could help me figure it out, or tell me whether it's even possible.
OK, I'm sure there's a more elegant way, perhaps a one-liner, but the following works:
//Step 1: Replace 'real' smarty variables with an intermediate string
$data1 = preg_replace('/{(\$\w+)}/', '!!!$1!!!', $data);
//Step 2: Wrap everything between the outermost curly braces in {literal}:
$data2 = preg_replace('/{(.*)}/', '{literal}{$1}{/literal}', $data1);
//Step 3: Replace all inner smarty variables with their original string:
$data3 = preg_replace('/(?<=.)!!!(.*?)!!!/', '{/literal}{$1}{literal}', $data2);
//Step 4: Replace standalone variables with their original string:
$data4 = preg_replace('/^!!!(.*)!!!$/', '{$1}', $data3);
You can merge steps 3 and 4 into one command.
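An alternative to the placeholder steps, sketched as a single pass: split the input on Smarty variables with preg_split(), then wrap each remaining piece that contains a brace in {literal} tags. This is a different technique from the answer above. escapeForSmarty is a hypothetical name, a Smarty variable is assumed to be exactly {$name} with word characters only, and input that already contains {literal} tags is not handled:

```php
<?php
// Sketch only: keep {$var} tokens, wrap other brace-bearing text in {literal}.
function escapeForSmarty(string $input): string
{
    // Split on {$name}; DELIM_CAPTURE keeps the variables in the result,
    // NO_EMPTY drops the empty pieces around a leading or trailing variable.
    $parts = preg_split(
        '/(\{\$\w+\})/',
        $input,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $out = '';
    foreach ($parts as $part) {
        if (preg_match('/^\{\$\w+\}$/', $part)) {
            $out .= $part;                                   // real variable
        } elseif (strpbrk($part, '{}') !== false) {
            $out .= '{literal}' . $part . '{/literal}';      // stray braces
        } else {
            $out .= $part;                                   // plain text
        }
    }
    return $out;
}
```

For the question's inputs that contain a variable, such as {a{$PLEASE}} and {{{$PLEASE}}}, this produces the strings listed in the desired output.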

preg_split with regex giving incorrect output

I'm using preg_split on a string, but I'm not getting the desired output. For example
$string = 'Tachycardia limit_from:1900-01-01 limit_to:2027-08-29 numresults:10 sort:publication-date direction:descending facet-on-toc-section-id:Case Reports';
$vals = preg_split("/(\w*\d?):/", $string, NULL, PREG_SPLIT_DELIM_CAPTURE);
generates this output:
Array
(
[0] => Tachycardia
[1] => limit_from
[2] => 1900-01-01
[3] => limit_to
[4] => 2027-08-29
[5] => numresults
[6] => 10
[7] => sort
[8] => publication-date
[9] => direction
[10] => descending facet-on-toc-section-
[11] => id
[12] => Case Reports
)
which is wrong. The desired output is
Array
(
[0] => Tachycardia
[1] => limit_from
[2] => 1900-01-01
[3] => limit_to
[4] => 2027-08-29
[5] => numresults
[6] => 10
[7] => sort
[8] => publication-date
[9] => direction
[10] => descending
[11] => facet-on-toc-section-id
[12] => Case Reports
)
There is something wrong with the regex, but I'm not able to fix it.
I would use
$vals = preg_split("/(\S+):/", $string, -1, PREG_SPLIT_DELIM_CAPTURE);
The output is exactly what you want.
It's because the \w class does not include the - character, so I would extend \w to cover it too:
/((?:\w|-)*\d?):/
Try this regex instead to include '-' or other characters in your splitting pattern: http://regexr.com?32qgs
((?:[\w\-])*\d?):
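A runnable sketch of the character-class fix on the string from the question. The pattern [\w-]+ is my simplification of the patterns above, and the array_map('trim', ...) step is an addition to strip the spaces that remain around each value; neither is taken verbatim from the answers:

```php
<?php
$string = 'Tachycardia limit_from:1900-01-01 limit_to:2027-08-29 numresults:10 '
        . 'sort:publication-date direction:descending '
        . 'facet-on-toc-section-id:Case Reports';

// Treat "word characters or hyphens, followed by a colon" as the delimiter
// and keep the captured name in the result (PREG_SPLIT_DELIM_CAPTURE).
$vals = preg_split(
    '/([\w-]+):/',
    $string,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

// The values still carry the space that preceded the next key; trim them.
$vals = array_map('trim', $vals);
```

After this, $vals[10] is 'descending' and $vals[11] is 'facet-on-toc-section-id', as in the desired output.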
