How to match rows in array to an array of masks? - php

I have an array like this:
array('1224*', '543*', '321*' ...) which contains about 17,000 "masks" or prefixes.
I have a second array:
array('123456789', '123456788', '987654321' ....) which contains about 250,000 numbers.
Now, how can I efficiently match every number from the second array against the array of masks/prefixes?
[EDIT]
The first array contains only prefixes and every entry has only one * at the end.

Well, here's a solution:
Preliminary steps:
1. Sort array 1, cutting off the *'s.
Searching:
For each number in array 2:
1. Find the first and last entries in array 1 whose first character matches that of the number (binary search).
2. Do the same for the second character, this time searching not the whole array but only between first and last (binary search).
3. Repeat step 2 for the nth character until a string is found.
This should be O(k*n*log(m)), where n is the average number length (in digits), k is the number of numbers, and m is the number of masks.
Basically this is a one-dimensional radix tree. For optimal performance you should implement a real one, but that can be quite hard.
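The range-narrowing search described above could be sketched roughly as follows. This is only an illustration, not a tuned implementation: the helper names boundAt() and prefixMatches() are made up for this example, and it assumes every mask ends in exactly one * as stated in the question. Note the SORT_STRING flag: with the default flag, PHP compares numeric strings as numbers, which would break the character-by-character ordering the search relies on.

```php
<?php
// Sketch only: boundAt() and prefixMatches() are hypothetical helper names.

// Binary search in $sorted[$lo..$hi) for the first entry whose character at
// $pos is >= $ch (or > $ch when $upper is true).
function boundAt(array $sorted, int $lo, int $hi, int $pos, string $ch, bool $upper): int
{
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        $cmp = strcmp($sorted[$mid][$pos], $ch);
        if ($cmp < 0 || ($upper && $cmp === 0)) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }
    return $lo;
}

// Returns true when some entry of $sorted (prefixes in byte-wise order) is a prefix of $number.
function prefixMatches(string $number, array $sorted): bool
{
    $lo = 0;
    $hi = count($sorted);
    $len = strlen($number);
    for ($pos = 0; $pos < $len && $lo < $hi; $pos++) {
        // Within the range all entries share the first $pos characters and the
        // shortest sorts first, so a length of $pos means a complete prefix matched.
        if (strlen($sorted[$lo]) === $pos) {
            return true;
        }
        $ch = $number[$pos];
        $newLo = boundAt($sorted, $lo, $hi, $pos, $ch, false);
        $hi = boundAt($sorted, $newLo, $hi, $pos, $ch, true);
        $lo = $newLo;
    }
    // The whole number was consumed: it matches only if it equals a prefix exactly.
    return $lo < $hi && strlen($sorted[$lo]) === $len;
}

$masks = ['1224*', '543*', '321*'];
$prefixes = array_map(function ($m) {
    return rtrim($m, '*');
}, $masks);
sort($prefixes, SORT_STRING); // byte-wise order, not numeric order

$numbers = ['122456789', '123456789', '987654321'];
$matched = array_values(array_filter($numbers, function ($n) use ($prefixes) {
    return prefixMatches($n, $prefixes);
}));
print_r($matched);
```

Each number costs two binary searches per digit examined, which is where the logarithmic factor in the estimate comes from.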

My two cents....
$s = array('1234*', '543*', '321*');
$f = array('123456789', '123456788', '987654321');
foreach ($f as $haystack) {
echo $haystack."<br>";
foreach ($s as $needle) {
$needle = str_replace("*","",$needle);
echo $haystack." - ".$needle.": ".startsWith($haystack, $needle)."<br>";
}
}
function startsWith($haystack, $needle) {
$length = strlen($needle);
return (substr($haystack, 0, $length) === $needle);
}
To improve performance it might be a good idea to sort both arrays first and to add an exit clause (a break) in the inner foreach loop.
By the way, the startsWith() function is from this great solution on SO: startsWith() and endsWith() functions in PHP
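A minimal sketch of the exit clause mentioned above, assuming it is enough to know whether a number matches at least one mask. It strips the * once up front and uses strncmp() in place of the startsWith() helper; the variable names $prefixes and $matched are introduced here for illustration.

```php
<?php
$masks   = array('1234*', '543*', '321*');
$numbers = array('123456789', '123456788', '987654321');

// Strip the trailing * once, outside the loops.
$prefixes = array();
foreach ($masks as $mask) {
    $prefixes[] = rtrim($mask, '*');
}

$matched = array();
foreach ($numbers as $number) {
    foreach ($prefixes as $prefix) {
        // strncmp() compares only the first strlen($prefix) bytes.
        if (strncmp($number, $prefix, strlen($prefix)) === 0) {
            $matched[] = $number;
            break; // exit clause: one matching prefix is enough
        }
    }
}
print_r($matched);
```

This is still a nested loop in the worst case; the break only saves work for numbers that match early.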

Another option would be to use preg_grep in a loop:
$masks = array('1224*', '543*', '321*' ...);
$data = array('123456789', '123456788', '987654321' ....);
$matches = array();
foreach($masks as $mask) {
$mask = substr($mask, 0, -1); // strip off trailing *
$matches[$mask] = preg_grep("/^$mask/", $data);
}
No idea how efficient this would be, just offering it up as an alternative.

Although regex is not famous for being fast, I'd like to know how well preg_grep() can perform if the pattern is boiled down to its leanest form and only called once (not in a loop).
By removing longer masks which are covered by shorter masks, the pattern will be greatly reduced. How much will the reduction be? Of course, I cannot say for sure, but with 17,000 masks, there is sure to be a fair amount of redundancy.
Code: (Demo)
$masks = ['1224*', '543*', '321*', '12245*', '5*', '122488*'];
sort($masks);
$needle = rtrim(array_shift($masks), '*');
$keep[] = $needle;
foreach ($masks as $mask) {
if (strpos($mask, $needle) !== 0) {
$needle = rtrim($mask, '*');
$keep[] = $needle;
}
}
// now $keep only contains: ['1224', '321', '5']
$numbers = ['122456789', '123456788', '321876543234567', '55555555555555555', '987654321'];
var_export(
preg_grep('~^(?:' . implode('|', $keep) . ')~', $numbers)
);
Output:
array (
0 => '122456789',
2 => '321876543234567',
3 => '55555555555555555',
)

Check out the PHP function array_intersect_key.

Related

php preg match. Find elements with two values not in order

I got array like:
$array = array(
'3A32',
'4565',
'7890',
'0012',
'A324',
'9002',
'3200',
'345A',
'0436'
);
Then I need to find which elements contain two given digits. The values of the digits can change.
If values were:
$n1 = 0;
$n2 = 3;
For that search preg_match() should return (3200,0436)
If values were:
$n1 = 0;
$n2 = 0;
preg_match() should return (0012,3200,9002)
Any idea?
Thanks.
I got your logic after looking several times at your input array as well as the output for the given numbers.
Since I am not good at regular expressions at all, I will find the answer with commonly known PHP functions.
1. Create a function which takes the initial array as well as the search numbers in array form (so that you can search for any numbers, and any quantity of them).
2. Iterate over the initial array, split each value into an array of characters, and call array_count_values() on both the split array and the numbers array. Then check whether every search number occurs often enough in the split array.
3. Assign each match to a new array declared inside the function itself.
4. Return this array at the end of the function.
$n1 = 0;
$n2 = 0;
function checkValues($array,$numbers=array()){
$finalArray = [];
if(!empty($numbers)){
foreach($array as $arr){
$splitArr = str_split($arr);
$matched = true;
$count_number_Array = array_count_values($numbers);
$count_split_array = array_count_values($splitArr);
foreach($count_number_Array as $key=>$value){
if(!isset($count_split_array[$key]) || $count_split_array[$key] < $value){
$matched = false;
break;
}
}
if($matched === true){
$finalArray[] = $arr;
}
}
}
return $finalArray;
}
print_r(checkValues($array, array($n1,$n2)));
Output: https://3v4l.org/7uWfC And https://3v4l.org/Tuu5m And https://3v4l.org/fEKTO
Instead of using preg_match, you might use preg_grep and dynamically create a pattern that will match the 2 values in each order using an alternation.
^[A-Z0-9]*0[A-Z0-9]*3[A-Z0-9]*|[A-Z0-9]*3[A-Z0-9]*0[A-Z0-9]*$
The character class [A-Z0-9] matches either a char A-Z or a digit 0-9.
Regex demo | Php demo
If you want to use other characters, you could also take a look at preg_quote to handle regular expression characters.
function getElementWithTwoValues($n1, $n2) {
$pattern = "/^[A-Z0-9]*{$n1}[A-Z0-9]*{$n2}[A-Z0-9]*|[A-Z0-9]*{$n2}[A-Z0-9]*{$n1}[A-Z0-9]*$/";
$array = array(
"3A32",
"4565",
"7890",
"0012",
"A324",
"9002",
"3200",
"345A",
"0436"
);
return preg_grep($pattern, $array);
}
print_r(getElementWithTwoValues(0, 3));
print_r(getElementWithTwoValues(0, 0));
Output
Array
(
[6] => 3200
[8] => 0436
)
Array
(
[3] => 0012
[5] => 9002
[6] => 3200
)
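The preg_quote() suggestion above could look roughly like this. It is a simplified sketch rather than the answer's exact pattern: it uses .* instead of the [A-Z0-9]* character class and drops the anchors, since preg_grep() only needs a match somewhere in each element. The helper name buildTwoValuePattern() and the extra test element '1.5+2' are made up for illustration.

```php
<?php
// Hypothetical helper: builds the alternation for two arbitrary values.
function buildTwoValuePattern($n1, $n2)
{
    // Escape regex metacharacters and the / delimiter.
    $a = preg_quote($n1, '/');
    $b = preg_quote($n2, '/');
    // Match the two values in either order.
    return "/{$a}.*{$b}|{$b}.*{$a}/";
}

$array = array('3A32', '4565', '7890', '0012', 'A324', '9002', '3200', '345A', '0436', '1.5+2');

print_r(preg_grep(buildTwoValuePattern('0', '3'), $array));
print_r(preg_grep(buildTwoValuePattern('0', '0'), $array));
// Without preg_quote(), "." would match any character and "+" would be a quantifier.
print_r(preg_grep(buildTwoValuePattern('.', '+'), $array));
```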

PHP sort array by number in filename

I am working on a photo gallery that automatically sorts the photos based on the numbers of the file name.
I have the following code:
//calculate and sort
$totaal = 0;
if($handle_thumbs = opendir('thumbs')){
$files_thumbs = array();
while(false !== ($file = readdir($handle_thumbs))){
if($file != "." && $file != ".."){
$files_thumbs[] = $file;
$totaal++;
}
}
closedir($handle_thumbs);
}
sort($files_thumbs);
//reset array list
$first = reset($files_thumbs);
$last = end($files_thumbs);
//match and split filenames from array values - image numbers
preg_match("/(\d+(?:-\d+)*)/", "$first", $matches);
$firstimage = $matches[1];
preg_match("/(\d+(?:-\d+)*)/", "$last", $matches);
$lastimage = $matches[1];
But when I have file names like photo-Aname_0333.jpg and photo-Bname_0222.jpg, the list starts with photo-Aname_0333 instead of the one numbered 0222.
How can I sort this by the numbers in the filenames?
None of the earlier answers are using the most appropriate/modern technique to perform the 3-way comparison -- the spaceship operator (<=>).
Not only does it provide a tidier syntax, it also allows you to implement multiple sorting rules in a single step.
The following snippet breaks each filename string in half (on the underscore), then compares the 2nd halves of both filenames first; if there is a tie on the 2nd halves, it compares the 1st halves of the two filenames.
Code: (Demo)
$photos = [
'photo-Bname_0333.jpg',
'photo-Bname_0222.jpg',
'photo-Aname_0333.jpg',
'photo-Cname_0111.jpg',
'photo-Cname_0222.jpg',
'photo-Aname_0112.jpg',
];
usort($photos, function ($a, $b) {
return array_reverse(explode('_', $a, 2)) <=> array_reverse(explode('_', $b, 2));
});
var_export($photos);
Output:
array (
0 => 'photo-Cname_0111.jpg',
1 => 'photo-Aname_0112.jpg',
2 => 'photo-Bname_0222.jpg',
3 => 'photo-Cname_0222.jpg',
4 => 'photo-Aname_0333.jpg',
5 => 'photo-Bname_0333.jpg',
)
For anyone who still thinks the preg_ calls are better, I will explain that my snippet is making potentially two comparisons and the preg_ solutions are only making one.
If you wish to use only one sorting criterion, then this non-regex technique will outperform regex:
usort($photos, function ($a, $b) {
return strstr($a, '_') <=> strstr($b, '_');
});
I super-love regex, but I know only to use it when non-regex techniques fail to provide a valuable advantage.
Older and wiser me says, simply remove the leading portion of the string before sorting (use SORT_NATURAL if needed), then sort the whole array. If you are scared of regex, then make mapped calls of strtok() on the underscore.
Code: (Demo)
array_multisort(preg_replace('/.*_/', '', $photos), $photos);
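For comparison, a non-regex variant of the same idea might look like the sketch below. It swaps in strrchr() to isolate the part after the last underscore (so it is not the strtok() approach mentioned above), assumes every filename contains an underscore, and passes SORT_NATURAL to array_multisort(). The $keys variable is introduced here for illustration.

```php
<?php
$photos = [
    'photo-Bname_0333.jpg',
    'photo-Bname_0222.jpg',
    'photo-Aname_0333.jpg',
    'photo-Cname_0111.jpg',
    'photo-Cname_0222.jpg',
    'photo-Aname_0112.jpg',
];

// Keep only the part from the last underscore onward, e.g. "_0222.jpg".
$keys = array_map(function ($name) {
    return strrchr($name, '_');
}, $photos);

// Sort by the trailing part in natural order; ties fall back to the full name.
array_multisort($keys, SORT_ASC, SORT_NATURAL, $photos);
var_export($photos);
```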
usort() is a PHP function that sorts an array by its values using a user-defined comparison.
usort() needs a callback function that receives two values.
In the callback, depending on your needs, you return the result of the comparison: 1, 0 or -1. For example, to sort the array in ascending order, return -1 when the first value passed to the callback is less than the second.
In this particular case I extract the numbers from the filenames and compare them as strings; it is not necessary to cast them to integers.
<?php
$photos=[
'photo-Bname_0222.jpg',
'photo-Aname_0333.jpg',
'photo-Cname_0111.jpg',
];
usort($photos, function ($a, $b) {
preg_match("/(\d+(?:-\d+)*)/", $a, $matches);
$firstimage = $matches[1];
preg_match("/(\d+(?:-\d+)*)/", $b, $matches);
$lastimage = $matches[1];
if ($firstimage == $lastimage) {
return 0;
}
return ($firstimage < $lastimage) ? -1 : 1;
});
print_r($photos);
It sorts alphabetically because you use sort() on the filename. The 2nd part of your code does nothing.
You might want to take a look at usort http://php.net/manual/en/function.usort.php
You can do something like
function cmp($a, $b) {
if ($a == $b) {
return 0;
}
preg_match('/(\d+)\.\w+$/', $a, $matches);
$nrA = $matches[1];
preg_match('/(\d+)\.\w+$/', $b, $matches);
$nrB = $matches[1];
return ($nrA < $nrB) ? -1 : 1;
}
usort($files_thumbs, 'cmp');
Also, I'm not sure about your regex, consider a file named "abc1234cde2345xx". The one I used takes the last digits before a file extension at the end. But it all depends on your filenames.
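sort(array, sortingtype): set the second parameter of sort() to 1 (the value of SORT_NUMERIC) so that it sorts items numerically. Note that this only helps when the values being sorted are themselves numeric; names that start with letters are not ordered by the number embedded in them.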
//calculate and sort
$totaal = 0;
if($handle_thumbs = opendir('thumbs')){
$files_thumbs = array();
while(false !== ($file = readdir($handle_thumbs))){
if($file != "." && $file != ".."){
$files_thumbs[] = $file;
$totaal++;
}
}
closedir($handle_thumbs);
}
sort($files_thumbs,1);
//reset array list
$first = reset($files_thumbs);
$last = end($files_thumbs);
//match and split filenames from array values - image numbers
preg_match("/(\d+(?:-\d+)*)/", "$first", $matches);
$firstimage = $matches[1];
preg_match("/(\d+(?:-\d+)*)/", "$last", $matches);
$lastimage = $matches[1];

PHP: Detecting a particular sequence of elements in an array

How do I detect if a certain sequence of elements is present in an array? E.g. if I have the arrays and needle
$needle = array(1,1);
$haystack1 = array(0,1,0,0,0,1,1,0,1,0);
$haystack2 = array(0,0,0,0,1,0,1,0,0,1);
How does one detect if the subset $needle is present in e.g. $haystack1? This method should return TRUE for $haystack1 and FALSE for $haystack2.
Thanks for any suggestions!
Join the arrays, and check for the strpos of the needle.
if ( strpos( join($haystack1), join($needle) ) !== false ) {
echo "Items found";
}
Demo: http://codepad.org/F13DLWOI
Warning
This will not work for complicated items like objects or arrays within the haystack array. This method is best used with items like numbers and strings.
If they're always a single digit/character then you can convert all elements to strings, join by '', and use the regex or string functions to search.
For the specific case where no array element is a prefix of any other element (when both are converted to strings) then the already posted answers will work fine and probably be pretty fast.
Here's an approach that will work correctly in the general case:
function subarray_exists(array $needle, array $haystack) {
if (count($needle) > count($haystack)) {
return false;
}
$needle = array_values($needle);
$iterations = count($haystack) - count($needle) + 1;
for ($i = 0; $i < $iterations; ++$i) {
if (array_slice($haystack, $i, count($needle)) == $needle) {
return true;
}
}
return false;
}
See it in action.
Disclaimer: There are ways to write this function that I expect will make it execute much faster when you are searching huge haystacks, but for a first approach simple is good.
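One possible way to speed this up, sketched under the assumption that the cost of array_slice() is what matters: compare element by element and stop at the first mismatch, so no temporary arrays are created. The function name subarray_exists_fast() is hypothetical, and the loose != comparison mirrors the == used above.

```php
<?php
// Hypothetical variant of subarray_exists() that avoids array_slice().
function subarray_exists_fast(array $needle, array $haystack)
{
    // Re-index both arrays so that numeric offsets are reliable.
    $needle = array_values($needle);
    $haystack = array_values($haystack);
    $n = count($needle);
    $last = count($haystack) - $n; // last possible start position
    for ($i = 0; $i <= $last; ++$i) {
        for ($j = 0; $j < $n; ++$j) {
            if ($haystack[$i + $j] != $needle[$j]) {
                continue 2; // mismatch: try the next start position
            }
        }
        return true; // every element of the needle matched
    }
    return false;
}

$needle = array(1, 1);
$haystack1 = array(0, 1, 0, 0, 0, 1, 1, 0, 1, 0);
$haystack2 = array(0, 0, 0, 0, 1, 0, 1, 0, 0, 1);
var_dump(subarray_exists_fast($needle, $haystack1));
var_dump(subarray_exists_fast($needle, $haystack2));
```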

PHP: How do I check the contents of an array for my string?

I have a string that is coming from my database table, say $needle.
If the needle is not in my array, then I want to add it to my array.
If it IS in my array, then as long as it is in there at most twice, I still
want to add it to my array (so three times will be the maximum).
In order to check whether $needle is in my $haystack array, do I
need to loop through the array with strpos() or is there a quicker method?
There are many needles in the table so I start by looping through
the select result.
This is the schematic of what I am trying to do...
$haystack = array();
while( $row = mysql_fetch_assoc($result)) {
$needle = $row['data'];
$num = no. of times $needle is in $haystack // $haystack is an array
if ($num < 3 ) {
$haystack[] = $needle; // hopefully this adds the needle
}
} // end while. Get next needle.
Does anyone know how to do this bit:
$num = no. of times $needle is in $haystack
thanks
You can use array_count_values() to first generate a map containing the frequency of each value, and then only append a value if its count in the map is < 3, for instance:
$original_values_count = array_count_values($values);
foreach ($values as $value)
if ($original_values_count[$value] < 3)
$values[] = $value;
As looping cannot be completely avoided, I'd say it's a good idea to opt for using a native PHP function in terms of speed, compared to looping all values manually.
Did you mean array_count_values() to return the occurrences of all the unique values?
<?php
$a=array("Cat","Dog","Horse","Dog");
print_r(array_count_values($a));
?>
The output of the code above will be:
Array
(
[Cat] => 1
[Dog] => 2
[Horse] => 1
)
There is also the array_map() function, which applies a given function to every element of an array.
Maybe something like the following? Just changing Miek's code a little.
$haystack_count = array_count_values($haystack);
if ($haystack_count[$needle] < 3)
$haystack[] = $needle;
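Putting these suggestions into the loop from the question, one option is to keep a running count per needle rather than recounting the whole haystack on every row. This is only a sketch: $rows stands in for the database result with made-up values, and $counts is a name introduced here.

```php
<?php
// $rows stands in for the database result; the values are made up for illustration.
$rows = array('a', 'b', 'a', 'a', 'a', 'b', 'c');

$haystack = array();
$counts = array(); // needle => number of times it is already in $haystack

foreach ($rows as $needle) {
    // A needle that has not been seen yet counts as 0.
    $num = isset($counts[$needle]) ? $counts[$needle] : 0;
    if ($num < 3) {
        $haystack[] = $needle;
        $counts[$needle] = $num + 1;
    }
}
print_r($haystack);
```

The isset() check also avoids the undefined-index notice that occurs when a needle is looked up before it has been added.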

Searching an array of different strings inside a single string in PHP

I have an array of strings that I want to try and match to the end of a normal string. I'm not sure the best way to do this in PHP.
This is sorta what I am trying to do:
Example:
Input: abcde
Search array: er, wr, de
Match: de
My first thought was to write a loop that goes through the array and crafts a regular expression by adding "\b" on the end of each string and then check if it is found in the input string. While this would work it seems sorta inefficient to loop through the entire array. I've been told regular expressions are slow in PHP and don't want to implement something that will take me down the wrong path.
Is there a better way to see if one of the strings in my array occurs at the end of the input string?
The preg_filter() function looks like it might do the job but is for PHP 5.3+ and I am still sticking with 5.2.11 stable.
For something this simple, you don't need a regex. You can loop over the array and use strrpos() (the last occurrence, since strpos() would stop at an earlier one) to see if the index is length(input) - length(test). If each entry in the search array is always of a constant length, you can also speed things up by chopping the end off the input, then comparing that to each item in the array.
You can't avoid going through the whole array, as in the worst general case, the item that matches will be at the end of the array. However, unless the array is huge, I wouldn't worry too much about performance - it will be much faster than you think.
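A minimal sketch of such a loop, using substr_compare() with a negative offset so that only the end of the input is compared. The function name findSuffix() is hypothetical, and returning the first matching suffix (or false) is an assumption about what the caller wants.

```php
<?php
// Hypothetical helper: returns the first entry of $suffixes that $input ends with, or false.
function findSuffix($input, array $suffixes)
{
    $inputLen = strlen($input);
    foreach ($suffixes as $suffix) {
        $len = strlen($suffix);
        // A negative offset makes substr_compare() start $len bytes from the end.
        if ($len > 0 && $len <= $inputLen
            && substr_compare($input, $suffix, -$len) === 0) {
            return $suffix;
        }
    }
    return false;
}

var_dump(findSuffix('abcde', array('er', 'wr', 'de')));
```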
Though compiling the regular expression takes some time I wouldn't dismiss using pcre so easily. Unless you find a compare function that takes several needles you need a loop for the needles and executing the loop + calling the compare function for each single needle takes time, too.
Let's take a test script that fetches all the function names from php.net and looks for certain endings. This was only an adhoc script but I suppose no matter which strcmp-ish function + loop you use it will be slower than the simple pcre pattern (in this case).
count($hs)=5549
pcre: 4.377925157547 s
substr_compare: 7.951938867569 s
identical results: bool(true)
This was the result when searching for nine different patterns. If there were only two ('yadda', 'ge') both methods took the same time.
Feel free to criticize the test script (aren't there always errors in synthetic tests that are obvious for everyone but oneself? ;-) )
<?php
/* get the test data
All the function names from php.net
*/
$doc = new DOMDocument;
$doc->loadhtmlfile('http://docs.php.net/quickref.php');
$xpath = new DOMXPath($doc);
$hs = array();
foreach( $xpath->query('//a') as $a ) {
$hs[] = $a->textContent;
}
echo 'count($hs)=', count($hs), "\n";
// should find:
// ge, e.g. imagick_adaptiveblurimage
// ing, e.g. m_setblocking
// name, e.g. basename
// ions, e.g. assert_options
$ns = array('yadda', 'ge', 'foo', 'ing', 'bar', 'name', 'abcd', 'ions', 'baz');
sleep(1);
/* test 1: pcre */
$start = microtime(true);
for($run=0; $run<100; $run++) {
$matchesA = array();
$pattern = '/(?:' . join('|', $ns) . ')$/';
foreach($hs as $haystack) {
if ( preg_match($pattern, $haystack, $m) ) {
#$matchesA[$m[0]]+= 1;
}
}
}
echo "pcre: ", microtime(true)-$start, " s\n";
flush();
sleep(1);
/* test 2: loop + substr_compare */
$start = microtime(true);
for($run=0; $run<100; $run++) {
$matchesB = array();
foreach( $hs as $haystack ) {
$hlen = strlen($haystack);
foreach( $ns as $needle ) {
$nlen = strlen($needle);
if ( $hlen >= $nlen && 0===substr_compare($haystack, $needle, -$nlen) ) {
#$matchesB[$needle]+= 1;
}
}
}
}
echo "substr_compare: ", microtime(true)-$start, " s\n";
echo 'identical results: '; var_dump($matchesA===$matchesB);
I might approach this backwards;
if your string-ending list is fixed or varies rarely,
I would start by preprocessing it to make it easy to match against,
then grab the end of your string and see if it matches!
Sample code:
<?php
// Test whether string ends in predetermined list of suffixes
// Input: string to test
// Output: if matching suffix found, returns suffix as string, else boolean false
function findMatch($str) {
$matchTo = array(
2 => array( 'ge' => true, 'de' => true ),
3 => array( 'foo' => true, 'bar' => true, 'baz' => true ),
4 => array( 'abcd' => true, 'efgh' => true )
);
foreach($matchTo as $length => $list) {
$end = substr($str, -$length);
if (isset($list[$end]))
return $end;
}
return false;
}
?>
This might be overkill, but you can try the following.
Create a hash for each entry of your search array and store the hashes as keys in an array (that will be your lookup array).
Then go from the end of your input string one character at a time (e, de, cde, etc.) and compute a hash of the substring at each iteration. If the hash is in your lookup array, you have a match.
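A rough sketch of that lookup idea follows. Since PHP arrays are already hash tables, this version uses the suffixes themselves as keys instead of computing a separate hash, which is a simplification of what is described above. The function name findSuffixByLookup() is made up for the example.

```php
<?php
// Hypothetical helper: walks the input from the end and checks each tail against a lookup array.
function findSuffixByLookup($input, array $suffixes)
{
    $lookup = array_flip($suffixes); // suffix => index, used only for isset()
    $maxLen = 0;
    foreach ($suffixes as $suffix) {
        $maxLen = max($maxLen, strlen($suffix));
    }
    $limit = min($maxLen, strlen($input));
    // Grow the candidate from the end: "e", "de", "cde", ...
    for ($len = 1; $len <= $limit; $len++) {
        $candidate = substr($input, -$len);
        if (isset($lookup[$candidate])) {
            return $candidate;
        }
    }
    return false;
}

var_dump(findSuffixByLookup('abcde', array('er', 'wr', 'de')));
```

The number of lookups depends on the length of the longest suffix rather than on the size of the search array, which is where this approach could pay off for large arrays.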
