PHP SourceCode Random - php

The function for rand() should be something like (SEED * A + C) mod M.
How can I find the values of A, C, and M? And if I find those values, can I predict the next number in the sequence?
I know that I can find the values of these variables in the PHP source code. But after looking around I really cannot find them...
Does anybody know what file it would be in? Or who else I could contact? (I've tried emailing internals@lists.php.net but haven't got a response.)
Also I'm doing all this in PHP versions prior to 7, where rand() and mt_rand() became synonymous.
EDIT: I have seen Is it possible to predict rand(0,10) in PHP? but those answers aren't about the constant values in PHP's rand() value by themselves.
Thank you!

I believe that the old school rand() function used a linear congruential generator.
This generator is system dependent. One algorithm employed by glibc was:
next = next * 1103515245 + 12345;
return next & 0x7fffffff;
so there you have your constants. The state, of course, is the initial value of 'next', which is zero unless set differently by srand().
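To make the recurrence concrete, here is a minimal Python sketch of the TYPE_0 generator quoted above. The function names lcg_next and lcg_sequence are made up for illustration, and the sketch assumes the state and the returned value are the same 31-bit number, as in the two C lines above.

```python
# Sketch of the old glibc TYPE_0 linear congruential generator quoted above.
A = 1103515245
C = 12345
MASK = 0x7FFFFFFF  # keeping the low 31 bits is the same as taking M = 2**31


def lcg_next(state):
    """Advance the state once; the new state is also the returned value."""
    return (state * A + C) & MASK


def lcg_sequence(seed, count):
    """Return the first `count` outputs produced from `seed`."""
    outputs = []
    state = seed
    for _ in range(count):
        state = lcg_next(state)
        outputs.append(state)
    return outputs


print(lcg_sequence(1, 3))
```

Because each output is the full state, seeing a single value is enough to predict everything that follows in this simple model.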
There are ways of attacking a linear congruence; one possibility - the slowest, but the easiest to explain - is to brute force it. Say that you have four consecutive values: a0, a1, a2, a3 from your rand() implementation. You can check all values of seed that would yield that same sequence.
Note that if your a0 value is produced by, say, rand() % 7172, then your initial seed must obey the rule that "seed % 7172 === a0". This immediately reduces the space you need to brute force, speeding up operations proportionately. Also, you don't need to check all four numbers.
This would be the efficient equivalent of running (in PHP)
for ($seed = 0; $seed < MAX_SEED; $seed++) {
srand($seed);
if ($a0 !== [RAND() FORMULA]) continue; // mismatch: try the next seed
if ($a1 !== [RAND() FORMULA]) continue;
if ($a2 !== [RAND() FORMULA]) continue;
if ($a3 !== [RAND() FORMULA]) continue;
return $seed; // all four values matched
}
return false; // no seed below MAX_SEED reproduces the sequence
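As a rough, runnable illustration of that brute force, here is a Python sketch against the simple TYPE_0 recurrence only (not the real glibc default generator). The helper names outputs and recover_seed and the small search limit are assumptions for the demo.

```python
# Brute-force seed recovery against the simple LCG: next = next * A + C, masked to 31 bits.
A, C, MASK = 1103515245, 12345, 0x7FFFFFFF


def outputs(seed, count):
    """Produce `count` consecutive values from the LCG started at `seed`."""
    state = seed
    result = []
    for _ in range(count):
        state = (state * A + C) & MASK
        result.append(state)
    return result


def recover_seed(observed, max_seed):
    """Return the first seed below max_seed that reproduces `observed`, or None."""
    for seed in range(max_seed):
        state = seed
        for expected in observed:
            state = (state * A + C) & MASK
            if state != expected:
                break  # wrong seed: move on to the next candidate
        else:
            return seed  # every observed value matched
    return None


observed = outputs(4242, 4)
print(recover_seed(observed, 10_000))
```

Most candidates are rejected on the first comparison, which is why checking all four numbers is rarely needed.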
Experiments
By checking with a reference trivial C source code
#include <stdio.h>
#include <stdlib.h>
int main(void) {
srand(1);
printf("%d\n", rand()); /* rand() returns an int */
return 0;
}
I determined that PHP and C do indeed share the same underlying function (I tabulated different values for srand()).
I also found out that srand(0) and srand(1) yield the same result, which isn't consistent with my linear model.
And that's because glibc rand() is not so trivial a linear congruential generator. More info here. Actually it is quoted in a SO answer and the code I had was for the old, TYPE_0 generator.
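For reference, here is a Python sketch of the default (non-TYPE_0) glibc generator as it is commonly described: an additive feedback generator whose 34 starting words come from a small multiplicative LCG. Treat it as an approximation under stated assumptions rather than the actual library source: the name glibc_rand is made up, and it assumes a seed between 0 and 2147483646.

```python
def glibc_rand(seed, count):
    """Return the first `count` values of the commonly described glibc default rand()."""
    if seed == 0:
        seed = 1  # a zero seed is replaced by 1, so srand(0) and srand(1) agree
    r = [seed]
    # Fill the first 31 words with a multiplicative LCG modulo 2**31 - 1.
    for i in range(1, 31):
        r.append((16807 * r[i - 1]) % 2147483647)
    for i in range(31, 34):
        r.append(r[i - 31])
    # Additive feedback step; the words at indexes 34..343 are discarded.
    for i in range(34, 344 + count):
        r.append((r[i - 31] + r[i - 3]) & 0xFFFFFFFF)
    # Each output drops the lowest bit of the 32-bit word.
    return [word >> 1 for word in r[344:]]


print(glibc_rand(1, 3))
print(glibc_rand(0, 3) == glibc_rand(1, 3))
```

The zero-seed substitution is what makes srand(0) and srand(1) produce the same stream.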

Related

PHP built in functions complexity (isAnagramOfPalindrome function)

I've been googling for the past 2 hours, and I cannot find a list of php built in functions time and space complexity. I have the isAnagramOfPalindrome problem to solve with the following maximum allowed complexity:
expected worst-case time complexity is O(N)
expected worst-case space complexity is O(1) (not counting the storage required for input arguments).
where N is the input string length. Here is my simplest solution, but I don't know if it is within the complexity limits.
class Solution {
// Function to determine if the input string can make a palindrome by rearranging it
static public function isAnagramOfPalindrome($S) {
// here I am counting how many characters have odd number of occurrences
$odds = count(array_filter(count_chars($S, 1), function($var) {
return($var & 1);
}));
// If the string length is odd, then a palindrome would have 1 character with odd number occurrences
// If the string length is even, all characters should have even number of occurrences
return (int)($odds == (strlen($S) & 1));
}
}
echo Solution :: isAnagramOfPalindrome($_POST['input']);
Anyone have an idea where to find this kind of information?
EDIT
I found out that array_filter has O(N) complexity, and count has O(1) complexity. Now I need to find info on count_chars, but a full list would be very convenient for future problems.
EDIT 2
After some research on space and time complexity in general, I found out that this code has O(N) time complexity and O(1) space complexity because:
count_chars will loop N times (the full length of the input string), giving a starting complexity of O(N). It generates an array with a bounded number of entries (at most 26 if the input contains only lowercase letters, and never more than 256, since count_chars counts byte values), and then a filter is applied to this array, which means the filter will loop a bounded number of times. When pushing the input length towards infinity, this loop is insignificant and is treated as a constant. count also applies to this bounded array, and besides, the count function's complexity is O(1). Hence, the time complexity of the algorithm is O(N).
The same goes for space complexity. When calculating space complexity, we do not count the input, only the objects generated in the process. These objects are the bounded array of counts and the count variable, and both are treated as constants because their size cannot grow beyond this bound, no matter how big the input is. So we can say that the algorithm has a space complexity of O(1).
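The same argument can be seen in a short Python sketch that counts byte values into a fixed-size table. The function name is made up for illustration, and it assumes the input arrives as bytes so that no copy of the input is made.

```python
def is_anagram_of_palindrome(data):
    """Return True if the bytes in `data` can be rearranged into a palindrome."""
    counts = [0] * 256  # fixed-size table: O(1) extra space regardless of len(data)
    for byte in data:  # single pass over the input: O(N) time
        counts[byte] += 1
    # At most one byte value may occur an odd number of times.
    odd = sum(count & 1 for count in counts)  # 256 steps, a constant
    return odd <= 1


print(is_anagram_of_palindrome(b"testest"))
print(is_anagram_of_palindrome(b"abc"))
```

Checking for at most one odd count is equivalent to comparing the odd count with the parity of the length, because the number of odd counts always has the same parity as the string length.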
Anyway, that list would be still valuable so we do not have to look inside the php source code. :)
A probable reason for not including this information is that it is likely to change per release, as improvements and optimizations for the general case are made.
PHP is built on C. Some of the functions are simply wrappers around their C counterparts, for example hypot. A Google search or a look at man hypot leads to the docs for the math library:
http://www.gnu.org/software/libc/manual/html_node/Exponents-and-Logarithms.html#Exponents-and-Logarithms
The source actually provides no better info
https://github.com/lattera/glibc/blob/a2f34833b1042d5d8eeb263b4cf4caaea138c4ad/math/w_hypot.c (Not official, Just easy to link to)
Not to mention, this is only glibc; Windows will have a different implementation. So there MAY even be a different big O per OS that PHP is compiled on.
Another reason could be that it would confuse most developers.
Most developers I know would simply choose the function with the "best" big O,
but a worse bound doesn't always mean it's slower.
http://www.sorting-algorithms.com/
It has a good visual demonstration of what's happening with some sorting algorithms, e.g. bubble sort is a "slow" sort, yet it's one of the fastest for nearly sorted data.
Quick sort is what many will use, and a naive implementation can be very slow for nearly sorted data.
Big O is usually quoted for the worst case. PHP may decide between releases to optimize for a certain condition, and that will change the big O of the function; there's no easy way to document that.
There is a partial list here (which I guess you have seen)
List of Big-O for PHP functions
Which does list some of the more common PHP functions.
For this particular example....
It's fairly easy to solve without using the built-in functions.
Example code
function isPalAnagram($string) {
$string = str_replace(" ", "", $string);
$len = strlen($string);
$oddCount = $len & 1;
$string = str_split($string);
while ($len > 0 && $oddCount >= 0) {
$current = reset($string);
$replace_count = 0;
foreach($string as $key => &$char) {
if ($char === $current){
unset($string[$key]);
$len--;
$replace_count++;
continue;
}
}
$oddCount -= ($replace_count & 1);
}
return ($len - $oddCount) === 0;
}
Using the fact that there can not be more than 1 odd count, you can return early from the array.
I think mine is also O(N) time because its worst case is O(N) as far as I can tell.
Test
$a = microtime(true);
for($i=1; $i<100000; $i++) {
testMethod("the quick brown fox jumped over the lazy dog");
testMethod("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
testMethod("testest");
}
printf("Took %s seconds, %s memory", microtime(true) - $a, memory_get_peak_usage(true));
Tests run using really old hardware
My way
Took 64.125452041626 seconds, 262144 memory
Your way
Took 112.96145009995 seconds, 262144 memory
I'm fairly sure that my way is not the quickest way either.
I actually can't see much info either for languages other than PHP (Java for example).
I know a lot of this post is speculation about why it's not there, without much drawn from credible sources. I hope it has at least partially explained why big O isn't listed on the documentation pages, though.

How do I go about converting a math equation into php?

I am not so good at maths and I'm looking to transfer 3 math equations to php functions.
I've tried looking up how to individually do each part of the equation in php but I keep getting strange results so I must be doing something wrong.
Is there a php function for exponential growth?
The image with the equations are here:
http://i.imgur.com/zIhMEEu.jpg
Thanks
For the second equation this is what I have:
$rank = 50;
$xp = log(24000 * (2^($rank/6) - 1));
echo $xp;
The number is too small for this to be correct. I'm also not sure how to convert the 'ln 2' into PHP. The log() function seemed to come up under 'natural logarithm to php' search.
There are various functions that need to be combined in order to create these equations. The log function performs logarithm operations in a base of your choice (or ln if you do not provide a base). The pow function performs exponentiation.
Your equations would be:
function rank($xp) {
return floor(6 * log(($xp * log(2) / 24000 + 1), 2));
}
function xp($rank) {
return floor(24000 * (pow(2, ($rank / 6)) - 1) / log(2));
}
function kills($rank) {
return floor(xp($rank) / 200);
}
There are a few more parentheses there than absolutely needed, for clarity's sake.
Mathematical notations in general are considerably more compact and expressive than most programming languages (not just PHP) due to the fact that you can use any symbol you can think of to represent various concepts. In programming, you're stuck calling functions.
Also, I'm not sure what the various hardcoded numbers represent, or if it makes sense to change them, in the context of the formula, but you might want to think about setting them up as extra parameters to the function. For example:
function kills($rank, $xpPerKill = 200) {
return floor(xp($rank) / $xpPerKill);
}
This adds clarity to the code, because it lets you know what the numbers represent. At the same time, it allows you to change the numbers more easily in case you are using them in multiple places.

Php rand vs Perl rand

I am trying to port a piece of code from Perl to PHP. The Perl code snippet is part of Akamai's video on demand link generation script. The script generates a seed based on the location / URL of the video file (which will always be constant for a single URL). The seed is then used to generate a serial ID for the stream (which is basically a random number between 1 and 2000 using the seed). Here is the Perl code:
$seed=6718;
srand($seed);
print(int(rand(1999)) + 1); # returns 442 every time
And the converted PHP code is:
$seed=6718;
srand($seed);
echo(rand(0, 1999) + 1); //returns 155 every time
Does PHP's rand behave differently from Perl's?
Yes. You can't depend on their algorithms being the same. For perl, which rand is used depends on what platform your perl was built for.
You may have more luck using a particular algorithm; for instance, Mersenne Twister looks to be available for both PHP and Perl.
Update: trying it produces different results, so that one at least won't do the trick.
Update 2: From the perl numbers you show, your perl is using the drand48 library; I don't know whether that's available for PHP at all, and google isn't helping.
[clippy]It looks like you're trying to hash a number, maybe you want to use a hash function?[/clippy]
Hash functions are designed to take an input and produce a consistently repeatable value, that is in appearance random. As a bonus they often have cross language implementations.
Using srand() with rand() to get what is basically a hash value is a fairly bad idea. Different languages use different algorithms, some just use system libraries. Changing (or upgrading) the OS, standard C library, or language can result in wildly different results.
Using SHA1 to get a number between 1 and 2000 is a bit overkill, but you can at least be sure that you could port the code to nearly any language and still get the same result.
use Digest::SHA1;
# get a integer hash value from $in between $min (inclusive) and $max (exclusive)
sub get_int_hash {
my ($in, $min, $max) = @_;
# calculate the SHA1 of $in, note $in is converted to a string.
my $sha = Digest::SHA1->new;
$sha->add( "$in" );
my $digest = $sha->hexdigest;
# use the last 7 characters of the digest (28 bits) for an effective range of 0 - 268,435,455.
my $value = hex substr $digest, -7;
# scale and shift the value to the desired range.
my $out = int( $value / 0x10000000 * ( $max - $min ) ) + $min;
return $out;
}
print get_int_hash(6718, 1, 2000); #this should print 812 for any SHA1 implementation.
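To show how directly this ports, here is a Python sketch of the same idea using the standard hashlib module. It mirrors the Perl routine step by step (SHA1 of the value as a string, last 7 hex digits, scaled into the range). I have not checked its output against the Perl version, so treat agreement between the two as an assumption to test.

```python
import hashlib


def get_int_hash(value, low, high):
    """Map `value` to a repeatable integer in [low, high) using SHA1."""
    # Hash the string form of the value, as the Perl version does.
    digest = hashlib.sha1(str(value).encode("utf-8")).hexdigest()
    # The last 7 hex characters give 28 bits: 0 .. 0x0FFFFFFF.
    tail = int(digest[-7:], 16)
    # Scale into the requested range with integer arithmetic, then shift.
    return tail * (high - low) // 0x10000000 + low


print(get_int_hash(6718, 1, 2000))
```

Since the result depends only on SHA1 and integer arithmetic, it should not change when the OS, C library, or language version changes.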
From this snippet alone it is impossible to say whether the two are the same.
First, you need to know that even a random generator like the rand() function is not really random. It calculates each new value from the previous number with a mathematical formula. With the srand() function you can set the start value.
Calling srand() with the same argument each time means that the program always returns the same numbers in the same order.
If you really want random numbers, in Perl you should remove the call to srand(), because Perl automatically seeds the generator with a better (random) value when you first call rand().
If your program really wants random numbers, the same approach should also be fine for PHP. But even in PHP I would check whether the generator is seeded automatically and with a sufficiently random value.
If your program does not want random numbers and instead wants a stream of numbers that is always the same, then the snippets are probably not identical. Even if you do the same initialization with srand(), PHP may use another formula to calculate the next "random" number.
So you need to look at your surrounding code to see whether it really wants random numbers. If yes, you can use this code, but even then you should look for a better initialization for srand().

How to compare two 64 bit numbers

In PHP I have a 64 bit number which represents tasks that must be completed. A second 64 bit number represents the tasks which have been completed:
$pack_code = 1001111100100000000000000011111101001111100100000000000000011111
$veri_code = 0000000000000000000000000001110000000000000000000000000000111110
I need to compare the two and provide a percentage of tasks completed figure. I could loop through both and find how many bits are set, but I don't know if this is the fastest way?
Assuming that these are actually strings, perhaps something like:
$pack_code = '1001111100100000000000000011111101001111100100000000000000011111';
$veri_code = '0000000000000000000000000001110000000000000000000000000000111110';
$matches = array_intersect_assoc(str_split($pack_code),str_split($veri_code));
$finished_matches = array_intersect($matches,array(1));
$percentage = (count($finished_matches) / 64) * 100;
Because you're getting the numbers as hex strings instead of ones and zeros, you'll need to do a bit of extra work.
PHP does not reliably support numbers over 32 bits as integers. 64-bit support requires being compiled and running on a 64-bit machine. This means that attempts to represent a 64-bit integer may fail depending on your environment. For this reason, it will be important to ensure that PHP only ever deals with these numbers as strings. This won't be hard, as hex strings coming out of the database will be, well, strings, not ints.
There are a few options here. The first would be using the GMP extension's gmp_xor function, which performs a bitwise-XOR operation on two numbers. The resulting number will have bits turned on when the two numbers have opposing bits in that location, and off when the two numbers have identical bits in that location. Then it's just a matter of counting the bits to get the remaining task count.
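The bit-counting idea can be sketched in Python, whose integers are arbitrary precision, so no extension is needed. The function names and the base parameter are assumptions for illustration. Note that the XOR count equals the number of remaining tasks only when every completed bit is also a required bit, so the percentage below uses AND instead.

```python
def differing_bits(a_digits, b_digits, base=2):
    """Count the bit positions where the two numbers differ (XOR, then popcount)."""
    return bin(int(a_digits, base) ^ int(b_digits, base)).count("1")


def percent_complete(required_digits, done_digits, base=2):
    """Percentage of required tasks (set bits) that are also marked done."""
    required = int(required_digits, base)
    done = int(done_digits, base)
    total = bin(required).count("1")
    if total == 0:
        return 100.0  # nothing was required
    finished = bin(required & done).count("1")
    return 100.0 * finished / total


pack_code = "1001111100100000000000000011111101001111100100000000000000011111"
veri_code = "0000000000000000000000000001110000000000000000000000000000111110"
print(percent_complete(pack_code, veri_code))
print(differing_bits(pack_code, veri_code))
```

Passing base=16 handles the hex strings directly, without converting them to ones and zeros first.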
Another option would be transforming the number-as-a-string into a string of ones and zeros, as you've represented in your question. If you have GMP, you can use gmp_init to read it as a base-16 number, and use gmp_strval to return it as a base-2 number.
If you don't have GMP, this function provided in another answer (scroll to "Step 2") can accurately transform a string-as-number into anything between base-2 and 36. It will be slower than using GMP.
In both of these cases, you'd end up with a string of ones and zeros and can use code like that posted by @Mark Baker to get the difference.
Optimization in this case is not worth considering. I'm 100% sure that you don't really care whether your script will be generated 0.00000014 sec. faster, am I right?
Just loop through each bit of that number, compare it with another and you're done.
Remember words of Donald Knuth:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
This code utilizes the GNU Multi Precision library, which is supported by PHP, and since it is implemented in C, should be fast enough, and supports arbitrary precision.
$pack_code = gmp_init("1001111100100000000000000011111101001111100100000000000000011111", 2);
$veri_code = gmp_init("0000000000000000000000000001110000000000000000000000000000111110", 2);
$number_of_different_bits = gmp_popcount(gmp_xor($pack_code, $veri_code));
$a = 11111;
echo sprintf('%032b',$a)."\n";
$b = 12345;
echo sprintf('%032b',$b)."\n";
$c = $a & $b;
echo sprintf('%032b',$c)."\n";
$n=0;
while($c)
{
$n += $c & 1;
$c = $c >> 1;
}
echo $n."\n";
Output:
00000000000000000010101101100111
00000000000000000011000000111001
00000000000000000010000000100001
3
Provided your PHP setup can handle 64-bit integers, this can easily be extended.
If not, you can sidestep this restriction using GNU Multiple Precision.
You could also split up the hex representation and operate on the corresponding parts instead, as you only need to know whether each bit is 1 or 0 and not which number is actually represented. I think that would solve your problem best.
For example:
0xF1A35C and 0xD546C1
you just compare the binary version of F and D, 1 and 5, A and 4, ...

How to determine if a list is subset of another list?

What is efficient way to determine if a list is a subset of another list?
Example:
is_subset(List(1,2,3,4),List(2,3)) //Returns true
is_subset(List(1,2,3,4),List(3,4,5)) //Returns false
I am mostly looking for efficient algorithm and not too concern how the list is stored. It can be stored in array, link list or other data structure.
Thanks
EDIT: The list is sorted
Here are a few trade offs you can make. Let's assume that you have two sets of elements, S and T, drawn from a universe U. We want to determine if S≥T. In one of the given examples, we have
S={1,2,3,4}
T={3,4,5}
U={1,2,3,4,5}
1. Sorted Lists (or balanced search tree)
The method suggested by most posters. If you already have sorted lists, or don't care about the length of time it takes to create them (say, you're not doing that often), then this algorithm is basically linear time and space. This is usually the best option.
(To be fair to other choices here, the time and space bounds should actually contain factors of "Log |U|" in appropriate places, but this is usually not relevant.)
Data structures: Sorted list for each of S and T. Or a balanced search tree (e.g. AVL tree, red-black tree, B+-tree) that can be iterated over in constant space.
Algorithm: For each element in T, in order, search S linearly for that element. Remember where you left off each search, and start the next search there. If every search succeeds, then S≥T.
Time complexity: about O( |S| Log|S| + |T| Log|T| ) to create the sorted lists, O( max(|S|, |T|) ) to compare.
Space complexity: about O( |S| + |T| )
Example (C++)
#include <set>
#include <algorithm>
std::set<int> create_S()
{
std::set<int> S;
// note: std::set will put these in order internally
S.insert(3);
S.insert(2);
S.insert(4);
S.insert(1);
return S;
}
std::set<int> create_T()
{
std::set<int> T;
// note std::set will put these in order internally
T.insert(4);
T.insert(3);
T.insert(5);
return T;
}
int main()
{
std::set<int> S=create_S();
std::set<int> T=create_T();
return std::includes(S.begin(),S.end(), T.begin(), T.end());
}
2. Hash tables
Better average time complexity than with a sorted list can be obtained using hash tables. The improved behavior for large sets comes at the cost of generally poorer performance for small sets.
As with sorted lists, I'm ignoring the complexity contributed by the size of the universe.
Data structure: Hash table for S, anything quickly iterable for T.
Algorithm: Insert each element of S into its hashtable. Then, for each element in T, check to see if it's in the hash table.
Time complexity: O( |S| + |T| ) to set up, O( |T| ) to compare.
Space complexity: O( |S| + |T| )
Example (C++)
#include <tr1/unordered_set>
std::tr1::unordered_set<int> create_S()
{
std::tr1::unordered_set<int> S;
S.insert(3);
S.insert(2);
S.insert(4);
S.insert(1);
return S;
}
std::tr1::unordered_set<int> create_T()
{
std::tr1::unordered_set<int> T;
T.insert(4);
T.insert(3);
T.insert(5);
return T;
}
bool includes(const std::tr1::unordered_set<int>& S,
const std::tr1::unordered_set<int>& T)
{
for (std::tr1::unordered_set<int>::const_iterator iter=T.begin();
iter!=T.end();
++iter)
{
if (S.find(*iter)==S.end())
{
return false;
}
}
return true;
}
int main()
{
std::tr1::unordered_set<int> S=create_S();
std::tr1::unordered_set<int> T=create_T();
return includes(S,T);
}
3. Bit sets
If your universe is particularly small (let's say you can only have elements 0-32), then a bitset is a reasonable solution. The running time (again, assuming you don't care about setup time) is essentially constant. In the case you do care about setup, it's still faster than creating a sorted list.
Unfortunately, bitsets become unwieldy very quickly for even a moderately sized universe.
Data structure: bit vector (usually a machine integer) for each of S and T. We might encode S=11110 and T=00111, in the given example.
Algorithm: Calculate the intersection, by computing the bitwise 'and' of each bit in S with the corresponding bit in T. If the result equals T, then S≥T.
Time complexity: O( |U| + |S| + |T| ) to setup, O( |U| ) to compare.
Space complexity: O( |U| )
Example: (C++)
#include <bitset>
// bitset universe always starts at 0, so create size 6 bitsets for demonstration.
// U={0,1,2,3,4,5}
std::bitset<6> create_S()
{
std::bitset<6> S;
// Note: bitsets don't care about order
S.set(3);
S.set(2);
S.set(4);
S.set(1);
return S;
}
std::bitset<6> create_T()
{
std::bitset<6> T;
// Note: bitsets don't care about order
T.set(4);
T.set(3);
T.set(5);
return T;
}
int main()
{
std::bitset<6> S=create_S();
std::bitset<6> T=create_T();
return (S & T) == T; // parentheses needed: == binds tighter than &
}
4. Bloom filters
All the speed benefits of bitsets, without the pesky limitation on universe size the bitsets have. Only one down side: they sometimes (often, if you're not careful) give the wrong answer: If the algorithm says "no", then you definitely don't have inclusion. If the algorithm says "yes", you might or might not. Better accuracy is attained by choosing a large filter size, and good hash functions.
Given that they can and will give wrong answers, Bloom filters might sound like a horrible idea. However, they have definite uses. Generally one would use Bloom filters to do many inclusion checks quickly, and then use a slower deterministic method to guarantee correctness when needed. The linked Wikipedia article mentions some applications using Bloom filters.
Data structure: A Bloom filter is a fancy bitset. Must choose a filter size, and hash functions beforehand.
Algorithm (sketch): Initialize the bitset to 0. To add an element to a bloom filter, hash it with each hash function, and set the corresponding bit in the bitset. Determining inclusion works just as for bitsets.
Time complexity: O( filter size )
Space complexity: O( filter size )
Probability of correctness: Always correct if it answers "S does not include T". Something like 0.6185^(|S|x|T|/(filter size)) if it answers "S includes T". In particular, the filter size must be chosen proportional to the product of |S| and |T| to give reasonable probability of accuracy.
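A toy version of the sketch above, in Python, may make the one-sided error clearer. The filter size, the number of hash functions, and the use of salted SHA-256 as the hash family are all arbitrary assumptions picked for the demo, not recommendations.

```python
import hashlib

FILTER_BITS = 256  # assumed filter size
NUM_HASHES = 3     # assumed number of hash functions


def bit_positions(item):
    """Derive NUM_HASHES bit positions for `item` from salted SHA-256 digests."""
    positions = []
    for salt in range(NUM_HASHES):
        digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % FILTER_BITS)
    return positions


def make_filter(items):
    """Build the filter as an integer bitset with one bit set per hash position."""
    bits = 0
    for item in items:
        for position in bit_positions(item):
            bits |= 1 << position
    return bits


def may_include(filter_s, filter_t):
    """False means S definitely does not include T; True means it might."""
    return (filter_s & filter_t) == filter_t


S = make_filter([1, 2, 3, 4])
print(may_include(S, make_filter([2, 3])))
print(may_include(S, make_filter([3, 4, 5])))
```

When T really is contained in S, every bit of T's filter is also set in S's filter, so the check can never wrongly answer "no". A "yes" still has to be confirmed with a deterministic method if correctness matters.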
For C++, the best way is to use std::includes algorithm:
#include <algorithm>
std::list<int> l1, l2;
...
// Test whether l2 is a subset of l1
bool is_subset = std::includes(l1.begin(), l1.end(), l2.begin(), l2.end());
This requires both lists to be sorted, as specified in your question. Complexity is linear.
Just wanted to mention that Python has a method for this:
return set(list2).issubset(list1)
Or:
return set(list2) <= set(list1)
If both lists are ordered, one simple solution would be to go over both lists simultaneously (with a pointer into each list), and verify that all of the elements in the second list appear in the first list (until all elements are found, or until you reach a larger number in the first list).
A pseudo-code in C++ would look something like this:
List l1, l2;
iterator i1 = l1.start();
iterator i2 = l2.start();
while(i1 != l1.end() && i2 != l2.end()) {
if (*i1 == *i2) {
i1++;
i2++;
} else if (*i1 > *i2) {
return false;
} else {
i1++;
}
}
return i2 == l2.end(); // true only if every element of l2 was matched
(It obviously won't work as is, but the idea should be clear).
If the lists are not ordered, you can use a hashtable - insert all of your elements in the first list, and then check if all of the elements in the second list appear in the hashtable.
These are algorithmic answers. In different languages, there are default built-in methods to check this.
If you're concerned about ordering or continuity, you may need to use the Boyer-Moore or
the Horspool algorithm.
The question is, do you want to consider [2, 1] to be a subset of [1, 2, 3]? Do you want [1, 3] to be considered a subset of [1, 2, 3]? If the answer is no to both of these, you might consider one of the algorithms linked above. Otherwise, you may want to consider a hash set.
Scala, assuming you mean subsequence by subset:
def is_subset[A,B](l1: List[A], l2: List[B]): Boolean =
(l1 indexOfSeq l2) >= 0
Anyway, a subsequence is just a substring problem. Optimal algorithms include Knuth-Morris-Pratt and Boyer-Moore, and a few more complex ones.
If you truly meant subset, though, and thus you are speaking of Sets and not Lists, you can just use the subsetOf method in Scala. Algorithms will depend on how the set is stored. The following algorithm works for a list storage, which is a very suboptimal one.
def is_subset[A,B](l1: List[A], l2: List[B]): Boolean = (l1, l2) match {
case (_, Nil) => true
case (Nil, _) => false
case (h1 :: t1, h2 :: t2) if h1 == h2 => is_subset(t1, t2)
case (_ :: tail, list) => is_subset(tail, list)
}
For indexOfSeq in scala trunk I implemented KMP, which you can examine: SequenceTemplate
If you're ok with storing the data in a hashset you can simply check whether list1 contains x for each x in list2. Which will be close to O(n) in the size of list2. (Of course you can also do the same with other datastructures, but that will lead to different runtimes).
This depends highly on the language/toolkit, as well as the size and storage of the lists.
If the lists are sorted, a single loop can determine this. You can just start walking the larger list trying to find the first element of the smaller list (break if you pass it in value), then move on to the next, and continue from the current location. This is fast, since it's a one loop/one pass algorithm.
For unsorted lists, it's often fastest to build some form of hash table from the first list's elements, then search each element in the second list off the hash. This is the approach that many of the .NET LINQ extensions use internally for item searching within a list, and scale quite well (although they have fairly large temporary memory requirements).
func isSubset ( @list, @possibleSubsetList ) {
if ( size ( @possibleSubsetList ) > size ( @list ) ) {
return false;
}
for ( @list : $a ) {
if ( size ( @possibleSubsetList ) > 0 && $a == @possibleSubsetList[0] ) {
shift ( @possibleSubsetList ); // drop the matched first element
}
}
return size ( @possibleSubsetList ) == 0;
}
O(n), voilà. Of course, isSubset( (1,2,3,4,5), (2,4) ) will return true.
You should have a look at the implementation of STL method search. That is the C++ way I think this would be done.
http://www.sgi.com/tech/stl/search.html
Description:
Search finds a subsequence within the range [first1, last1) that is identical to [first2, last2) when compared element-by-element.
You can see the problem to check if a list is a subset of another list as the same problem to verify if a substring belongs to a string. The best known algorithm for this is the KMP (Knuth-Morris-Pratt). Look at wikipedia for a pseudo-code or just use some String.contains method available in the language of your preference. =)
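If "subset" here means a contiguous run, a compact Python sketch of KMP over lists looks like this. The function names are made up for illustration; the technique is the standard failure-table search.

```python
def failure_table(pattern):
    """table[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def contains_sublist(haystack, needle):
    """Return True if `needle` occurs as a contiguous run inside `haystack`."""
    if not needle:
        return True
    table = failure_table(needle)
    k = 0  # number of needle items matched so far
    for item in haystack:
        while k > 0 and item != needle[k]:
            k = table[k - 1]  # fall back without re-reading the haystack
        if item == needle[k]:
            k += 1
        if k == len(needle):
            return True
    return False


print(contains_sublist([1, 2, 3, 4], [2, 3]))
print(contains_sublist([1, 2, 3, 4], [2, 4]))
```

Each haystack element is read once, so the search is linear in the combined length of the two lists. Note that [2, 4] is not found in [1, 2, 3, 4], because this checks contiguity rather than set membership.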
The efficient algorithm uses some kind of state machine, where you keep the accepting states in memory (in python):
def is_subset(l1, l2):
    matches = []
    for e in l1:
        # increment
        to_check = [0] + [i+1 for i in matches]
        matches = [] # nothing matches
        for i in to_check:
            if l2[i] == e:
                if i == len(l2)-1:
                    return True
                matches.append(i)
    return False
EDIT: of course, if the lists are sorted, you don't need that algorithm, just do:
def is_subset(l1, l2):
    index = 0
    for e in l1:
        if index == len(l2):
            return True
        if e > l2[index]:
            return False
        elif e == l2[index]:
            index += 1
        # otherwise e < l2[index]: keep scanning l1
    return index == len(l2)
