I've been working with some strings in PHP to make my own framework...
There's something that "bothers" me.
$var = "hello!";
$arr = array("h","e","l","l","o","!");
Can someone tell me which one ($var or $arr) uses more memory than the other? And why?
At first sight I would say the array uses more memory, since it has to store each character as a separate element, but I'm not sure.
The array will use more memory than the string
A string and an array are each a zval structure in their own right, but every element in the array is a string as well, each with its own zval; arrays take a surprising amount of memory. There is also the fact that an array element comprises both a key and a value, each of which uses memory.
Take a read of this article to see just how much memory is used by an array structure.
The array takes (a lot!) more memory.
A string in PHP is an object in memory that holds (for example) the length and a pointer to the actual data. I think that on most platforms this gives you 32 bits for the length and 64 bits for the pointer. With 16-byte alignment requirements on some CPUs, this means that each string will take at least 32 bytes (descriptor + actual data), even if it holds only a single character.
The array from your example contains 6 strings. That will be 192 bytes plus the overhead of storing an array, which is not insignificant either (count on at least 128 more bytes).
Disclaimer: the numbers used in this answer are a rough approximation - expect far more overhead than mentioned here.
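If you want to see this on your own build, here is a rough sketch using memory_get_usage(); the exact numbers will differ by PHP version and platform, and the variable names are just for illustration:
// Build the values at runtime so compile-time interning doesn't hide the string's cost.
$before = memory_get_usage();
$var = str_repeat('x', 6);            // a 6-character string
$stringCost = memory_get_usage() - $before;

$before = memory_get_usage();
$arr = str_split(str_repeat('x', 6)); // an array of 6 single-character strings
$arrayCost = memory_get_usage() - $before;

printf("string: %d bytes, array: %d bytes\n", $stringCost, $arrayCost);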
Related
I am trying to solve a CTF challenge in which type juggling should be used. The code is:
if ($_GET["hash"] == hash("ripemd160", $_GET["hash"]))
{
echo $flag;
}
else
{
echo "<h1>Bad Hash</h1>";
}
I made a script in Python which checks random RIPEMD-160 hashes, looking for one that begins with "0e" and is followed only by digits. The code is:
import hashlib
import random
import string

def id_generator(size, chars=string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

param = "0e"
results = []
while True:
    h = hashlib.new('ripemd160')
    h.update("{0}".format(str(param)).encode('utf-8'))
    hashed = h.hexdigest()
    if param not in results:
        print(param)
        if hashed.startswith("0e") and hashed[2:].isdigit():
            print(param)
            print(hashed)
            break
        results.append(param)
    else:
        print("CHECKED")
    param = "0e" + str(id_generator(size=10))
Any suggestions on how to solve it? Thank you!
There seems to be a bit of misunderstanding in the comments, so I'll start by explaining the problem a little more:
Type juggling refers to the behaviour of PHP whereby variables are implicitly cast to different data types under certain conditions. For example, all the following logical expressions will evaluate to true in PHP:
0 == 0 // int vs. int
"0" == 0 // str -> int
"abc" == 0 // any non-numerical string -> 0
"1.234E+03" == "0.1234E+04" // string that looks like a float -> float
"0e215962017" == 0 // another string that looks like a float
The last of these examples is interesting because its MD5 hash value is another string consisting of 0e followed by a bunch of decimal digits (0e291242476940776845150308577824). So here's another logical expression in PHP that will evaluate to true:
"0e215962017" == md5("0e215962017")
To solve this CTF challenge, you have to find a string that is "equal" to its own hash value, but using the RIPEMD160 algorithm instead of MD5. When this is provided as a query string variable (e.g., ?hash=0e215962017), then the PHP script will disclose the value of a flag.
Fake hash collisions like this aren't difficult to find. Roughly 1 in every 256 MD5 hashes will start with '0e', and the probability that the remaining 30 characters are all digits is (10/16)^30. If you do the maths, you'll find that the probability of an MD5 hash equating to zero in PHP is approximately one in 340 million. It took me about a minute (almost 216 million attempts) to find the above example.
Exactly the same method can be used to find similar values that work with RIPEMD160. You just need to test more hashes, since the extra hash digits mean that the probability of a "collision" will be approximately one in 14.6 billion. Quite a lot, but still tractable (in fact, I found a solution to this challenge in about 15 minutes, but I'm not posting it here).
Your code, on the other hand, will take much, much longer to find a solution. First of all, there is absolutely no point in generating random inputs. Sequential values will work just as well, and will be much faster to generate.
If you use sequential input values, then you also won't need to worry about repeating the same hash calculations. Your code uses a list structure to store previously hashed values. This is a terrible idea. Searching for an item in a list is an O(n) operation, so once your code has (unsuccessfully) tested a billion inputs, it will have to compare every new input against each of these billion inputs at each iteration, causing your code to grind to a complete standstill. Your code would actually run a lot faster if you didn't bother checking for duplicates. When you have time, I suggest you learn when to use lists, dicts and sets in Python.
Another problem is that your code only tests 10-digit numbers, which means it can only test a maximum of 10 billion possible inputs. Based on the numbers given above, are you sure this is a sensible limit?
Finally, your code is printing every single input string before you calculate its hash. Before your program outputs a solution, you can expect it to print out somewhere in the order of a billion screenfuls of incorrect guesses. Is there any point in doing this? No.
Here's the code I used to find the MD5 collision I mentioned earlier. You can easily adapt it to work with RIPEMD160, and you can convert it to Python if you like (although the PHP code is much simpler):
$n = 0;
while (1) {
    $s = "0e$n";          // candidate: "0e" followed by decimal digits
    $h = md5($s);
    if ($s == $h) break;  // loose == comparison, exactly as in the challenge
    $n++;
}
echo "$s : $h\n";
Note: Use PHP's hash_equals() function and strict comparison operators to avoid this sort of vulnerability in your own code.
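For illustration, a small sketch of that advice, reusing the MD5 example from above:
// Strict comparison (===) compares byte-for-byte and never type-juggles:
var_dump("0e215962017" === md5("0e215962017")); // bool(false)

// hash_equals() also compares as strings, and in constant time:
var_dump(hash_equals(md5("0e215962017"), "0e215962017")); // bool(false)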
I have a for loop that iterates through a large array $result. Every array item $row is used to generate a new line of string $line, which is concatenated onto $str. To preserve memory, I use unset($result[$i]) to clear every array item that has been processed in the original array. It looks like this:
$resultcount = count($result);
$str = ''; // initialize to avoid an undefined-variable notice
for ($i = 0; $i < $resultcount; ++$i) {
    $row = $result[$i];
    $line = do_something($row);
    $str .= '<tr><td>'.$line.'</td></tr>';
    unset($result[$i]);
}
return $str;
This (more or less exotic) piece of code works unless the string exceeds a length of approx. 1'000'000 characters. In this case the return value is just empty. This is highly irritating because:
The calculation effort (in this example do_something()) is not the problem. Using echo count($result).' - '.strlen($str)."\n" I can see that the loop finishes properly. There is no web server or PHP error shown.
The memory_limit is also not a problem; I am working with more data in other parts of the application.
It appears that the problem lies in return $str itself. If I use return substr($str, 0, 980000) then it works fine. Further debugging shows that the string gets tainted as soon as it reaches a length of 999'775 bytes.
I can't assign the longer returned string to another string variable. But I am able to call strlen() on it and get a proper result (1310307). So the returned string has a total length of 1'310'307 bytes, but I can't use it properly as a string.
Where does this limitation come from and how may I circumvent it?
You seem to describe the return value in a contradictory way:
In this case the return value is just empty.
but
But I am able to do a strlen() to get a proper result (1310307). So the return value string has a total length of 1'310'307 bytes.
Something can't be empty and have a length at the same time.
I have tested returning a string that long and it works fine for me. Returning larger and larger strings works up to the point where the memory limit is exceeded, which triggers an explicit Fatal Error.
On a technical side strings can be much larger than million characters:
Note: As of PHP 7.0.0, there are no particular restrictions regarding the length of a string on 64-bit builds. On 32-bit builds and in earlier versions, a string can be as large as up to 2GB (2147483647 bytes maximum)
http://php.net/manual/en/language.types.string.php
I suspect that the failing operation has to do with how you use the result rather than return from the function.
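As a minimal sanity check (my own sketch, not the asker's code), returning a string of exactly that length works fine:
function makeLongString($len) {
    return str_repeat('x', $len);
}
var_dump(strlen(makeLongString(1310307))); // int(1310307), returned intact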
Well, it took me two and a half years to figure this out ;(
Rarst was right: it was not a problem of the function building and returning the string. Later in the process I was using preg_replace() to do some further replacements, and that function cannot handle very long strings with its default settings. This can be fixed by raising the PCRE backtrack limit beforehand with ini_set('pcre.backtrack_limit', 99999999999), as mentioned in How to do preg_replace on a long string
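A sketch of that fix, with the failure check added ($pattern, $replacement and $longString are placeholders):
ini_set('pcre.backtrack_limit', '99999999999'); // raise the PCRE backtrack limit

$out = preg_replace($pattern, $replacement, $longString);
if ($out === null) {
    // preg_replace() fails silently by returning null;
    // preg_last_error() reveals e.g. PREG_BACKTRACK_LIMIT_ERROR
    trigger_error('preg_replace failed, error code ' . preg_last_error());
}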
Assume that I have a file named data.txt with the contents "Blah Blah !".
So when I use the code below
$hnd=fopen('data.txt','r');
echo fgets($hnd,2);
it displays just one character, "B", instead of "Bl". Later I read the manual, which states:
length
Reading ends when length - 1 bytes have been read, or a newline (which is included in the return value), or an EOF (whichever comes first). If no length is specified, it will keep reading from the stream until it reaches the end of the line.
Can anyone explain to me why it is this way? I mean, why is it length - 1 and not length?
The C fgets() function reads at most length - 1 bytes because it has to append a terminating zero to turn the data into a proper C string.
My best guess is that PHP's fgets() exhibits the same behaviour because it is either:
a legacy from the bad old days when PHP functions were little more than wrappers around the corresponding C functions, and string functions were binary-unsafe (e.g. strings could not contain embedded NUL characters). Changing the behaviour of the fgets() function would introduce new bugs into existing programs. Or,
a deliberate decision to make the PHP function compatible with the C function to avoid unnecessary surprises.
or both.
Interestingly, it looks like PHP internally adds a terminating zero when storing string values, for example in _php_stream_get_line() (called from fgets()) and zend_string_init().
Since _zend_string objects store the string length anyway, it shouldn't be necessary to store the terminating zero, unless there are still binary unsafe functions in PHP.
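To see the off-by-one in action (assuming data.txt contains "Blah Blah !" as in the question):
$hnd = fopen('data.txt', 'r');
echo fgets($hnd, 2); // "B"  (length 2 reads at most 1 byte)
rewind($hnd);
echo fgets($hnd, 3); // "Bl" (length 3 reads at most 2 bytes)
fclose($hnd);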
Because PHP, like many C derivatives, counts from 0, not from 1. They use zero-based numbering.
E.g. for arrays: an array of length/size n has elements 0 to n - 1,
i.e. 0, 1, 2, 3, 4, ..., n - 1.
So an array of length 5 has elements 0, 1, 2, 3, 4.
So you will find that whether reading bytes, strings or arrays, they always reference up to the (n-1)th element or marker for an n-sized structure.
Please try the following code for your question:
$hnd = fopen('E:\\data.txt', 'r');
echo fgets($hnd, 3); // length 3 reads the two characters "Bl"
I'm looking for an efficient way to store a huge number of booleans (up to 2.5 × 10^11) in PHP's memory. My first idea was to create an array of integers and store one boolean per bit in each integer:
// number of booleans to store
$n = 2.5 * pow(10, 11);
// bits per integer
$bitsPerInt = PHP_INT_SIZE * 8;
// init storage
$storage = array();
for ($i = 0; $i < ceil($n / $bitsPerInt); $i++) {
    $storage[$i] = 0;
}
// bits in each integer can be accessed using PHP's bitwise operators
However, the overhead of this solution is still way too big: storing 10^8 booleans (bits) in a 32-bit environment (PHP_INT_SIZE = 4 bytes) needs an array of 3,125,000 integers, consuming ~254 MB of memory, whereas the raw data of 10^8 booleans would only need ~12 MB.
So which is the best way to store a huge number of booleans in PHP (5)?
If you really must use an array of that many booleans you can use a string as a ByteArray and pack 8 booleans for each char in the string. This has very little memory overhead compared to the native PHP array but it is more difficult to use.
You can convert bytes to chars and back with the ord and chr functions.
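A minimal sketch of the packing idea (the helper names are mine, not a standard API):
$n = 100000000;                                // 10^8 booleans
$bits = str_repeat(chr(0), (int)ceil($n / 8)); // ~12 MB, one bit per boolean

function setBit(&$bits, $i, $value) {
    $byte = ord($bits[$i >> 3]);               // byte that holds bit $i
    if ($value) {
        $byte |= 1 << ($i & 7);                // set bit ($i % 8)
    } else {
        $byte &= ~(1 << ($i & 7));             // clear bit ($i % 8)
    }
    $bits[$i >> 3] = chr($byte);
}

function getBit($bits, $i) {
    return (ord($bits[$i >> 3]) >> ($i & 7)) & 1;
}

setBit($bits, 12345, 1);
echo getBit($bits, 12345); // 1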
Perhaps the SplStack or the SplFixedArray classes from the SPL fit your needs better.
If using one bit per value uses too much memory then you will need to rethink your design - everything in memory is just bits at the end of the day, and you can't squeeze more than one boolean value into a single bit (by definition).
I'm working with a large array which is a height map, 1024x1024, and of course I'm stuck with the memory limit. On my test machine I can increase the memory limit to 1 GB if I want, but on my tiny VPS with only 256 MB of RAM it's not an option.
I've been searching Stack Overflow and Google and found several answers along the lines of "well, you're not using PHP for memory efficiency, ditch it and rewrite it in C++", and honestly, that's fine and I recognize PHP loves memory.
But when digging deeper into PHP memory management, I did not find how much memory each data type consumes, or whether casting to another data type reduces memory consumption.
The only "optimization" technique I found was to unset variables and arrays; that's it.
Would converting the code to C++ using one of the PHP-to-C++ converters solve the problem?
Thanks!
If you want a real indexed array, use SplFixedArray. It uses less memory. Also, PHP 5.3 has a much better garbage collector.
Other than that, well, PHP will use more memory than a more carefully written C/C++ equivalent.
Memory Usage for 1024x1024 integer array:
Standard array: 218,756,848
SplFixedArray: 92,914,208
as measured by memory_get_peak_usage()
$array = new SplFixedArray(1024 * 1024); // array();
for ($i = 0; $i < 1024 * 1024; ++$i)
    $array[$i] = 0;
echo memory_get_peak_usage();
Note that the same array in C using 64-bit integers would be 8M.
As others have suggested, you could pack the data into a string. This is slower but much more memory efficient. If using 8 bit values it's super easy:
$x = str_repeat(chr(0), 1024*1024);
$x[$i] = chr($v & 0xff); // store value $v into $x[$i]
$v = ord($x[$i]); // get value $v from $x[$i]
Here the memory will only be about 1.5MB (that is, when considering the entire overhead of PHP with just this integer string array).
For the fun of it, I created a simple benchmark of creating 1024x1024 8-bit integers and then looping through them once. The packed versions all used ArrayAccess so that the user code looked the same.
                  mem     write   read
array             218M    0.589s  0.176s
packed array      32.7M   1.85s   1.13s
packed spl array  13.8M   1.91s   1.18s
packed string     1.72M   1.11s   1.08s
The packed arrays used native 64-bit integers (only packing 7 bytes to avoid dealing with signed data) and the packed string used ord and chr. Obviously implementation details and computer specs will affect things a bit, but I would expect you to get similar results.
So while the array was 6x faster it also used 125x the memory as the next best alternative: packed strings. Obviously the speed is irrelevant if you are running out of memory. (When I used packed strings directly without an ArrayAccess class they were only 3x slower than native arrays.)
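For reference, a minimal sketch of such a wrapper (my own code, not the benchmark's; on PHP 8.1+ you would also add return types to the ArrayAccess methods):
class PackedByteArray implements ArrayAccess
{
    private $data;

    public function __construct($size)
    {
        $this->data = str_repeat(chr(0), $size); // one byte per 8-bit value
    }

    public function offsetExists($i)  { return $i >= 0 && $i < strlen($this->data); }
    public function offsetGet($i)     { return ord($this->data[$i]); }
    public function offsetSet($i, $v) { $this->data[$i] = chr($v & 0xff); }
    public function offsetUnset($i)   { $this->data[$i] = chr(0); }
}

$a = new PackedByteArray(1024 * 1024);
$a[42] = 200;
echo $a[42]; // 200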
In short, I would use something other than pure PHP to process this data if speed is of any concern.
In addition to the accepted answer and suggestions in the comments, I'd like to suggest PHP Judy array implementation.
Quick tests showed interesting results. An array with 1 million entries using the regular PHP array data structure takes ~200 MB. SplFixedArray uses around 90 MB. Judy uses 8 MB. The tradeoff is in performance: Judy takes about double the time of the regular PHP array implementation.
A little bit late to the party, but if you have a multidimensional array you can save a lot of RAM by storing the complete array as JSON.
$array = [];
$data = [];
$data["a"] = "hello";
$data["b"] = "world";
To store this array just use:
$array[] = json_encode($data);
instead of
$array[] = $data;
If you want to get the array back, just use something like:
$myData = json_decode($array[0], true);
I had a big array with 275,000 sets and saved about 36% RAM consumption.
EDIT:
I found an even better way: compress the JSON string:
$array[] = gzencode(json_encode($data));
and unzip it when you need it:
$myData = json_decode(gzdecode($array[0]), true);
This saved me nearly 75% of RAM peak usage.
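Putting both steps together, a small round-trip sketch (the helper names are mine):
function packRow(array $row) {
    return gzencode(json_encode($row));        // serialize, then compress
}

function unpackRow($blob) {
    return json_decode(gzdecode($blob), true); // decompress, then deserialize
}

$packed = packRow(array("a" => "hello", "b" => "world"));
var_dump(unpackRow($packed)); // back to the original array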