PHP big array manipulation - php

I have a very big array stored in memory after reading a whole file into an array (as hex) like this:
$bin_content = fread(fopen($filename,"r"),filesize($filename));
$hex_decode = explode(" ",chunk_split(bin2hex($bin_content),2," "));
unset($bin_content);
function myfunction($i) {
global $hex_decode;
// stuff here
}
When I create a function that uses this $hex_decode array as a global, the script runs very slowly (seemingly forever), but if I call the function passing $hex_decode as a parameter (myfunction($i, $hex_decode) instead of myfunction($i), where $i is an index into the array), things are much faster.
Can anyone explain why? And is there any way to speed up the script by reading the file in a different way?
I need to have the whole file in that array rather than reading it line by line, because I'm building a custom ASN.1 decoder and I need all of it.

And is there any way to speed up the script by reading the file in a different way?
Personally, I'd use a stream filter to chunk-read and convert the file as it is read, rather than reading the entire file in one go and then converting it with the whole thing in memory; any filtering and fixing of the ASN.1 structure can be handled within the stream filter.
I know this isn't a direct response to the actual question, but rather to the quoted sentence above; still, it could provide a less memory-hungry alternative.
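As a rough illustration, here is a minimal sketch (untested; the filter name and chunk size are my own) of a user-defined stream filter that hex-encodes each chunk as it is read, so the whole file never has to sit in memory at once:

// Hypothetical example: a stream filter that converts each chunk of binary
// data to hex as it passes through, instead of bin2hex()-ing the whole file.
// bin2hex() works byte by byte, so chunk boundaries don't matter here.
class HexifyFilter extends php_user_filter
{
    public function filter($in, $out, &$consumed, $closing): int
    {
        while ($bucket = stream_bucket_make_writeable($in)) {
            $bucket->data = bin2hex($bucket->data);   // hex-encode this chunk only
            $consumed += $bucket->datalen;
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }
}

stream_filter_register('hexify', 'HexifyFilter');

$fp = fopen($filename, 'rb');
stream_filter_append($fp, 'hexify', STREAM_FILTER_READ);

while (!feof($fp)) {
    $hexChunk = fread($fp, 8192);   // already hex-encoded by the filter
    // feed $hexChunk into the ASN.1 decoder here
}
fclose($fp);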

There was already a similar question on Stack Overflow; please take a look at:
The advantage / disadvantage between global variables and function parameters in PHP?

If your files can be huge, you could consider a memory-conservative approach. ASN.1 is usually encoded in type-length-value structures, which means you don't have to hold the whole thing in memory, just the data you need to process at any given time.
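For illustration, here is a rough sketch (a hypothetical helper; it only handles BER definite lengths and single-byte tags, no indefinite-length or high-tag-number forms) of reading one type-length-value triplet at a time from an open stream:

function read_tlv($fp) {
    $typeByte = fread($fp, 1);
    if ($typeByte === false || $typeByte === '') {
        return null;                              // end of stream
    }
    $type = ord($typeByte);

    $length = ord(fread($fp, 1));
    if ($length & 0x80) {                         // long form: low 7 bits = number of length octets
        $numOctets = $length & 0x7F;
        $length = 0;
        for ($i = 0; $i < $numOctets; $i++) {
            $length = ($length << 8) | ord(fread($fp, 1));
        }
    }

    $value = $length > 0 ? fread($fp, $length) : '';
    return array('type' => $type, 'length' => $length, 'value' => $value);
}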

Related

Call a method twice or create a temp variable?

I got to wondering about the efficiency of this:
I have a CSV file with about 200 rows in it; I use a class to filter/break up the CSV and get the bits I want. It is cached daily.
I found that many descriptions (which can be up to ~500 chars each) have a hanging word "Apply" that needs chopping off.
Thinking that calling toString() on my object more than once would be bad practice, I created a temp var, $UJM_desc (this code is inside a loop):
// mad hanging 'Apply' in `description` very often, cut it off
$UJM_desc = $description->toString();
$hanging = substr($UJM_desc, -5);
if ($hanging == "Apply") {
    $UJM_desc = substr($UJM_desc, 0, -5);
}
$html .= '<p>' . $UJM_desc;
But I could have just called $description->toString() a couple of times instead. I am aware there is room to simplify this, maybe with a ternary, but still, I froze for a moment and thought I'd ask.
Call a method twice or use a temp var? Which is best?
I'd just use regex to strip off the end:
$html .= '<p>' . preg_replace('/Apply$/', '', $description->toString());
That said, if $description->toString() gives the same output no matter where you use it, there's absolutely no reason to call it multiple times, and a temporary variable will be the most efficient.
There's also no reason to save $hanging to a variable, as you only use it once.
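Putting those two points together, a minimal sketch of the single-call version (reusing the question's variable names):

$UJM_desc = $description->toString();            // call it once
if (substr($UJM_desc, -5) === "Apply") {         // no separate $hanging variable needed
    $UJM_desc = substr($UJM_desc, 0, -5);
}
$html .= '<p>' . $UJM_desc;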
In general, it depends, and it's a tradeoff.
Keeping a calculated value in a variable takes up memory, and runs the risk of containing stale data.
Calculating the value anew might be slow, or expensive in some other way.
So it's a matter of deciding which resource is most important to you.
In this case, however, the temporary variable is so short-lived, it's definitely worth using.

PHP 101: variable vs function

I'm creating a global file to hold items that will be re-used throughout my website. What are the differences between these two lines of code? Is one "better" than the other?
This:
$logo = "img/mainlogo.jpg";
vs this:
function logo() {
echo "img/mainlogo.jpg";
}
You should write clear, readable code and keep the HTML and PHP separate. The performance gain is not significant...
<?php
...
$logo = "img/mainlogo.jpg";
...
?>
...
<img src="<?= $logo ?>" alt="logo">
...
Of the two options you posted, the function is the better choice. But, to be brutally honest, this sort of thing is exactly what constants are for:
defined('MAIN_LOGO') || define('MAIN_LOGO','img/mainlogo.jpg');
Suppose you're working on a site that has to support multiple languages; then you can simply use the same trick:
defined('CLIENT_LOCALE') || define('CLIENT_LOCALE',$whereverYouGetThisFrom);
defined('MAIN_LOGO') || define('MAIN_LOGO','img/mainlogo_'.CLIENT_LOCALE.'.jpg');
//if language is EN, mainlogo_EN.jpg will be used, if lang is ES, mainlogo_ES.jpg, etc...
Besides, a constant, once defined, cannot be redefined (the clue is in the name, of course). Also, since PHP still has a lot of C machinery under the bonnet, and you've tagged this question performance, it might interest you that constants are much like C macros, which are a lot faster than regular function calls, or even C++ inline functions (even if they were indeed compiled as inline functions).
Anyway, if you have a ton of these things you want to centralize, consider creating a couple of INI files for your project and parsing them into some sort of global configuration object.
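For example, a minimal sketch (the file name and keys are hypothetical) of parsing an INI file into a global configuration object:

// site.ini (hypothetical contents):
//   [assets]
//   main_logo = "img/mainlogo.jpg"

$config = (object) parse_ini_file('site.ini', true);  // true = keep sections as nested arrays
echo $config->assets['main_logo'];                     // prints img/mainlogo.jpg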
Functions are good.
I see function logo() as better than $logo: an echo doesn't take much memory, but $logo stays allocated. Even though function logo() uses some memory too, that memory is handled by PHP's own garbage collector. You can also use these functions to check that you are not misusing the allocated memory:
memory_get_peak_usage();
memory_get_usage();
Explanation:
When a function that is in use ends, PHP clears the memory it was using, at least more efficiently than if no function were used. If you are running recursive code or something similarly memory-intensive, try putting the code into a function or method; when the function/method returns, the memory it used will be garbage-collected much more efficiently than by unsetting variables within the loop itself.
Source: 7 tips to prevent PHP running out of memory
The main purpose of a function is to avoid code repetition and perform a specific task. Based on that definition, using a function to only return a value is a bad design.
In that context I think good readability in the code is worth more than saving several bytes of memory. We are in 2012; optimization is good, but this type of micro-optimization is simply ridiculous. I prefer assigning a variable: it's clear and does what you expect.
$logo = "img/mainlogo.jpg"; can naturally be redefined later without changing existing code, simply by doing $logo = "img/newmainlogo.jpg"; whereas the function itself would have to be modified in its original definition.

eval alternative to processing rules in PHP?

I know finding alternatives to eval() is a common topic, but I'm trying to do something I've not done before. I am processing known-format CSV files, and I have built rules for how the data will be handled depending on the source columns available.
Here is a simplified example of what I'm doing. In this example, $linedata represents a single row of data from the CSV file, and $key["type"] points to the column I need the data from. If this column holds the value "IN", I want $newcol set to "individual", otherwise to "organization".
$key["type"] = 12;
$linedata[12] = 'IN';
$rule = '($linedata[($key["type"])] == "IN" ? "individual" : "organization");';
eval ('$newcol = ' . $rule);
So $rule stores the logic. I can run a filter on the $linedata array to try to protect against malicious code coming from the CSV files, but I wonder if there is a better way to store and process rules like this?
You cannot store arbitrary PHP in a CSV file and then expect it to work without calling eval (or similar functionality).
The safe way to do what you're asking for is to treat the file as data, not code.
This is why languages like BBCode exist: you can't have an inert language trigger active features directly, so you create an easy-to-interpret mini-scripting-language that lets you achieve what you want.
In other words, you cannot store active "rules" in the file without interpreting them somehow, and you cannot simultaneously allow them to contain arbitrary PHP and be "safe". So you can either attempt to parse and restrict PHP (don't, it's tough!) or you can give them a nice easy little language, and interpret that. Or, better yet, don't store logic in data files.
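For example, a minimal sketch (the array structure and helper name are hypothetical) of storing the rule as plain data and interpreting it with a small function instead of eval():

$rule = array(
    'column'   => $key['type'],   // which column to inspect
    'equals'   => 'IN',
    'if_true'  => 'individual',
    'if_false' => 'organization',
);

function apply_rule(array $rule, array $linedata) {
    return $linedata[$rule['column']] === $rule['equals']
        ? $rule['if_true']
        : $rule['if_false'];
}

$newcol = apply_rule($rule, $linedata);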
I thought create_function might be good enough:
http://www.php.net/manual/en/function.create-function.php
but I was wrong; it has essentially the same security issues as eval().

Assigning contents of file to an object?

I currently have external JSON files that PHP uses to set up a request's validations and filters. This makes it easy to assign the contents of a file as an object to a variable.
$object = json_decode(file_get_contents($filepath));
I am wondering if there might be another way to do this that would not involve parsing JSON. I would still like to be able to assign the contents of a file to a variable at runtime, but is there a better/faster way to do this?
Sounds complicated... I would suggest having a collection of PHP class files that are included on demand using require_once().
It does depend on what is stored in these files, though.
There is another function in PHP called parse_ini_file() (docs), but it's only good for key-value pairs.
Edit 3: To assign something non-static to a variable from a file, you can make use of the return statement inside the file and then include the file to assign its return value to a variable:
$var = include($file);
$file must be a file with correct PHP syntax (see the include documentation and the return statement); then you're ready to go.
include is therefore pretty similar to eval at the language level, so take care.
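For example, a minimal sketch (the file name and keys are hypothetical) of a config file that returns its value:

// config.php would contain (shown here as a comment so the snippet stays self-contained):
//   <?php
//   return (object) array(
//       'validations' => array('required', 'email'),
//       'filters'     => array('trim', 'strtolower'),
//   );

$object = include 'config.php';   // $object now holds the value returned by the file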
There is another way: just as JSON is a form of serialization, PHP also has serialize and unserialize.
$object = unserialize(file_get_contents($filepath));
file_put_contents($filepath, serialize($object));
It is said that this is slower than json_encode/json_decode (I have not measured it).
The benefit is better support for concrete PHP object instances; it simply integrates better with the language (no wonder, as it's core PHP).
Edit: As commented, you want to edit the files manually. PHP serialized data is (by default) not very friendly for humans to read or edit, so you might look into serializing to XML instead: the XML_Serializer PEAR package. Another XML serializer is available in Symfony2, and it supports other formats too (like JSON).
The other alternative is to switch the PHP serializer to WDDX (ships with core PHP), which actually is XML.
Edit 2: There is something similar to JSON for PHP: eval and var_export. But beware, this is somewhat dirty (or better said, evil), and if you are using objects, the concrete class needs to support it (via __set_state()), too:
$var = array('prop' => 'some value');
ob_start();
var_export($var);
$buffer = ob_get_clean();
# $buffer now contains something that can be saved,
# to load it again:
$var = eval('return '.$buffer.';');
var_dump($var);

Force freeing memory in PHP

In a PHP program, I sequentially read a bunch of files (with file_get_contents), gzdecode them, json_decode the result, analyze the contents, throw most of it away, and store about 1% in an array.
Unfortunately, with each iteration (I iterate over an array containing the filenames), some memory seems to be lost (according to memory_get_peak_usage, about 2-10 MB each time). I have double- and triple-checked my code; I am not storing unneeded data in the loop (and the needed data hardly exceeds 10 MB overall), but I am frequently rewriting strings in an array. Apparently, PHP does not free the memory correctly, so it uses more and more RAM until it hits the limit.
Is there any way to do a forced garbage collection? Or, at least, to find out where the memory is used?
It has to do with memory fragmentation.
Consider two strings concatenated into one. Each original must remain until the output is created, and the output is longer than either input.
Therefore, a new allocation must be made to store the result of such a concatenation. The original strings are freed, but they are only small blocks of memory.
In the case of 'str1' . 'str2' . 'str3' . 'str4', several temporaries are created at each ., and none of them fit in the space that has been freed up. The strings are likely not laid out in contiguous memory (that is, each individual string is contiguous, but the various strings are not laid end to end) due to other uses of the memory. So freeing a string creates a problem, because the space can't be reused effectively. You grow with each temporary you create, and you never reuse anything.
Using the array-based implode, you create only one output, exactly the length you require, performing only one additional allocation. So it's much more memory-efficient, and it doesn't suffer from concatenation fragmentation. The same is true of Python. If you need to concatenate strings, anything beyond a single concatenation should be array-based:
''.join(['str1','str2','str3'])
in python
implode('', array('str1', 'str2', 'str3'))
in PHP
sprintf equivalents are also fine.
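Applied to the questioner's CSV case, a minimal sketch (the field names are hypothetical): build each output line from an array and join it once, instead of concatenating piece by piece:

$fields = array($id, $name, $value);              // hypothetical columns
fputs($outfile, implode(',', $fields) . "\n");    // one allocation for the whole line

// or let PHP handle quoting and escaping:
fputcsv($outfile, $fields);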
The memory reported by memory_get_peak_usage() is basically always the "last" bit of memory in the virtual map that had to be used. Since that is always growing, it reports rapid growth, as each allocation falls "at the end" of the currently used memory block.
In PHP >= 5.3.0, you can call gc_collect_cycles() to force a GC pass.
Note: you need to have zend.enable_gc enabled in your php.ini, or call gc_enable() to activate the circular reference collector.
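A minimal sketch of how that looks in the loop (process_file() is a hypothetical stand-in for the per-file work described in the question):

gc_enable();                       // only needed if zend.enable_gc is off
foreach ($filenames as $file) {
    process_file($file);           // hypothetical: read, gzdecode, json_decode, analyze
    gc_collect_cycles();           // force a pass of the circular-reference collector
}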
Found the solution: it was string concatenation. I was generating the input line by line by concatenating some variables (the output is a CSV file). However, PHP seems not to free the memory used for the old copy of the string, effectively clobbering RAM with unused data. Switching to an array-based approach (imploding with commas just before fputs-ing the line to the output file) avoided this behavior.
For some reason, not obvious to me, PHP reported the increased memory usage during json_decode calls, which misled me into assuming that the json_decode function was the problem.
There's a way.
I had this problem one day. I was writing rows from a DB query into CSV files: I always allocated one $row, then reassigned it in the next step, and kept running out of memory. Unsetting $row didn't help; putting a 5 MB string into $row first (to avoid fragmentation) didn't help; creating an array of $row-s (loading many rows into it and unsetting the whole thing at every 5000th step) didn't help. But it was not the end, to quote a classic.
When I made a separate function that opened the file, transferred 100,000 lines (just enough not to eat up the whole memory) and closed the file, and THEN made subsequent calls to this function (appending to the existing file), I found that on every function exit, PHP removed the garbage. It was a local-variable-space thing.
TL;DR
When a function exits, it frees all local variables.
If you do the job in smaller portions, like 0 to 1000 in the first function call, then 1001 to 2000 and so on, then every time the function returns, your memory will be regained. Garbage collection is very likely to happen on return from a function. (If it's a relatively slow function eating a lot of memory, we can safely assume it always happens.)
Side note: this will obviously not work for variables passed by reference; a function can only free its internal variables, which would be lost anyway on return.
I hope this saves your day as it saved mine!
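A rough sketch of the idea (the function and query helper names are hypothetical): do each batch inside a function so that its locals are released on return:

function export_batch($db, $outPath, $offset, $limit) {
    $fp   = fopen($outPath, 'ab');               // append to the existing file
    $rows = $db->fetchRows($offset, $limit);     // hypothetical query helper returning arrays
    foreach ($rows as $row) {
        fputcsv($fp, $row);
    }
    fclose($fp);                                 // $rows and $fp are freed when the function returns
}

for ($offset = 0; $offset < $total; $offset += 100000) {
    export_batch($db, 'export.csv', $offset, 100000);
}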
I've found that PHP's internal memory manager is most likely to be invoked upon completion of a function. Knowing that, I've refactored code in a loop like so:
while (condition) {
// do
// cool
// stuff
}
to
while (condition) {
do_cool_stuff();
}
function do_cool_stuff() {
// do
// cool
// stuff
}
EDIT
I ran this quick benchmark and did not see an increase in memory usage. This leads me to believe the leak is not in json_decode():
for ($x = 0; $x < 10000000; $x++)
{
    do_something_cool();
}

function do_something_cool() {
    $json = '{"a":1,"b":2,"c":3,"d":4,"e":5}';
    $result = json_decode($json);
    echo memory_get_peak_usage() . PHP_EOL;
}
I was going to say that I wouldn't necessarily expect gc_collect_cycles() to solve the problem, since presumably the files are no longer mapped to zvals. But did you check that gc_enable() was called before loading any files?
I've noticed that PHP seems to gobble up memory when doing includes (much more than is required for the source and the tokenized file); this may be a similar problem. I'm not saying it is a bug, though.
I believe one workaround would be not to use file_get_contents, but rather fopen()/fgets()/fclose(), so the whole file isn't mapped into memory in one go. But you'd need to try it to confirm.
HTH
C.
Call memory_get_peak_usage() after each statement, and ensure you unset() everything you can. If you are iterating with foreach(), use a referenced variable to avoid making a copy of the original array (and remember to unset() the reference after the loop):
foreach ($x as &$y)
If PHP is actually leaking memory a forced garbage collection won't make any difference.
There's a good article on PHP memory leaks and their detection at IBM
There was recently a similar issue with System_Daemon. Today I isolated my problem to file_get_contents.
Could you try using fread instead? I think this may solve your problem.
If it does, it's probably time to file a bug report over at PHP.
