I have a very large array in PHP (5.6), generated dynamically, which I want to convert to JSON. The problem is that the array is so large that it doesn't fit in memory: I get a fatal error (memory exhausted) when I try to process it. So I figured that using generators would make the memory problem disappear.
This is the code I've tried so far (this reduced example obviously doesn't produce the memory error):
<?php
function arrayGenerator() // new way, using a generator
{
    for ($i = 0; $i < 100; $i++) {
        yield $i;
    }
}

function getArray() // old way, generating and returning the full array
{
    $array = [];
    for ($i = 0; $i < 100; $i++) {
        $array[] = $i;
    }
    return $array;
}

$object = [
    'id'   => 'foo',
    'type' => 'blah',
    'data' => getArray(),
    'gen'  => arrayGenerator(),
];

echo json_encode($object);
But PHP doesn't seem to JSON-encode the values from the generator. This is the output I get from the previous script:
{
    "id": "foo",
    "type": "blah",
    "data": [ // old way - OK
        0,
        1,
        2,
        3,
        //...
    ],
    "gen": {} // using the generator - an empty object!
}
Is it possible to JSON-encode an array produced by a generator without generating the full sequence before calling json_encode?
Unfortunately, json_encode cannot produce a result directly from a generator. Using iterator_to_array first will still build the whole array, which will still cause memory issues.
You will need to write your own function that generates the JSON string from the generator. Here's an example of how that could look:
function json_encode_generator(Generator $generator) { // note: the argument is a Generator object, not a callable
    $result = '[';
    foreach ($generator as $value) {
        $result .= json_encode($value) . ',';
    }
    return rtrim($result, ',') . ']';
}
Instead of encoding the whole array at once, it encodes one element at a time and concatenates the results into one string.
The above example only takes care of encoding an array, but it can easily be extended to recursively encode whole objects, as sketched below.
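Here is a minimal sketch of such a recursive extension (the function name and the list-detection heuristic are mine, not a standard API): iterators, including generators, are streamed as JSON arrays, and plain arrays are walked key by key so that nested generators are found too.

function json_encode_recursive($value) {
    if ($value instanceof Traversable) {
        // Stream any iterator (including generators) as a JSON array.
        $result = '[';
        $first = true;
        foreach ($value as $item) {
            $result .= ($first ? '' : ',') . json_encode_recursive($item);
            $first = false;
        }
        return $result . ']';
    }
    if (is_array($value)) {
        // Heuristic: sequential integer keys => JSON array, otherwise JSON object.
        $isList = ($value === array() || array_keys($value) === range(0, count($value) - 1));
        $result = $isList ? '[' : '{';
        $first = true;
        foreach ($value as $key => $item) {
            $result .= ($first ? '' : ',')
                . ($isList ? '' : json_encode((string)$key) . ':')
                . json_encode_recursive($item);
            $first = false;
        }
        return $result . ($isList ? ']' : '}');
    }
    return json_encode($value); // scalars, nulls and plain objects
}

With this, json_encode_recursive($object) from the question would emit the 'gen' field as a proper JSON array.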
If the resulting string is still too big to fit in memory, then your only remaining option is to write directly to an output stream. Here's how that could look:
function json_encode_generator(Generator $generator, $outputStream) {
    fwrite($outputStream, '[');
    foreach ($generator as $key => $value) {
        if ($key !== 0) { // relies on the default sequential keys produced by yield
            fwrite($outputStream, ',');
        }
        fwrite($outputStream, json_encode($value));
    }
    fwrite($outputStream, ']');
}
As you can see, the only difference is that we now use fwrite to write to the passed-in stream instead of concatenating strings, and the trailing comma is avoided up front rather than trimmed afterwards.
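A possible usage of the streaming variant, assuming you want to send the JSON straight to the client so the full string never has to exist in memory (php://output is one choice of stream; a file handle from fopen would work the same way):

$out = fopen('php://output', 'w');
json_encode_generator(arrayGenerator(), $out);
fclose($out);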
What is a generator function?
A generator function is effectively a more compact and efficient way to write an Iterator. It lets you define a function that computes and yields values while you are looping over it.
As the PHP manual (http://php.net/manual/en/language.generators.overview.php) puts it:
Generators provide an easy way to implement simple iterators without the overhead or complexity of implementing a class that implements the Iterator interface.
A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which may cause you to exceed a memory limit, or require a considerable amount of processing time to generate. Instead, you can write a generator function, which is the same as a normal function, except that instead of returning once, a generator can yield as many times as it needs to in order to provide the values to be iterated over.
What is yield?
The yield keyword returns data from a generator function:
The heart of a generator function is the yield keyword. In its simplest form, a yield statement looks much like a return statement, except that instead of stopping execution of the function and returning, yield instead provides a value to the code looping over the generator and pauses execution of the generator function.
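A tiny illustration of that pausing behavior (the echo calls are only there to make the control flow visible):

function countdown() {
    echo "generator started\n"; // runs on the first iteration, not when countdown() is called
    for ($i = 3; $i > 0; $i--) {
        yield $i; // pauses here until the loop asks for the next value
    }
    echo "generator finished\n";
}

foreach (countdown() as $n) {
    echo $n, "\n";
}
// Output: generator started, 3, 2, 1, generator finished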
So, in your case, to produce the expected output you need to iterate over the output of arrayGenerator() with a foreach loop or an iterator before passing it to json_encode (as suggested by @apokryfos).
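In code, that advice could look like the following; note that it materializes the full sequence in memory, so it only helps when the data actually fits (which is exactly the limitation the streaming approach above works around):

$object['gen'] = iterator_to_array(arrayGenerator());
echo json_encode($object); // "gen" is now a proper JSON array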
Related
I'm pretty new and need guidance on how to approach a simple problem.
I'm building an app in PHP to allow different views of Fantasy Premier League (FPL) team data. I'm pulling data from an API (JSON array), looping through it, and storing it in a db. One of the API calls needs a team_id to pull that team's respective players.
My initial thinking is to write a function that takes the team_id as an argument, then parses the data to insert into the db. But how would I pass each of 12 different team_ids to the function to be processed without repeating the function 12 times for each team_id?
So, I might have:
$team_id_1 = 1;
$team_id_2 = 2;
$team_id_3 = 3;
etc...
Then my function might be:
function insert_team_data($team_id) {
    $fplUrl = 'https://fantasy.premierleague.com/api/entry/' . $team_id . '/event/29/picks/';
    // ... fetch and decode $team_data from $fplUrl ...
    foreach ($team_data as $key => $value) {
        # code...
    }
    $sql = "INSERT INTO ...";
}
Is it possible to construct this in a way where each team_id gets passed to the function iteratively in a single process (i.e. without repeating the function code for each team_id)? Would I create an array of the team_ids and have the function loop through each one, then loop through each resulting team's data?
Yes, you can do either: use an array or a variadic function.
The way you're thinking of doing it is what's called a variadic function or variable-length function.
It can be achieved through either the use of func_get_args() or the splat operator, which handles argument packing/unpacking.
Here's an example.
function insert_team_data(...$team_ids) {
    // $team_ids arrives as an array inside the function
    foreach ($team_ids as $team_id) {
        $fplUrl = "https://fantasy.premierleague.com/api/entry/$team_id/event/29/picks/";
        $sql = "INSERT INTO ...";
        // rest of code...
    }
}
You can now call the function like this: insert_team_data(1, 2, 3, 4, 5)
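For completeness, here is the array-based alternative mentioned at the start (the function name is illustrative); the only real difference is that the caller builds the array explicitly:

function insert_team_data_from_array(array $team_ids) {
    foreach ($team_ids as $team_id) {
        $fplUrl = "https://fantasy.premierleague.com/api/entry/$team_id/event/29/picks/";
        // fetch, parse and INSERT as before...
    }
}

insert_team_data_from_array(range(1, 12)); // team IDs 1 through 12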
Any idea how to restructure the JSON below?
$jsonArray = [{"Level":"77.2023%","Product":"Milk","Temperature":"4"},
{"Level":"399.2023%","Product":"Coffee","Temperature":"34"},
{"Level":"109.2023%","Product":"Chocolate","Temperature":"14"}]
Expected outcome:
$expected = {"Milk":{"Level":"77.2023%","Temperature":"4"},
"Coffee":{"Level":"399.2023%","Temperature":"34"},
"Chocolate":{"Level":"109.2023%","Temperature":"14"}
}
I'm new to this, and my thinking is to get the Product value from each element and then use another foreach loop to pick up the other values. Is that the right approach?
Here's one possibility:
$jsonArray = '[{"Level":"77.2023%","Product":"Milk","Temperature":"4"},
{"Level":"399.2023%","Product":"Coffee","Temperature":"34"},
{"Level":"109.2023%","Product":"Chocolate","Temperature":"14"}]';
$output = array();
foreach (json_decode($jsonArray, true) as $row) {
$product = $row['Product'];
$output[$product] = $row;
unset($output[$product]['Product']);
}
echo json_encode($output);
Output:
{"Milk":{"Level":"77.2023%","Temperature":"4"},
"Coffee":{"Level":"399.2023%","Temperature":"34"},
"Chocolate":{"Level":"109.2023%","Temperature":"14"}
}
Demo on 3v4l.org
This does the trick:
$a = '[{"Level":"77.2023%","Product":"Milk","Temperature":"4"},
{"Level":"399.2023%","Product":"Coffee","Temperature":"34"},
{"Level":"109.2023%","Product":"Chocolate","Temperature":"14"}]';
$newAr = array();
foreach(json_decode($a,true) as $key=>$value)
{
$newAr[$value['Product']] = array(
'Level' => $value['Level'],
'Temperature' => $value['Temperature'],
);
}
There are many ways to do this with loops in PHP; the other answers demonstrate them accurately. I would also suggest integrating some form of error handling and data validation/filtering into your code to avoid unexpected results down the road.
For instance, json_decode(), whether assigned to a variable before the foreach or called directly as the foreach's first argument, returns NULL on invalid JSON; iterating over that NULL merely raises a warning and skips the loop that builds your final structure. If you then pass the (possibly failed) result straight to your next logic construct, you can get some iffy behavior.
Also, on the subject of validation and filtering: you could restrict the foreach (or any other looping mechanism) to check each Product against a whitelist such as ['Milk', 'Coffee', 'Chocolate'] using if (in_array(...)), so the final object only contains the expected products in case the original JSON has other artifacts. Filtering the values can also increase stability, for example by restricting Temperature to a float, as sketched below.
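A sketch of what that might look like, combining the decode check with a product whitelist (the $allowedProducts list and the die() error handling are illustrative only):

$allowedProducts = array('Milk', 'Coffee', 'Chocolate');

$decoded = json_decode($jsonArray, true);
if (!is_array($decoded)) {
    die('Invalid JSON: ' . json_last_error_msg()); // handle however suits your app
}

$output = array();
foreach ($decoded as $row) {
    // Skip rows that lack a whitelisted Product value.
    if (!isset($row['Product']) || !in_array($row['Product'], $allowedProducts, true)) {
        continue;
    }
    $product = $row['Product'];
    unset($row['Product']);
    $row['Temperature'] = (float) $row['Temperature']; // restrict Temperature to a float
    $output[$product] = $row;
}
echo json_encode($output);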
I have an array (well, a PHP array, which is not really an array, but you get the point) of objects representing SMS messages. One of the fields in these objects is of type DateTime, and I want to sort the array by that field. I cannot sort the data in the DB; I'm receiving it from a web service that I cannot change, so please don't suggest that. I sort the array with the following snippet of code:
usort($smsMessages, function ($a, $b) {
    if ($a->SendTime == $b->SendTime) {
        return 0;
    }
    return ($a->SendTime < $b->SendTime) ? -1 : 1;
});
This works, but it takes 160 seconds to sort 30,000 elements.
Now, I know that PHP is slow, but this is ridiculous. Is there something wrong with the way I wrote this? Is usort known to be slow/broken/buggy? Should I use another method? Roll my own?
I had the same issue. We needed to sort arrays of 2-10 million rows; each row contained about 30 fields (strings, integers and NULLs), and the first field was a unique integer that we used for sorting.
We used PHP 7.1.
It took 4710 seconds (= 78.5 minutes) to sort 2,028,830 items on an AWS EC2 r4.large.
Our code looked like:
usort($this->rows, function ($item1, $item2) {
    return $item1[0] <=> $item2[0];
});
Then I figured out that replacing $this->rows with $rows makes it almost 4 times faster:
usort($rows, function ($item1, $item2) {
    return $item1[0] <=> $item2[0];
});
It decreased execution time from 4710 to 1195 seconds.
Another approach is to use a min-heap (SPL's SplHeap/SplMinHeap) for $this->rows instead of a plain PHP array. It leads to about the same performance improvement, and in that case you don't need usort at all.
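A minimal sketch of that heap approach, extending SplHeap so rows are ordered by their first field (the class name is mine):

class RowMinHeap extends SplHeap
{
    protected function compare($a, $b)
    {
        // SplHeap extracts the element compare() ranks highest first,
        // so invert the comparison to pop the smallest first field first.
        return $b[0] <=> $a[0];
    }
}

$heap = new RowMinHeap();
foreach ($rows as $row) {
    $heap->insert($row); // O(log n) per insert, no usort() pass afterwards
}
foreach ($heap as $row) {
    // rows come out in ascending order of $row[0]
}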
Bottom line:
1. Yes, even after the changes above, sorting still takes a surprisingly large amount of time.
2. For already-sorted arrays, usort is much quicker than the min-heap.
To check whether the sorting is in fact the bottleneck, you could try speeding it up by using a global function and shortening the comparison code (disclaimer: this is massive micro-optimization, and it may well not be where your issue is):
function sort_function($a, $b) {
    $a = $a->SendTime;
    $b = $b->SendTime;
    if ($a == $b) return 0;
    return ($a < $b) ? -1 : 1;
}

usort($smsMessages, 'sort_function');
Assuming most SendTimes are not equal, this should in fact speed things up.
But please understand that the above is only a very slight speedup. If the runtime only drops to something like 140s, you can blame usort itself. In all likelihood, though, the value of this suggestion lies in learning that the usort part is not your issue, in my opinion.
Added after more input below:
Having likely established that this is all about a lack of memory (the usage numbers you posted are for the system overall; I cannot deduce how much of those 256MB are actually in use without knowing more about these objects), how does this code compare in runtime for you?
$dates = array();
foreach ($smsMessages as $key => $obj) {
    $dates[$key] = $obj->SendTime;
}
asort($dates);
$dates = array_keys($dates);

$sorted = array();
foreach ($dates as $key) {
    $sorted[] = &$smsMessages[$key];
}
This should need significantly less memory: the sort itself only touches a small array of SendTime values rather than the full objects, and the final loop builds references instead of copies.
Try this:
First, add "true" to the json_decode as a second parameter, so you will get an associative array instead of array of objects.
(I would also recommend to try this to speed up JSON: https://github.com/RustJason/php-rapidjson - it requires PHP7 though)
And then:
$sentTime = [];
foreach ($smsMessages as $key => $element) {
    $sentTime[$key] = strtotime($element['sent']);
}
array_multisort($sentTime, SORT_DESC, $smsMessages);
(0.19 seconds on my machine.)
You can convert individual $smsMessages back into objects later, at the moment you really need them, with (object)$smsMessage or your own customized method.
I have several thousand records (stored in a MySQL table) that I need to batch-process. All of the records contain a large JSON blob; in some cases the JSON is over 1MB (yes, my DB is well over 1GB).
I have a function that grabs a record, decodes the JSON, changes some data, re-encodes the PHP array back to a JSON, and saves it back to the db. Pretty simple. FWIW, this is within the context of a CakePHP app.
Given an array of ID's, I'm attempting to do something like this (very simple mock code):
foreach ($ids as $id) {
    $this->Model->id = $id;
    $data = $this->Model->read();
    $newData = processData($data);
    $this->Model->save($newData);
}
The issue is that PHP very quickly runs out of memory. When running a foreach like this, it's almost as if PHP moves from one record to the next without releasing the memory required for the preceding operations.
Is there any way to run a loop such that memory is freed before moving on to the next iteration, so that I can actually process this massive amount of data?
Edit: Adding more code. This function takes my JSON, converts it to a PHP array, does some manipulation (namely, reconfiguring data based on what's present in another array), and replaces values in the original array. The JSON is many layers deep, hence the extremely long foreach loops.
function processData($theData) {
    $toConvert = json_decode($theData['Program']['data'], true);
    // Note: $newFolderList comes from elsewhere in the real code.
    foreach ($toConvert['cycles'] as $cycle => $val) {
        foreach ($toConvert['cycles'][$cycle]['days'] as $day => $val) {
            foreach ($toConvert['cycles'][$cycle]['days'][$day]['sections'] as $section => $val) {
                foreach ($toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'] as $exercise => $val) {
                    if (isset($toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFolder'])) {
                        $folderName = $toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFolder']['folderName'];
                        if (isset($newFolderList['Folders'][$folderName])) {
                            $toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFolder'] = $newFolderList['Folders'][$folderName]['id'];
                        }
                    }
                    if (isset($toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFile'])) {
                        $fileName = basename($toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFile']['fileURL']);
                        if (isset($newFolderList['Exercises'][$fileName])) {
                            $toConvert['cycles'][$cycle]['days'][$day]['sections'][$section]['exercises'][$exercise]['selectedFile'] = $newFolderList['Exercises'][$fileName]['id'];
                        }
                    }
                }
            }
        }
    }
    return $toConvert;
}
Model->read() essentially just tells Cake to pull a record from the db and return it as an array. There's plenty happening behind the scenes; someone more knowledgeable would have to explain that.
The first step I would take is to make sure everything is passed by reference.
E.g.:

foreach ($ids as $id) {
    processData($data); // $data fetched as before
}

function processData(&$d) {}
http://php.net/manual/en/language.references.pass.php
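A slightly fuller sketch of the idea applied to the loop from the question (the 'processed' marker is just a placeholder for your real manipulation): the function mutates the record in place instead of returning a modified copy, and unset() drops the reference before the next pass.

function processData(array &$data) {
    // ... your real manipulation of $data goes here ...
    $data['processed'] = true; // placeholder
}

foreach ($ids as $id) {
    $this->Model->id = $id;
    $data = $this->Model->read();
    processData($data);        // modified in place, no copy returned
    $this->Model->save($data);
    unset($data);              // release the record before the next iteration
}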
The PHP manual's entry for array_push says:

If you use array_push() to add one element to the array, it's better to use $array[] = because in that way there is no overhead of calling a function.
For example:
$arr = array();
array_push($arr, "stackoverflow");
print_r($arr);
vs
$arr[] = "stackoverflow";
print_r($arr);
I don't understand why there is a big difference.
When you call a function in PHP (such as array_push()), there are overheads to the call, as PHP has to look up the function reference, find its position in memory and execute whatever code it defines.
Using $arr[] = 'some value'; does not require a function call, and implements the addition straight into the data structure. Thus, when adding a lot of data it is a lot quicker and resource-efficient to use $arr[].
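If you want to observe the difference yourself, a rough micro-benchmark along these lines will do (absolute numbers depend on your PHP version and hardware):

$n = 1000000;

$start = microtime(true);
$a = array();
for ($i = 0; $i < $n; $i++) {
    array_push($a, $i);
}
printf("array_push(): %.3fs\n", microtime(true) - $start);

$start = microtime(true);
$b = array();
for ($i = 0; $i < $n; $i++) {
    $b[] = $i;
}
printf("\$arr[]      : %.3fs\n", microtime(true) - $start);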
You can add more than one element in one shot to an array using array_push, e.g. array_push($array_name, $element1, $element2, ...), where $element1, $element2, ... are the elements to be added; see the example after this paragraph. But if you only want to add one element at a time, then the other method (i.e. $array_name[] =) should be preferred.
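For instance:

$stack = array('a');
array_push($stack, 'b', 'c', 'd'); // three elements in one call
print_r($stack); // Array ( [0] => a [1] => b [2] => c [3] => d )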
The difference lies in the line below "because in that way there is no overhead of calling a function":

array_push() will raise a warning if the first argument is not an array. This differs from the $var[] behaviour, where a new array is created.
You should always use $array[] if possible because, as that note states, there is no overhead of a function call; it is thus a bit faster than array_push().
array_push — Push one or more elements onto the end of array

Take note of the words "one or more elements onto the end". To push several elements in a single statement using $arr[], you would have to look up the current size of the array and assign each index yourself.
To explain the two snippets above:
1. The first one declares the array variable.
2. The array_push method is then used to push the string into the array variable.
3. Finally, print_r prints the result.
4. The second method stores the string directly in the array.
5. print_r again prints the array's values.
The two snippets do the same thing.
Both are the same, but array_push loops over its parameters and performs $array[] = $element for each one.
Thought I'd add to the discussion, since I believe there is a crucial difference between the two when working with indexed arrays that people should be aware of.
Say you are dynamically creating a multi-dimensional associative array by looping through some data sets.
$foo = [];
foreach ($fooData as $fooKey => $fooValue) {
    foreach ($fooValue ?? [] as $barKey => $barValue) {
        // Approach 1: results in Error 500
        array_push($foo[$fooKey], $barKey); // Error 500: Argument #1 ($array) must be of type array
        // NOTE: ($foo[$fooKey] ?? []) as the argument won't work, since only variables can be passed by reference

        // Approach 2: fix the problem by instantiating the array beforehand if it didn't exist
        $foo[$fooKey] ??= [];
        array_push($foo[$fooKey], $barKey);

        // Approach 3: one-liner approach
        $foo[$fooKey][] = $barKey; // Instantiates the array if it doesn't exist
    }
}
Without having $foo[$fooKey] instantiated as an array beforehand, we can't call array_push without getting the Error 500. The shorthand $foo[$fooKey][] does the heavy lifting for us: it checks whether $foo[$fooKey] is an array, creates it if it isn't, and pushes the item in.
I know this is an old question, but it might be helpful for others to know that another difference between the two is that, if you have to add more than two or three values per loop iteration, it's faster to use:
for ($i = 0; $i < 10; $i++) {
    array_push($arr, $i, $i*2, $i*3, $i*4, ...);
}
instead of:
for ($i = 0; $i < 10; $i++) {
    $arr[] = $i;
    $arr[] = $i*2;
    $arr[] = $i*3;
    $arr[] = $i*4;
    ...
}
No one has said it yet, but array_push only pushes an element onto the END OF THE ARRAY, whereas $array[$index] = can set a value at any given index. Big difference.
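To illustrate:

$arr = array();
array_push($arr, 'end');  // always appends: [0 => 'end']
$arr[10] = 'anywhere';    // writes at an arbitrary index: [0 => 'end', 10 => 'anywhere']
print_r($arr);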