I'm trying to fetch a large amount of data from a PostgreSQL database through PDO and run it through a simplification algorithm (reducing the number of rows) in PHP.
The problem is that I can't use fetchAll, because I often get the message:
Allowed memory size of 134217728 bytes exhausted...
I then opted for fetching with a scrollable cursor, which lets me access the rows I want without worrying about memory, as in this example:
function simplify(&$q) {
    // Doing some stuff in a loop
    $current = $q->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_ABS, $index + 1); // Iterator starts from 1
    // Doing some more stuff
}
[...]
$query = $this->connexion->prepare($req, array(PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL)); //More than 100k results
$result = simplify($query);
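For clarity, here is the same approach as one self-contained sketch (the connection details, table and column names are placeholders, and a hypothetical rowAt() helper stands in for the real recursive simplification):

$pdo = new PDO('pgsql:host=localhost;dbname=mydb', 'user', 'pass'); // placeholder DSN
$req = 'SELECT id, lat, lng FROM points ORDER BY id'; // hypothetical table and columns

$query = $pdo->prepare($req, array(PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL));
$query->execute();

// Jump to an arbitrary row by absolute position without loading the whole result set.
function rowAt(PDOStatement $q, $index) {
    return $q->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_ABS, $index + 1); // positions start from 1
}

$first = rowAt($query, 0);
// ...the simplification algorithm calls rowAt() for whichever indexes it needs...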
Now, this works like a charm... except for the execution time: it takes up to 1 minute to process 200k rows.
I should mention that my algorithm is recursive, which doesn't help in my case.
Is there another solution to reduce the execution time without eating too much memory?
I thought about a trigger on the DB server side, but what do you think?
I use Laravel 8 to perform a query on a MySQL 8 table using the query builder directly to avoid Eloquent overhead but I'm getting a lot of memory consumption anyway.
To show you an example, I perform the following query to select exactly 300 000 elements.
My code looks like this:
$before = memory_get_usage();
$q_coords = DB::table('coords')->selectRaw('alt, lat, lng, id')
->where('active', 1)->take(300000)->get();
$after = memory_get_usage();
echo ($after - $before);
It displays 169760384, which is something like 169 MB if I'm not mistaken.
That looks like a lot to me, because in my query I only asked for 2 floats and 2 bigints, which represents something like 4 x 8 bytes (32 bytes) per row.
And 32 bytes x 300 000 records ~= 9 600 000 bytes (almost 10 MB).
How is it even possible that it uses so much memory? I am very surprised.
EDIT
I also tried using PDO directly, same result.
$query = DB::connection()->getPdo()->query("select alt, lat, lng, id from coords WHERE active = 1 LIMIT 300000");
$q_coords = $query->fetchAll();
Because the rows are represented as PHP objects in memory, not just as their raw data.
However, there is a solution to limit the memory usage: chunk
https://blackdeerdev.com/laravel-chunk-vs-cursor/
Chunk: It will “paginate” your query; this way you use less memory.
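For example, a minimal sketch of the chunk approach applied to the query from the question (the chunk size and the callback body are just illustrations; chunk() needs an explicit ordering to paginate reliably):

$before = memory_get_usage();

DB::table('coords')
    ->selectRaw('alt, lat, lng, id')
    ->where('active', 1)
    ->orderBy('id') // chunking requires a deterministic order
    ->chunk(10000, function ($coords) {
        foreach ($coords as $coord) {
            // process one row at a time here
        }
    });

echo memory_get_usage() - $before; // only ~10 000 rows are hydrated at any one time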
In PHP, each variable is handled with a specific data structure to allow dynamic typing, garbage collection and more.
You can see a (pretty old but still relevant) article here: link
You can also see that arrays get more specific treatment, because each element needs a bucket, for example to store array keys, which are treated as strings.
All of that means that (according to the article) approximately 144 bytes are used to store one element of an array.
Well, while I can't explain your result EXACTLY, I can still tell you that in your case you have something like this:
300 000 * 144 * 4 = 172 800 000
That is, 300 000 rows of 4 variables at 144 bytes per variable.
As you can see, it's not that far from what you got, even though my maths don't take into account the improvements made in PHP 7 and other factors...
Since the Laravel query builder uses stdClass objects to represent its results, you will have a lot of overhead:
Each object stores the values of the row itself plus the name of each column, so your 32 bytes turn into a lot more.
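One way to avoid holding all of those objects at once is to stream the rows instead of materializing the whole result set; a minimal sketch using the query builder's cursor() on the same query (the underlying driver may still buffer the raw result set, but only one stdClass row is hydrated at a time):

$before = memory_get_usage();

$rows = DB::table('coords')
    ->selectRaw('alt, lat, lng, id')
    ->where('active', 1)
    ->cursor(); // returns a LazyCollection; rows are hydrated one by one

foreach ($rows as $coord) {
    // work with $coord here; it goes out of scope before the next row is hydrated
}

echo memory_get_usage() - $before;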
I have a table in my database with around 9,000,000 records.
My problem occurs when I query that table through Laravel: if I just pull all the data with Number::all(), my server obviously collapses because of its capacity. What I actually need is to extract around 50,000 records at random from that table, and I don't know how to do it, since the Collection's random method never gets to run because the server falls over before returning anything.
What would you propose so I can build this query? How could I do it? It's for my company and I really don't know what else to do.
This is the error that always generates me when consulting:
PHP Fatal error: Allowed memory size of 2097152 bytes exhausted
When testing with chunk it returns values in less than 2 seconds; the catch is that it brings back values starting from the first index of the table, when what I need is random values.
I did the test with this code
$numeros = Number::select('NUMBERS_ID', 'NUMBER')
->inRandomOrder()
->take(1000)
->get()
->chunk(100);
But it still takes about 2 minutes.
I need to get the random data in a reasonable amount of time, around 30 seconds, but the random responses are taking approximately 2 minutes.
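For reference, chunk() called after get() only splits a collection that is already fully loaded in memory; chunking at the query-builder level is what issues paginated queries, and it walks the table in the given order, which matches the behaviour above where the values come back starting from the first index. A rough sketch (the ordering column is assumed from the select above):

Number::select('NUMBERS_ID', 'NUMBER')
    ->orderBy('NUMBERS_ID')
    ->chunk(1000, function ($numeros) {
        foreach ($numeros as $numero) {
            // rows arrive page by page in NUMBERS_ID order,
            // which is why this is fast but not random
        }
    });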
I have a data set in mysql with 150 rows. I have a set of 2 for loops that run math calculations based on some user inputs and the dataset. The code does calculations for 30 row windows, and accumulates the results for each 30 row window in an array. What I mean is, I do a "cycle" of calculations on rows 0-29, then 1-30, then 2-31, etc... That would result in 120 "cycles".
Right now the for loop is set up like so (there are more fields; I just trimmed the code for simplicity of this question).
$window = 30;
$query = "SELECT * FROM table";
$result = mysql_query($query);
while ($row = mysql_fetch_assoc($result)) {
    $data[] = array("Date" => $row['Date'], "ID" => $row['ID']);
}

for ($i = 0; $i < (count($data) - $window); $i++) {
    for ($j = 0; $j < $window; $j++) {
        // do calculations here with $data[]
        $results[$i][$j] = calculations; // placeholder for the real calculation
    }
}
This works fine for the number of rows I have. However, I opened up the script to a larger dataset (1700 rows) with a different window (360 rows). This means there are far more iterations. It gave me an out of memory error. Some quick use of memory_get_peak_usage() showed that memory would just continually increase.
I'm starting to think that having the loops search through that data array is extremely laborious, especially when the "window" overlaps on a lot of the "cycles". Example: Cycle 0 goes through rows 0-29. Cycle 1 goes through rows 1-30. So, both of those cycles share a row of data that they need, but I'm telling PHP to look for the new data each time.
Is there a way to structure this better? I'm getting kind of lost thinking about running these concurrent cycles.
I think the array that is blowing the memory will be the $results array. In your small sample it will be a 2-dimensional array of (150 - 30) x 30 = 3,600 cells. At 144 bytes per element that's 518,400 bytes, about half a megabyte plus remaining bucket space.
In your second, larger sample it will be (1700 - 360) x 360 = 482,400 cells. At 144 bytes per element that's 69,465,600 bytes, slightly over 66 MB plus remaining bucket space, just to hold the results of your calculations.
I think you need to ask if you really need to hold all this data. If you really do, you may have to come up with another way of storing it.
I don't see any point in attempting the thousand-odd extra database calls, as that will only add overhead while you still have to maintain the huge list of results in an array.
The SQL Way
You can accomplish this by using LIMIT
$period = 30;
$cycle = 0; // starting row of the current window; incremented each cycle
$query = "SELECT * FROM table LIMIT $cycle,$period";
This will return only the results you need for each cycle. You will need to loop and increment $cycle. The way you are doing it now is probably better, however.
This won't loop back and grab the first of the data, however; you will have to add additional logic to handle that case.
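A rough sketch of that loop, keeping the same (old, deprecated) mysql_* API the question uses; each iteration fetches only the $period rows of the current window and then discards them:

$period = 30;
$total = 150; // total row count; would normally come from a COUNT(*) query

for ($cycle = 0; $cycle < $total - $period; $cycle++) {
    $query = "SELECT * FROM table LIMIT $cycle,$period";
    $result = mysql_query($query);

    $window = array();
    while ($row = mysql_fetch_assoc($result)) {
        $window[] = array("Date" => $row['Date'], "ID" => $row['ID']);
    }

    // do the calculations for this 30-row window here, then let $window be overwritten
}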
I have a pretty simple question. I am inserting a lot of records at once into a MySQL table. It works for about 2000 records (actually a bit more). But say I want to insert 3000 records, then it doesn't do anything.
I'm working through AS3 sending an array containing all the records via AMFPHP to a simple PHP script to parse and insert the array.
Is this normal, or should I look into it?
Currently I'm slicing my array in parts of 2000 records, and sending a couple AMFPHP requests instead of just 1.
MySQL queries are limited by the "max_allowed_packet" configuration option. It defines the absolute length limit, in bytes, that a query string can be. Note that this isn't just the total size of the data being inserted, it's the entire query string: SQL commands, punctuation, spaces, etc...
Check how long your 3000 record version is vs. the 2000 one, and then get your server's packet length limit:
SHOW VARIABLES WHERE Variable_name LIKE '%max_allowed_packet%'
If your 3000-record version is longer than this limit, the query will definitely fail because it'll be chopped off somewhere part-way.
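A minimal sketch of the batching the question already describes (slicing the records so each INSERT stays under the packet limit); the table, column names and batch size are made up, and PDO with a multi-row prepared insert is just one way to do it:

// $records is the decoded AMFPHP array, $pdo an existing PDO connection
$batches = array_chunk($records, 2000);

foreach ($batches as $batch) {
    $placeholders = array();
    $values = array();

    foreach ($batch as $record) {
        $placeholders[] = '(?, ?, ?)';
        $values[] = $record['col_a']; // hypothetical columns
        $values[] = $record['col_b'];
        $values[] = $record['col_c'];
    }

    $sql = 'INSERT INTO my_table (col_a, col_b, col_c) VALUES ' . implode(', ', $placeholders);
    $pdo->prepare($sql)->execute($values);
}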
I don't think there is really a limit on the number of inserts in one query.
Instead, there is a limit on the size of the query you can send to MySQL.
See:
max_allowed_packet
Packet too large
So, basically, this depends on the amount of data you have in each insert.
I would ensure max_allowed_packet is larger than your PHP SQL query.
http://dev.mysql.com/doc/refman/5.5/en/packet-too-large.html
I think PHP doesn't limit the number of inserts in one query; instead, it limits the amount of memory a script can use and its maximum execution time.
Is it possible to loop through a query so that if (for example) 500,000 rows are found, it'll return results for the first 10,000 and then rerun the query for the next batch?
So, what I want to do is run a query and build an array, like this:
$result = pg_query("SELECT * FROM myTable");
$i = 0;
while ($row = pg_fetch_array($result)) {
    $myArray[$i]['id'] = $row['id'];
    $myArray[$i]['name'] = $row['name'];
    $i++;
}
But I know that there will be several hundred thousand rows, so I wanted to do it in batches of 10,000: rows 1-9,999, then 10,000-19,999, etc... The reason is that I keep getting this error:
Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 3 bytes)
Incidentally, I don't understand how 3 bytes could exhaust 512 MB... So if that's something I can just change, that'd be great, although it might still be better to do this in batches?
Those last 3 bytes were the straw that broke the camel's back. Probably an allocation attempt in a long string of allocations leading to the failure.
Unfortunately libpq will try to fully cache result sets in memory before relinquishing control to the application. This is in addition to whatever memory you are using up in $myArray.
It has been suggested to use LIMIT ... OFFSET ... to reduce the memory envelope; this will work, but is inefficient as it could needlessly duplicate server-side sorting effort every time the query is reissued with a different offset (e.g. in order to answer LIMIT 10 OFFSET 10000, Postgres will still have to sort the entire result set, only to return rows 10000..10010.)
Instead, use DECLARE ... CURSOR to create a server-side cursor, followed by FETCH FORWARD x to fetch the next x rows. Repeat as many times as needed or until fewer than x rows are returned. Do not forget to CLOSE the cursor when you are done, even when/if an exception is raised.
Also, do not SELECT *; if you only need id and name, create your cursor FOR SELECT id, name (otherwise libpq will needlessly retrieve and cache columns you never use, increasing memory footprint and overall query time.)
Using cursors as illustrated above, libpq will hold at most x rows in memory at any one time. However, make sure you also clean up your $myArray in between FETCHes if possible or else you could still run out of memory on account of $myArray.
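A minimal sketch of that approach with the pg_* functions and the id/name columns from the question (the cursor name and batch size are arbitrary; cursors need to live inside a transaction unless declared WITH HOLD):

pg_query("BEGIN");
pg_query("DECLARE my_cursor CURSOR FOR SELECT id, name FROM myTable");

do {
    $result = pg_query("FETCH FORWARD 10000 FROM my_cursor");
    $fetched = pg_num_rows($result);

    while ($row = pg_fetch_assoc($result)) {
        // process $row here instead of accumulating everything in $myArray
    }
} while ($fetched == 10000);

pg_query("CLOSE my_cursor");
pg_query("COMMIT");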
You can use LIMIT (x) and OFFSET (y)
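For completeness, a rough sketch of that variant too (bearing in mind the caveat above that the server may redo sorting work for every offset); the ORDER BY is what keeps the pages from overlapping:

$limit = 10000;
$offset = 0;

do {
    $result = pg_query("SELECT id, name FROM myTable ORDER BY id LIMIT $limit OFFSET $offset");
    $fetched = pg_num_rows($result);

    while ($row = pg_fetch_assoc($result)) {
        // process $row, then let it go out of scope
    }

    $offset += $limit;
} while ($fetched == $limit);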
The query results are buffered in full on the client side until you actually retrieve them, so adding them all to an array in a loop like that will exhaust memory no matter what. Either process the results one row at a time, or check the length of the array, process the results pulled so far, and then purge the array.
What the error means is that PHP is trying to allocate 3 bytes, but the free portion of that 512 MB is less than 3 bytes.
Even if you do it in batches, depending on the size of the resulting array you could still exhaust the available memory.
Perhaps you don't really need to get all the records?