Postgres - run a query in batches? - php

Is it possible to loop through a query so that if (for example) 500,000 rows are found, it returns results for the first 10,000 and then reruns the query for the next batch?
So, what I want to do is run a query and build an array, like this:
$result = pg_query("SELECT * FROM myTable");
$i = 0;
$myArray = array();
while ($row = pg_fetch_array($result)) {
    $myArray[$i]['id'] = $row['id'];
    $myArray[$i]['name'] = $row['name'];
    $i++;
}
But I know that there will be several hundred thousand rows, so I wanted to do it in batches of 10,000: rows 1-10,000, then 10,001-20,000, and so on. The reason is that I keep getting this error:
Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 3 bytes)
Incidentally, I don't understand how an allocation of 3 bytes could exhaust 512 MB... So, if that's something I can just change, that'd be great, although it still might be better to do this in batches?

Those last 3 bytes were the straw that broke the camel's back. Probably an allocation attempt in a long string of allocations leading to the failure.
Unfortunately libpq will try to fully cache result sets in memory before relinquishing control to the application. This is in addition to whatever memory you are using up in $myArray.
It has been suggested to use LIMIT ... OFFSET ... to reduce the memory envelope; this will work, but is inefficient as it could needlessly duplicate server-side sorting effort every time the query is reissued with a different offset (e.g. in order to answer LIMIT 10 OFFSET 10000, Postgres will still have to sort the entire result set, only to return rows 10000..10010.)
Instead, use DECLARE ... CURSOR to create a server-side cursor, followed by FETCH FORWARD x to fetch the next x rows. Repeat as many times as needed or until fewer than x rows are returned. Do not forget to CLOSE the cursor when you are done, even if an exception is raised.
Also, do not SELECT *; if you only need id and name, create your cursor FOR SELECT id, name (otherwise libpq will needlessly retrieve and cache columns you never use, increasing memory footprint and overall query time.)
Using cursors as illustrated above, libpq will hold at most x rows in memory at any one time. However, make sure you also clean up your $myArray in between FETCHes if possible or else you could still run out of memory on account of $myArray.
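Roughly, the cursor approach looks like this in PHP (the cursor name, the batch size of 10,000 and the connection handling are just placeholders; error handling omitted):
$conn = pg_connect("dbname=mydb");   // placeholder connection string
pg_query($conn, "BEGIN");            // a (non-holdable) cursor only lives inside a transaction
pg_query($conn, "DECLARE my_cur CURSOR FOR SELECT id, name FROM myTable");

do {
    $result  = pg_query($conn, "FETCH FORWARD 10000 FROM my_cur");
    $fetched = pg_num_rows($result);

    while ($row = pg_fetch_assoc($result)) {
        // process $row['id'] and $row['name'] here instead of
        // piling everything into one huge array
    }
    pg_free_result($result);
} while ($fetched == 10000);         // fewer rows than asked for means end of data

pg_query($conn, "CLOSE my_cur");
pg_query($conn, "COMMIT");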

You can use LIMIT x and OFFSET y.
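For example, something along these lines (the batch size and the ORDER BY column are assumptions; without a stable ORDER BY the pages can overlap or skip rows):
$batchSize = 10000;
$offset    = 0;

do {
    $result = pg_query("SELECT id, name FROM myTable ORDER BY id LIMIT $batchSize OFFSET $offset");
    $count  = pg_num_rows($result);

    while ($row = pg_fetch_assoc($result)) {
        // process each row here
    }

    pg_free_result($result);
    $offset += $batchSize;
} while ($count == $batchSize);
As pointed out in the answer above, each iteration makes the server redo the work of skipping $offset rows, so this gets slower as the offset grows.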

The query results are cached (client-side, by libpq) until you actually retrieve them, so also adding them all to an array in a loop like that will cause an exhaustion of memory no matter what. Either process the results one row at a time, or check the length of the array, process the results pulled so far, and then purge the array.
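A rough sketch of the process-and-purge idea (the 10,000-row threshold and the processBatch() function are hypothetical placeholders):
$result = pg_query("SELECT id, name FROM myTable");
$batch  = array();

while ($row = pg_fetch_assoc($result)) {
    $batch[] = $row;

    if (count($batch) >= 10000) {
        processBatch($batch);   // hypothetical function doing the real work
        $batch = array();       // purge the array to release memory
    }
}

if (count($batch) > 0) {
    processBatch($batch);       // last partial batch
}
pg_free_result($result);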

What the error means is that PHP tried to allocate 3 more bytes, but less than 3 bytes of the 512 MB limit remained available.
Even if you do it in batches, depending on the size of the resulting array you could still exhaust the available memory.
Perhaps you don't really need to get all the records?

Related

mysqli_free_result() is slow when not all of the data has been read while using MYSQLI_USE_RESULT

I am using MYSQLI_USE_RESULT (an unbuffered query) while querying a large amount of data from a table.
For testing I took a table about 5.6 GB in size.
I selected all columns: SELECT * FROM test_table.
If I don't read any rows with a method like fetch_assoc() and then try to close the result with mysqli_free_result(), it takes 5 to 10 seconds to close.
Sometimes I read a certain number of rows based on available memory; calling mysqli_free_result() then takes less time compared with when not even one row has been read.
So fewer unread rows means less time to free the result; more unread rows means more time.
To the best of my knowledge, it is nowhere documented that this call can take this much time.
The time taken for the query itself is around 0.0008 sec.
Is it bug or is it expected behavior?
For me this sluggishness defeats the whole point of using MYSQLI_USE_RESULT.
MySQL v5.7.21, PHP v7.2.4 used for testing.
Aliases of this function are mysqli_result::free, mysqli_result::close, mysqli_result::free_result and mysqli_free_result.
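For context, a minimal way to reproduce and time what is described above might look like this (the connection details and the 1000-row cutoff are placeholders):
$mysqli = new mysqli("localhost", "user", "pass", "testdb");                // placeholder credentials

$result = $mysqli->query("SELECT * FROM test_table", MYSQLI_USE_RESULT);    // unbuffered

// read only part of the result set, as described above
for ($i = 0; $i < 1000 && ($row = $result->fetch_assoc()); $i++) {
    // process $row
}

$start = microtime(true);
$result->free();                                  // the unread rows still have to be drained
echo microtime(true) - $start, " sec to free\n";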

PDO fetching a large amount of data

I'm trying to fetch a large amount of data from a PostgreSQL database through PDO in order to run it through a simplification algorithm (reducing the number of rows) in PHP.
The problem is I can't use fetchAll(); I often get the message:
Allowed memory size of 134217728 bytes exhausted...
I then opted for fetching with a cursor, which lets me access the data I want without memory concerns, as in this example:
function simplify(&$q) {
    //Doing some stuff loop
    $current = $q->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_ABS, $index + 1); //Iterators start from 1
    //Doing some more stuff
}
[...]
$query = $this->connexion->prepare($req, array(PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL)); //More than 100k results
$query->execute();
$result = simplify($query);
Now, this works like a charm... except for the execution time, which takes up to a minute to compute 200k rows.
I should mention that my algorithm is recursive, which does not help in my case.
Is there another solution to reduce the execution time without eating too much memory?
I have thought about a trigger on the DB server side, but what do you think?
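If purely sequential access is enough for the simplification pass, one thing worth trying (just a sketch, assuming a $pdo connection) is stepping through the scrollable cursor with FETCH_ORI_NEXT instead of jumping to absolute positions:
$query = $pdo->prepare("SELECT * FROM my_table",
                       array(PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL));
$query->execute();

while ($row = $query->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_NEXT)) {
    // feed each row to the simplification step, keeping only the
    // small sliding state it needs rather than the whole result
}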

How to iterate through a "window" of data in a dataset?

I have a data set in mysql with 150 rows. I have a set of 2 for loops that run math calculations based on some user inputs and the dataset. The code does calculations for 30 row windows, and accumulates the results for each 30 row window in an array. What I mean is, I do a "cycle" of calculations on rows 0-29, then 1-30, then 2-31, etc... That would result in 120 "cycles".
Right now the for loop is set up like so (there are more fields; I just trimmed the code for the simplicity of this question).
$window = 30;
$query = "SELECT * FROM table";
$result = mysql_query($query);
while ($row = mysql_fetch_assoc($result)) {
    $data[] = array("Date" => $row['Date'], "ID" => $row['ID']);
}
for ($i = 0; $i < (count($data) - $window); $i++) {
    for ($j = 0; $j < $window; $j++) {
        //do calculations here with $data[]
        $results[$i][$j] = calculations;
    }
}
This works fine for the number of rows I have. However, I opened up the script to a larger dataset (1700 rows) with a larger window (360 rows). That means far more iterations. It gave me an out-of-memory error, and some quick use of memory_get_peak_usage() showed that memory would just continually increase.
I'm starting to think that having the loops search through that data array is extremely laborious, especially since the "windows" overlap across a lot of the "cycles". Example: cycle 0 goes through rows 0-29 and cycle 1 goes through rows 1-30, so those two cycles share 29 rows of data, yet I'm telling PHP to look up the data afresh each time.
Is there a way to structure this better? I'm getting kind of lost thinking about running these concurrent cycles.
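As an aside on the overlap issue: if the per-window calculation can be expressed incrementally (a moving sum or average, say), each cycle only needs to add the newest row and drop the oldest one. A sketch under that assumption, using a hypothetical numeric field $data[$i]['value']:
$window  = 30;
$sum     = 0.0;
$results = array();

for ($i = 0; $i < count($data); $i++) {
    $sum += $data[$i]['value'];                      // bring the new row into the window
    if ($i >= $window) {
        $sum -= $data[$i - $window]['value'];        // drop the row that just left the window
    }
    if ($i >= $window - 1) {
        $results[$i - $window + 1] = $sum / $window; // one value per cycle instead of a 2-D array
    }
}
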
I think the array that is blowing memory will be the $results array. In your small sample it will be a 2-dimensional array with 150x149 cells, i.e. array(150, 149). At 144 bytes per element that's 3,218,400 bytes, slightly over 3 MB plus remaining bucket space.
In your second, larger sample it will be array(1700, 1699). At 144 bytes per element that's 415,915,200 bytes, slightly over 406 MB plus remaining bucket space, just to hold the results of your calculations.
I think you need to ask whether you really need to hold all of this data. If you really do, you may have to come up with another way of storing it.
I don't see any point in attempting the thousands of extra database calls, as this will only add overhead; you still have to maintain the huge list of results in an array.
The SQL Way
You can accomplish this by using LIMIT
$period = 30;
$cycle = 0;
$query = "SELECT * FROM table LIMIT $cycle,$period";
This will return only the results you need for each cycle. You will need to loop and increment $cycle. The way you are doing it now is probably better, however.
This won't loop back and grab the first of the data, however; you will have to add additional logic to handle that case.
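A rough sketch of that loop (the mysql_* calls are kept only to match the question; the total row count is an assumption):
$period = 30;
$total  = 1700;    // assumed total number of rows

for ($cycle = 0; $cycle + $period <= $total; $cycle++) {
    $result = mysql_query("SELECT Date, ID FROM table LIMIT $cycle, $period");

    $window = array();
    while ($row = mysql_fetch_assoc($result)) {
        $window[] = $row;
    }
    mysql_free_result($result);

    // do the calculations for this 30-row window, then let $window be overwritten
}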

PHP Max amount of inserts in one SQL query

I have a pretty simple question. I am inserting a lot of records at once into a MySQL table. It works for about 2000 records (actually a bit more), but say I want to insert 3000 records, then it doesn't do anything.
I'm working from AS3, sending an array containing all the records via AMFPHP to a simple PHP script that parses and inserts the array.
Is this normal, or should I look into it?
Currently I'm slicing my array into chunks of 2000 records and sending a couple of AMFPHP requests instead of just one.
MySQL queries are limited by the max_allowed_packet configuration option. It defines the absolute limit, in bytes, on how long a query string can be. Note that this isn't just the total size of the data being inserted; it's the entire query string: SQL commands, punctuation, spaces, etc.
Check how long your 3000 record version is vs. the 2000 one, and then get your server's packet length limit:
SHOW VARIABLES WHERE Variable_name LIKE '%max_allowed_packet%'
If your 3000-record version is longer than this limit, the query will definitely fail because it'll be chopped off somewhere part-way.
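For illustration, checking the limit and chunking the inserts could look roughly like this (mysqli, the my_table/name column and the chunk size of 2000 are assumptions):
$mysqli = new mysqli("localhost", "user", "pass", "mydb");   // placeholder credentials

$res = $mysqli->query("SHOW VARIABLES LIKE 'max_allowed_packet'");
$row = $res->fetch_assoc();
echo "max_allowed_packet: " . $row['Value'] . " bytes\n";

// $records is the decoded AMFPHP array; insert it in chunks so each
// multi-row INSERT stays well under the packet limit
foreach (array_chunk($records, 2000) as $chunk) {
    $values = array();
    foreach ($chunk as $r) {
        $values[] = "('" . $mysqli->real_escape_string($r['name']) . "')";
    }
    $mysqli->query("INSERT INTO my_table (name) VALUES " . implode(",", $values));
}
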
I don't think there is really a limit in the number of inserts in one query.
Instead, there is a limit on the size of the query you can send to MySQL.
See:
max_allowed_packet
Packet too large
So, basically, this depends on the amount of data you have in each insert.
I would ensure max_allowed_packet is larger than your PHP SQL query.
http://dev.mysql.com/doc/refman/5.5/en/packet-too-large.html
I don't think PHP limits the number of inserts in one query; instead, it limits the amount of memory a script can use and its maximum execution time.
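Those PHP-side limits can be inspected, and raised where the host allows it, like this:
echo ini_get('memory_limit'), "\n";          // e.g. "128M"
echo ini_get('max_execution_time'), "\n";    // e.g. "30" (seconds)

// for a long-running import script, if permitted by the configuration:
ini_set('memory_limit', '512M');
set_time_limit(300);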

Difference in efficiency of retrieving all rows in one query, or each row individually?

I have a table in my database that has about 200 rows of data that I need to retrieve. How significant, if at all, is the difference in efficiency when retrieving all of them at once in one query, versus each row individually in separate queries?
The queries are usually made over a socket, so executing 200 queries instead of 1 represents a lot of round-trip overhead; besides, the RDBMS is optimized to fetch many rows for one query.
200 queries instead of 1 will make the RDBMS initialize datasets, parse the query, fetch one row, populate the datasets, and send the results 200 times instead of 1 time.
It's a lot better to execute only one query.
I think the difference will be significant, because there will (I guess) be a lot of overhead in parsing and executing the query, packaging the data up to send back etc., which you are then doing for every row rather than once.
It is often useful to write a quick test which times various approaches, then you have meaningful statistics you can compare.
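A quick-and-dirty timing test along those lines (mysqli plus a hypothetical users table with 200 ids) might look like this:
$mysqli = new mysqli("localhost", "user", "pass", "mydb");   // placeholder credentials

// one query for all 200 rows
$start = microtime(true);
$res = $mysqli->query("SELECT id, name FROM users WHERE id BETWEEN 1 AND 200");
while ($row = $res->fetch_assoc()) { /* use $row */ }
echo "1 query:     ", microtime(true) - $start, " sec\n";

// 200 separate queries
$start = microtime(true);
for ($id = 1; $id <= 200; $id++) {
    $res = $mysqli->query("SELECT id, name FROM users WHERE id = $id");
    $row = $res->fetch_assoc();
}
echo "200 queries: ", microtime(true) - $start, " sec\n";
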
If you were talking about some constant number of queries k versus a slightly larger constant number k + k1, you might find that more queries is better. I don't know for sure, but SQL has all sorts of unusual quirks, so it wouldn't surprise me if someone could come up with a scenario like this.
However, if you're talking about some constant number of queries k versus some non-constant number of queries n, you should always pick the constant-number-of-queries option.
In general, you want to minimize the number of calls to the database. You can already assume that MySQL is optimized to retrieve rows, however you cannot be certain that your calls are optimized, if at all.
Extremely significant. Usually getting all the rows at once will take about as much time as getting one row. So let's say that time is 1 second (very high, but good for illustration): getting all the rows will take 1 second, while getting each row individually will take 200 seconds (1 second for each row). A very dramatic difference. And that isn't counting the cost of getting the list of 200 to begin with.
All that said, you've only got 200 rows, so in practice it won't matter much.
But still, get them all at once.
Exactly as the others have said. Your RDBMS will not break a sweat throwing 200+ rows at you all at once. Getting all the rows in one associative array will also not make much difference to your script, since you no doubt already have a loop for grabbing each individual row.
All you need to do is modify this loop to iterate through the array you are given (a very minor tweak!).
The only time I have found it better to get fewer results from multiple queries instead of one big set is if there is lots of processing to be done on the results. I was able to cut out about 40,000 records from the result set (plus associated processing) by breaking the result set up. Anything you can build into the query that will allow the DB to do the processing and reduce result set size is a benefit, but if you truly need all the rows, just go get them.
