I am running a query and retrieving the rows with oci_fetch_array, and I am getting a fatal "out of memory" error after I hit a certain volume of records. The result array is 100k rows and about 60 columns.
I have my memory_limit in php.ini set to 2 gigs.
memory_limit = 2056M
It seems to happen when I have more than one person running the script at the same time (or same person running twice as it is set up to run in the background).
It only takes 2 concurrent jobs of 100k records to cause the error.
Everything I've found on oci_fetch_array states that it isn't caching the whole result set into memory, but it looks like it IS.
This is my code (very straightforward):
while ($row = oci_fetch_array($stid, OCI_ASSOC + OCI_RETURN_NULLS)) {
    array_push($resultfile, $row);  // every fetched row is kept in this array
    $tablerow = $tablerow + 1;
    unset($row);                    // has no effect: the row is still referenced from $resultfile
}
The error happens on the oci_fetch_array call after it hits a certain number of loops.
The output file is only 94 MB on average, so it doesn't seem like I should be anywhere near the memory limit.
The code below is what's causing the high memory usage:
array_push($resultfile, $row);
oci_fetch_array is unbuffered, meaning it fetches rows one by one until no rows are left. I would suggest not pushing each row into another array; instead, write your logic inside the while loop itself.
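A minimal sketch of that suggestion (assuming the goal is a CSV export; $stid is the executed statement from the question, and the output path is hypothetical): write each row out as soon as it is fetched, so memory usage stays flat regardless of row count.

$fp = fopen('/tmp/report.csv', 'w');  // hypothetical output path
while ($row = oci_fetch_array($stid, OCI_ASSOC + OCI_RETURN_NULLS)) {
    fputcsv($fp, $row);               // write the row immediately instead of keeping it
    $tablerow = $tablerow + 1;
}
fclose($fp);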
So I have a custom artisan command that I wrote to slug a column and store it into a new column. I have a progress bar implemented, and for some reason, when the command reaches 50% completion it jumps to 100%. The issue is that it has only executed the code on half of the data.
I am using the chunk() function to break the data into chunks of 1,000 rows to eliminate memory exhaustion issues. This is necessary because my dataset is extremely large.
I have looked into my PHP error logs, MySQL error logs, and Laravel logs, and I can't find any error or log line pertaining to this command. Any ideas on a new place to even start looking for the issue?
$jobTitles = ModelName::where($columnName, '<>', '')
    ->whereNull($slugColumnName)
    ->orderBy($columnName)
    ->chunk(1000, function ($jobTitles) use ($jobCount, $bar, $columnName, $slugColumnName) {
        foreach ($jobTitles as $jobTitle) {
            $jobTitle->$slugColumnName = Str::slug($jobTitle->$columnName);
            $jobTitle->save();
        }

        $bar->advance(1000);
    });
$bar->finish();
What's happening is that the whereNull($slugColumnName) constraint, combined with the callback setting $slugColumnName, is leading to missed results on subsequent chunks.
The order of events is something like this:
Get the first set of rows: select * from table where column is null limit 100;
For each of those rows, set the column to a value.
Get the next set of rows: select * from table where column is null limit 100 offset 100;
Continue, increasing the offset, until there are no more results.
The problem is that after the second step you have removed 100 results from the total. Say you begin with 1,000 matching rows; by the second query you only have 900 matching rows left.
This makes the offset seemingly skip an entire chunk: it starts at row 100 of the now-smaller result set, even though the first 100 rows of that set have not been touched yet.
For more official documentation, please see the section on chunking results in the Laravel docs.
I have not tested this to verify it works as expected for your use case, but it appears that using chunkById will account for this issue and correct your results.
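A minimal sketch of what that might look like, reusing the hypothetical model and column variables from the question. Note that the orderBy($columnName) call is dropped, because chunkById pages by the primary key itself, so rows that stop matching whereNull no longer shift the window:

ModelName::where($columnName, '<>', '')
    ->whereNull($slugColumnName)
    ->chunkById(1000, function ($jobTitles) use ($bar, $columnName, $slugColumnName) {
        foreach ($jobTitles as $jobTitle) {
            $jobTitle->$slugColumnName = Str::slug($jobTitle->$columnName);
            $jobTitle->save();
        }

        // advance by the actual chunk size so the bar stays accurate on the last chunk
        $bar->advance(count($jobTitles));
    });

$bar->finish();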
I was playing with PDO on PostgreSQL 9.2.4 and was trying to fetch data from a table with millions of rows. My query returns about 100,000 rows.
I do not use any of PDOStatement's fetch functions; I simply loop over the result of the PDO query itself.
But it gets slower and slower over time. At the beginning it was fetching about 200 rows per second, but the closer it gets to the end, the slower it gets. Now, at row 30,000, it fetches only 1 row per second. Why is it getting slower?
I do this, and it's pretty simple:
$dbh = new PDO("pgsql...");
$sql = "SELECT x, y FROM point WHERE name is NOT NULL and place IN ('area1', 'area2')";
$res = $dbh->query($sql);
$ins_sql = "INSERT INTO mypoints (x, y) VALUES ";
$ins_vals = [];
$ins_placeholders = [];
foreach ($res as $row) {
    $ins_placeholders[] = "(?,?)";
    $ins_vals = array_merge($ins_vals, [$row['x'], $row['y']]);
    printCounter();
}
// now build up one insert query using placeholders and values,
// to insert all of them in one shot into table mypoints
The printCounter function simply increments an int variable and prints it, so I can see how many rows have been put into that array before I create my insert statement from it. I use one-shot inserts to speed things up; that's better than doing 100,000 separate inserts.
But that foreach loop gets slower over time. How can I increase the speed?
Is there a difference between fetch() and the simple loop over the PDOStatement in a foreach?
When I start this PHP script, the query itself takes about 5-10 seconds, so this has nothing to do with how the table is set up or whether I need indexes.
I have other tables returning 1 million rows, and I'm not sure what the best way to fetch them is. I can raise PHP's memory_limit if needed, so the most important thing for me is SPEED.
I appreciate any help.
It's not likely that the slowness is related to the database, because after the $dbh->query() call, the query is finished and the resulting rows are all in memory (they are not in PHP variables yet, but they're in memory accessible at the pgsql module level).
The more likely culprit is the array_merge operation. The array becomes larger at every loop iteration, and the operation recreates the entire array each time.
You may want to do this instead:
$ins_vals[] = [$row['x'], $row['y']];
Although personally, when concerned with speed, I'd use an even simpler flat structure:
$ins_vals[] = $x;
$ins_vals[] = $y;
Another, unrelated point: this builds a query with a huge number of placeholders, which is not how placeholders are normally used. To send large numbers of values to the server, the efficient way is to use COPY, possibly into a temporary table, followed by server-side merge operations if it's not a plain insertion.
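As an illustrative sketch of the COPY route (assuming the pdo_pgsql driver and the mypoints table from the question; not benchmarked here), PDO exposes pgsqlCopyFromArray, which streams rows to the server without building a giant placeholder list:

$rows = [];
while ($row = $res->fetch(PDO::FETCH_ASSOC)) {
    // COPY expects one delimiter-separated line per row
    $rows[] = $row['x'] . "\t" . $row['y'];
}
// bulk-load everything into mypoints in a single COPY operation
$dbh->pgsqlCopyFromArray('mypoints', $rows, "\t", '\\N', 'x,y');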
I don't know why, but using the fetch() method instead, filling $ins_vals like this:
$ins_vals[] = $x;
$ins_vals[] = $y;
and using beginTransaction and commit now makes my script unbelievably fast.
Now it takes only about 1 minute to add my 100,000 points.
I think both array_merge and that "ugly" looping through the PDOStatement slowed down my script.
And why the heck did someone downvote my question? Are you punishing me for my missing knowledge? Thanks.
OK, I built a class where I set the SQL and then add the values for each row with a method call. Whenever it reaches a specified limit, it starts a transaction, prepares the statement with as many placeholders as there are values, executes it with the array holding all the values, and then commits.
This seems to be fast enough; at least it doesn't get slower anymore.
For some reason it's faster to add values in a flat structure, as Daniel suggested. That's enough for me.
Sometimes it's good to have a function do one step of the insertion, because when the function returns, all the memory used inside it is freed, so your memory usage stays low.
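A rough sketch of that batching idea (the class name, batch size, and table layout are illustrative, not the poster's actual code):

class PointBatchInserter
{
    private array $vals = [];
    private int $rows = 0;

    public function __construct(private PDO $dbh, private int $batchSize = 5000) {}

    public function add($x, $y): void
    {
        // flat structure: two scalars per row, as suggested above
        $this->vals[] = $x;
        $this->vals[] = $y;
        if (++$this->rows >= $this->batchSize) {
            $this->flush();
        }
    }

    public function flush(): void
    {
        if ($this->rows === 0) {
            return;
        }
        $placeholders = rtrim(str_repeat('(?,?),', $this->rows), ',');
        $this->dbh->beginTransaction();
        $stmt = $this->dbh->prepare("INSERT INTO mypoints (x, y) VALUES $placeholders");
        $stmt->execute($this->vals);
        $this->dbh->commit();
        $this->vals = [];
        $this->rows = 0;
    }
}

A final flush() call after the loop takes care of the rows that didn't fill a complete batch.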
I've got a server hosting a live traffic-log DB that holds a big stats table. Now I need to create a smaller table from it, let's say the last 30 days.
This server also has a slave server that replicates the data and is 5 seconds behind the master.
I created this slave in order to offload SELECT queries, so the master only handles inserts/updates for the traffic log.
Now I need to copy the last day into the smaller table, still without touching the "real" DB,
so I need to select from the slave and insert into the real, smaller table. (The slave only allows read operations.)
I am working with PHP, and I can't solve this with one query that spans the two different databases... If it's possible, please let me know how.
When using two queries, I need to hold the last day's data as a PHP MySQL result object. For 300K-650K rows, that starts to become a memory problem. I would select partial chunks by ID (setting the IDs in the WHERE clause), but I don't have an auto-increment ID field and there is no ID on the rows (when storing traffic data, an ID would take a lot of space).
So I am trying this idea and I would like to get a second opinion.
If I take the whole last day at once (300K rows), it will overload the PHP memory.
I can use LIMIT chunks, or a new idea: selecting one column at a time and copying it to the new real table. But I don't know if the second method is possible. Does INSERT fill the first open space at the column level or the row level?
The main idea is to reduce the size of the SELECT, so is it possible to build a SELECT by columns and then insert them as columns in MySQL?
If this is simply a memory problem in PHP, you could try using PDO and fetching one result row at a time instead of all of them at once.
From PHP.net for PDO:
<?php
function getFruit($conn) {
    $sql = 'SELECT name, color, calories FROM fruit ORDER BY name';
    foreach ($conn->query($sql) as $row) {
        print $row['name'] . "\t";
        print $row['color'] . "\t";
        print $row['calories'] . "\n";
    }
}
?>
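One caveat, hedged: with the MySQL PDO driver this foreach is still buffered by default, so the whole result set ends up in client memory anyway. Turning buffering off keeps only the current row on the PHP side (a sketch; the table name and date filter are placeholders):

// fetch rows from the server as they are read instead of buffering them all client-side
$conn->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
foreach ($conn->query('SELECT * FROM stats WHERE ts >= NOW() - INTERVAL 1 DAY') as $row) {
    // handle one row at a time, e.g. insert it into the smaller table
}

Note that while an unbuffered result set is open, no other statement can be run on the same connection until all of its rows have been read.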
Well, here is where PHP starts to get weird. I took your advice and started chunking the data, using a loop that advances a LIMIT in jumps of 2,000 rows. What was interesting is that when I started using PHP's memory-usage and memory-peak functions, I found out why the chunking-and-looping method doesn't work at large scale: assigning a new value to a variable doesn't release the memory of what was there before, so you must use unset() or set it to null to keep your PHP memory under control.
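A small sketch of that chunked loop with an explicit unset (the table name and chunk size are illustrative):

$offset = 0;
$chunk  = 2000;
do {
    $rows = $conn->query("SELECT * FROM stats LIMIT $chunk OFFSET $offset")->fetchAll();
    // ... copy $rows into the smaller table here ...
    $count = count($rows);
    unset($rows);                       // release the chunk before fetching the next one
    $offset += $chunk;
    echo memory_get_usage(true), "\n";  // watch that usage stays flat between chunks
} while ($count === $chunk);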
The situation is something like the following:
1- A MySQL InnoDB table undergoes a transactional SELECT as follows:
<?php
// ...
doQuery('START TRANSACTION');
$sql = "SELECT * FROM table WHERE amount < 10 FOR UPDATE";
$res = doQuery($sql);
// Then loop through $res, updating the amount field in the same table
// and setting it to values greater than 10.
// After the loop:
doQuery('COMMIT');
On my local XAMPP setup, I opened two different browser windows, Firefox and Opera, and requested the script's URL at the same time. I expected that only one of them would be able to retrieve values for $res. However, the script returns a fatal error:
Fatal error: Maximum execution time of 30 seconds exceeded
I need to know the cause of this error. Is it because the two clients, Firefox and Opera, are unable to select, or because they are unable to update?
I also need a solution that keeps the transaction and gives me the expected result, i.e. only one browser returns results!
You could just add set_time_limit(0); at the top of the script, but that's not a good solution for scripts accessible via HTTP.
Your script runs into a deadlock. To avoid this, add an ORDER BY to the query to ensure that both queries try to lock the records in the same order. Also make sure there is an index on amount, otherwise the query has to lock the entire table.
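A sketch of what that suggestion looks like applied to the script above (doQuery is the question's helper; the id column used for ordering and the index are assumptions):

// one-time, outside the script: CREATE INDEX idx_amount ON table (amount);
doQuery('START TRANSACTION');
// locking rows in a consistent order keeps the two concurrent requests from
// acquiring row locks in conflicting orders
$sql = "SELECT * FROM table WHERE amount < 10 ORDER BY id FOR UPDATE";
$res = doQuery($sql);
// ... update the amount column for the selected rows ...
doQuery('COMMIT');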
I want to use PDO in my application, but before that I want to understand how PDOStatement->fetch and PDOStatement->fetchAll work internally.
For my application, I want to do something like "SELECT * FROM myTable" and write the result into a CSV file; the table has around 90,000 rows of data.
My question is: if I use PDOStatement->fetch as I am using it here:
// First, prepare the statement
$query = "SELECT * FROM tableName";
$stmt = $this->connection->prepare($query);

// Execute the statement
$stmt->execute();

// Open the output file for the CSV export (path is illustrative)
$data = fopen('export.csv', 'w');

var_dump($stmt->fetch(PDO::FETCH_ASSOC)); // note: this consumes the first row

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    echo "Hi";
    // Export every row to the file
    fputcsv($data, $row);
}
Will the result of each fetch stay in memory after the fetch?
Meaning, when I do the second fetch, will memory hold the data of the first fetch as well as the second?
So if I have 90,000 rows of data and fetch row by row, is memory updated to hold each new fetch result without releasing the previous ones, so that by the last fetch it already holds the other 89,999 rows?
Is this how PDOStatement::fetch works?
Performance-wise, how does it stack up against PDOStatement::fetchAll?
Update: something about fetch and fetchAll from a memory-usage point of view
I just wanted to add something to this question, as I recently found out something about fetch and fetchAll. I hope this makes the question worthwhile for people who visit it in the future to get some understanding of the two methods.
fetch does not store information in memory; it works row by row. It goes to the result set and returns row 1, then goes back and returns row 2, and so on; note that it does not return row 1 together with row 2, only row 2. So fetch stores nothing in memory, whereas fetchAll stores the whole result set in memory. fetch is therefore the better option compared to fetchAll when dealing with a result set of around 100K rows.
PHP generally keeps its results on the server. It all depends on the driver. MySQL can be used in an "unbuffered" mode, but it's a tad tricky to use. fetchAll() on a large result set can cause network flooding, memory exhaustion, etc.
In every case where I need to process more than 1,000 rows, I'm not using PHP. Consider also if your database engine already has a CSV export operation. Many do.
I advise you to use PDO::FETCH_LAZY instead of PDO::FETCH_ASSOC for big data.
I used it for a row-by-row CSV export and it works fine, without any "out of memory" errors.
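For reference, a minimal sketch of that row-by-row export (the query, column names, and output path are placeholders; with FETCH_LAZY each call returns a lazy row object whose columns are read as properties):

$stmt = $dbh->prepare('SELECT id, name, amount FROM myTable');  // placeholder query
$stmt->execute();

$out = fopen('export.csv', 'w');                                // placeholder path
while ($row = $stmt->fetch(PDO::FETCH_LAZY)) {
    // only the current row is materialised on the PHP side
    fputcsv($out, [$row->id, $row->name, $row->amount]);
}
fclose($out);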