Retrieve all rows from table in doctrine - php

I have a table with 100,000+ rows, and I want to select all of them in Doctrine and do some action with each row. In Symfony2 with Doctrine I try to do it with this query:
$query = $this->getDefaultEntityManager()
    ->getRepository('AppBundle:Contractor')
    ->createQueryBuilder('c')
    ->getQuery()
    ->iterate();

foreach ($query as $contractor) {
    // doing something
}
But then I get a memory leak, because I think it loads all the data into memory.
I have more experience with ADOdb; in that library, when I do this:
$result = $ADOdbObject->Execute('SELECT * FROM contractors');
while ($arrRow = $result->fetchRow()) {
    // do some action
}
I do not get any memory leak.
So how can I select all the data from the table without getting a memory leak with Doctrine in Symfony2?
Question EDIT
When I remove the foreach and just call iterate(), I still get a memory leak:
$query = $this->getDefaultEntityManager()
    ->getRepository('AppBundle:Contractor')
    ->createQueryBuilder('c')
    ->getQuery()
    ->iterate();

The normal approach is to use iterate().
$q = $this->getDefaultEntityManager()->createQuery('SELECT c FROM AppBundle:Contractor c');
$iterableResult = $q->iterate();
foreach ($iterableResult as $row) {
    // do something
}
However, as the Doctrine documentation says, this can still result in errors:
Results may be fully buffered by the database client/connection, allocating additional memory not visible to the PHP process. For large sets this may easily kill the process for no apparent reason.
The easiest approach to this would be to simply create smaller queries with offsets and limits.
// get the count of the whole query first
$em = $this->getDefaultEntityManager();
$count = $em->createQueryBuilder()
    ->select('COUNT(c)')
    ->from('AppBundle:Contractor', 'c')
    ->getQuery()
    ->getSingleScalarResult();

// let's say we go in steps of 1000 to keep memory usage low
$limit = 1000;
$offset = 0;

// loop: every 1000 rows > create a query > loop the result > repeat
while ($offset < $count) {
    $result = $em->createQueryBuilder()
        ->select('c')
        ->from('AppBundle:Contractor', 'c')
        ->setMaxResults($limit)
        ->setFirstResult($offset)
        ->getQuery()
        ->getResult();

    foreach ($result as $contractor) {
        // do something
    }

    $em->clear(); // detach the processed entities so their memory can be freed
    $offset += $limit;
}
With heavy datasets like this, the loop will most likely exceed the maximum execution time, which is 30 seconds by default. So make sure to raise max_execution_time in your php.ini, or call set_time_limit() in the script. If you just want to update all rows following a known pattern, you should consider writing one big update query instead of looping over the result and editing it in PHP.
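If you go the single-update-query route, here is a minimal DQL sketch (the status field and its value are assumptions, not from the original question):
// Bulk update executed entirely in the database; no entities are hydrated.
// The "status" field and its value are hypothetical examples.
$em = $this->getDefaultEntityManager();
$updated = $em->createQuery(
    'UPDATE AppBundle:Contractor c SET c.status = :status WHERE c.status IS NULL'
)
    ->setParameter('status', 'active')
    ->execute(); // returns the number of affected rows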

Try using this approach:
$em = $this->getDefaultEntityManager();
foreach ($query as $row) {
    $contractor = $row[0]; // iterate() wraps each entity in a one-element array
    // doing something
    $em->detach($contractor); // drop the entity from Doctrine's identity map
    unset($contractor);       // tell the GC the object is not in use anymore
}
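A common variation (just a sketch; the batch size of 500 is an assumption) is to clear the whole EntityManager every N rows instead of detaching each entity individually:
$em = $this->getDefaultEntityManager();
$i = 0;
foreach ($query as $row) {
    $contractor = $row[0]; // iterate() wraps each entity in a one-element array
    // doing something
    if (++$i % 500 === 0) {
        $em->clear(); // detaches ALL managed entities and empties the identity map
    }
}
$em->clear();
Note that clear() detaches every managed entity, so flush any pending changes before calling it.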
Hope this helps.

If you really need to get all the records, I'd suggest using the database_connection service directly. Look at its interface and choose a method that won't load all the data into memory (and won't map the records to your entity).
You could use something like this (assuming this code is in controller):
$db = $this->get('database_connection');
$query = 'select * from <your_table>';
$sth = $db->prepare($query);
$sth->execute();
while ($row = $sth->fetch()) {
    // some stuff
}
This is probably not what you need if you want hydrated objects after handling the whole collection. But maybe you don't need the objects at all. Either way, it's worth considering.

Related

Advice on returning results from a large query returning the error: Allowed memory size of 734003200 bytes exhausted

I'm trying to return the result of 3 tables being joined together for a user to download as CSV, and this is throwing the error:
Allowed memory size of 734003200 bytes exhausted
This is the query being run:
SELECT *
FROM `tblProgram`
JOIN `tblPlots` ON `tblPlots`.`programID`=`tblProgram`.`pkProgramID`
JOIN `tblTrees` ON `tblTrees`.`treePlotID`=`tblPlots`.`id`
The line of code causing the error is this:
$resultsALL=$this->db->query($fullQry);
Where $fullQry is the query shown above. When I comment out that single line, everything runs without the error, so I'm certain it's not an infinite loop somewhere that I'm missing.
I'm wondering how I can break up the query so that I can get the results without erroring out? The tables only have a relatively small amount of data in them right now and will be even larger eventually, so I don't think raising the memory size is a good option.
I'm using CodeIgniter/PHP/MySQL. I can provide more code if need be...
Thank you for any direction you can offer!
Based off of: MySQL : retrieve a large select by chunks
You may also try retrieving the data in chunks by using the LIMIT clause.
Since you're using CodeIgniter 3, here is how you can go about it.
You may need to pass a different $orderBy value (argument #6) to the getChunk(...) method in case your joined tables have conflicting id column names,
e.g.: $this->getChunk(..., ..., ..., 0, 2000, "tblProgram.id");
Solution:
<?php

class Csv_model extends CI_Model
{
    public function __construct()
    {
        parent::__construct();
        $this->load->database();
    }

    public function index()
    {
        $sql = <<< END
SELECT *
FROM `tblProgram`
JOIN `tblPlots` ON `tblPlots`.`programID`=`tblProgram`.`pkProgramID`
JOIN `tblTrees` ON `tblTrees`.`treePlotID`=`tblPlots`.`id`
END;

        $this->getChunk(function (array $chunk) {
            /*
             * Do something with each chunk here, e.g.:
             * log_message('error', json_encode($chunk));
             */
        }, $this->db, $sql);
    }

    /*
     * Processes a raw SQL query result in chunks, sending each chunk to the provided callback function.
     */
    function getChunk(callable $callback, $DBContext, string $rawSQL = "SELECT 1", int $initialRowOffset = 0, int $maxRows = 2000, string $orderBy = "id")
    {
        $DBContext->query('DROP TEMPORARY TABLE IF EXISTS chunkable');
        $DBContext->query("CREATE TEMPORARY TABLE chunkable AS ( $rawSQL ORDER BY `$orderBy` )");

        do {
            $constrainedSQL = sprintf("SELECT * FROM chunkable ORDER BY `$orderBy` LIMIT %d, %d", $initialRowOffset, $maxRows);
            $queryBuilder = $DBContext->query($constrainedSQL);
            $callback($queryBuilder->result_array());
            $initialRowOffset = $initialRowOffset + $maxRows;
        } while ($queryBuilder->num_rows() === $maxRows);
    }
}
Use getUnbufferedRow() for processing large result sets.
getUnbufferedRow()
This method returns a single result row without prefetching the whole
result in memory as row() does. If your query has more than one row,
it returns the current row and moves the internal data pointer ahead.
$query = $db->query("YOUR QUERY");

while ($row = $query->getUnbufferedRow()) {
    echo $row->title;
    echo $row->name;
    echo $row->body;
}
For use with MySQLi you may set MySQLi’s result mode to
MYSQLI_USE_RESULT for maximum memory savings. Use of this is not
generally recommended but it can be beneficial in some circumstances
such as writing large queries to csv. If you change the result mode be
aware of the tradeoffs associated with it.
$db->resultMode = MYSQLI_USE_RESULT; // for unbuffered results

$query = $db->query("YOUR QUERY");

$file = new \CodeIgniter\Files\File(WRITEPATH . 'data.csv');
$csv = $file->openFile('w');

while ($row = $query->getUnbufferedRow('array')) {
    $csv->fputcsv($row);
}

$db->resultMode = MYSQLI_STORE_RESULT; // return to default mode
Note:
When using MYSQLI_USE_RESULT all subsequent calls on the same
connection will result in error until all records have been fetched or
a freeResult() call has been made. The getNumRows() method will
only return the number of rows based on the current position of the
data pointer. MyISAM tables will remain locked until all the records
have been fetched or a freeResult() call has been made.
You can optionally pass ‘object’ (default) or ‘array’ in order to
specify the returned value’s type:
$query->getUnbufferedRow(); // object
$query->getUnbufferedRow('object'); // object
$query->getUnbufferedRow('array'); // associative array
freeResult()
It frees the memory associated with the result and deletes the result
resource ID. Normally PHP frees its memory automatically at the end of
script execution. However, if you are running a lot of queries in a
particular script you might want to free the result after each query
result has been generated in order to cut down on memory consumption.
$query = $db->query('SELECT title FROM my_table');

foreach ($query->getResult() as $row) {
    echo $row->title;
}

$query->freeResult(); // The $query result object will no longer be available

$query2 = $db->query('SELECT name FROM some_table');
$row = $query2->getRow();
echo $row->name;
$query2->freeResult(); // The $query2 result object will no longer be available

Which is better for performance for sql row returns: inline vs function call

I have a SQL query that returns thousands of rows, and this query is run on multiple pages. I was wondering if I could pass each returned row to a function that builds the return value, to make the code reusable and maintainable. I know this could lead to overhead problems. Most likely the cap will be 5000 rows.
Will this compound on our server end with 10,000 users?
Any suggestions on a better way to do this?
Example:
$data = [];

if ($result = $conn->query($sql)) {
    while ($row = $result->fetch_assoc()) {
        array_push($data, returnFunction($row));
    }
}

function returnFunction($row) {
    $dataTemp = new stdClass();
    $dataTemp->image = $row["image"];
    return $dataTemp;
}

What is faster in PHP/MySQL: nth-term queries in a for loop, or an IN query after pushing an array of ids from the initial query?

I'm wondering whether this kind of logic would improve query performance. Say, for example, rather than checking whether a user likes a post for each element in an array and firing a query for each one,
I could instead push the primary IDs into an array and then perform an IN query on them. This would reduce 15 nth-term queries down to 2 queries in total, including the initial one.
I'm using PHP PDO, MYSQL.
Any advice? Am I on the right track, people? :D
$items is the result set from the database; in this case they are questions that users are asking. I get a response in about 140ms, and I've set a limit on how many items are loaded at once with pagination.
$questionIds = [];

foreach ($items as $item) {
    array_push($questionIds, $item->question_id);
}

$items = loggedInUserLikesQuestions($questionIds, $items, $user_id);
Definitely, the IN clause is faster to execute as a SQL query. However, you will only see significant wall-clock benefits once the number of items in your IN clause (on average) gets high.
The reason there is a speed difference, even though each individual query may be lightning-fast, is the per-query overhead: setup, execution, tear-down, and the send/receive round trip to the server. When you are doing thousands (or millions) of these as fast as you can, I've seen throughput go from roughly 500/sec to 200,000/sec by batching. That should give you some idea.
However, with the IN-clause method, you need to make sure your IN clause does not become too big and hit the maximum query size (see the max_allowed_packet variable).
Here is a simple set of functions that will automatically batch the updates into IN clauses of 1000 items each:
<?php

$db = new PDO('...');

$__q = [];

$flushQueue = function () use ($db, &$__q) {
    if (count($__q) > 0) {
        $sanitized_ids = [];
        foreach ($__q as $id) {
            $sanitized_ids[] = (int) $id;
        }
        $db->query("UPDATE question SET linked = 1 WHERE id IN (" . join(',', $sanitized_ids) . ")");
        $__q = [];
    }
};

$queuedUpdate = function ($question_id) use (&$__q, $flushQueue) {
    $__q[] = $question_id;
    if (count($__q) > 1000) {
        $flushQueue();
    }
};

// Then your code...
foreach ($items as $item) {
    $queuedUpdate($item->question_id);
}

$flushQueue();
Obviously, you don't have to use anonymous functions if you are in a class. But the above will work anywhere (assuming you are on PHP >= 5.3).
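For the original SELECT use case, here is a minimal sketch of the batched IN query with bound placeholders (the question_likes table and its columns are assumptions, not from the question, and $questionIds is assumed to be non-empty):
// Fetch the question IDs the logged-in user has liked, in a single query.
// Table and column names here are hypothetical.
$placeholders = implode(',', array_fill(0, count($questionIds), '?'));
$stmt = $db->prepare(
    "SELECT question_id FROM question_likes WHERE user_id = ? AND question_id IN ($placeholders)"
);
$stmt->execute(array_merge([$user_id], $questionIds));
$likedIds = $stmt->fetchAll(PDO::FETCH_COLUMN);

foreach ($items as $item) {
    $item->likedByUser = in_array($item->question_id, $likedIds);
}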

Yii large SQL queries consume a large amount of memory

I am using Yii 1.1.14 with PHP 5.3 on CentOS 6, and I am using CDbCommand to fetch data from a very large table; the result set is ~90,000 records over 10 columns. I am exporting it to a CSV file and the file size is about 15MB.
The script always crashed without any error messages, and only after some research did I figure out that I need to raise the memory_limit in php.ini in order to execute the script successfully.
The only problem is that for a successful execution I had to raise the memory limit to 512MB(!), which is a lot! And if 10 users execute the same script, my server will not respond very well...
I was wondering if anyone might know of a way to reduce memory consumption on SQL queries with Yii?
I know I can split the query into multiple queries using limits and offsets, but it just doesn't seem logical that a 15MB result will consume 512MB.
Here is the code:
set_time_limit(0);

$connection = new CDbConnection($dsn, $username, $password);
$command = $connection->createCommand('SELECT * FROM TEST_DATA');
$result = $command->queryAll(); // this is where the script crashes

print_r($result);
Any ideas would be greatly appreciated!
Thanks,
Instead of using queryAll(), which returns all the rows in a single array (this is the real memory problem), you should simply use a foreach loop over a data reader (take a look at CDbDataReader), e.g.:
$command = $connection->createCommand('SELECT * FROM TEST_DATA');
$rows = $command->query();

foreach ($rows as $row)
{
    // Here your code
}
EDIT: Using LIMIT
$count = Yii::app()->db->createCommand('SELECT COUNT(*) FROM TEST_DATA')->queryScalar();
$maxRows = 1000;
$maxPages = ceil($count / $maxRows);

for ($i = 0; $i < $maxPages; $i++)
{
    $offset = $i * $maxRows;
    $rows = $connection->createCommand("SELECT * FROM TEST_DATA LIMIT $offset,$maxRows")->query();

    foreach ($rows as $row)
    {
        // Here your code
    }
}

PHP/MYSQL: Iterate over every record in a database

I am new to the whole PHP/MySQL thing. I have a week's worth of server logs (about 300,000 items) and I need to do some analysis. I am planning on reading them all into a MySQL db and then analysing them with PHP.
The thing I am not sure about is how to iterate through them. Using Java to read a file, I would do something like this:
Scanner s = new Scanner(myfile);
while (s.hasNext()) {
    String line = s.nextLine();
    // do something with this record
}
How do I iterate through all records in a mysql db using php? I think that something like this will take a stupid amount of memory.
$query = "SELECT * FROM mytable";
$result = mysql_query($query);
$rows = mysql_num_rows($result);
for($j = 0; $j < $rows; ++$j){
$curIndex = mysql_result($result,$j,"index");
$curURL = mysql_result($result,$j,"something");
~~ Do something with this record
}
So I have added a limit to the select statement, and I repeat until all records have been cycled through. Is there a more standard way to do this? Is there a built-in that will do this?
while ($startIndex < $numberOfRows) {
    $query = "SELECT * FROM mytable ORDER BY mytable.index LIMIT $startIndex,$endIndex";
    $result = mysql_query($query);
    $rows = mysql_num_rows($result);

    for ($j = 0; $j < $rows; ++$j) {
        $curIndex = mysql_result($result, $j, "index");
        $curURL = mysql_result($result, $j, "something");
        // do something with this record
    }

    $startIndex = $endIndex + 1;
    $endIndex = $endIndex + 10;
}
You don't want to do a SELECT * FROM MYTABLE if your table is large; you're going to have the whole thing in memory. A trade-off between memory overhead and database calls would be to batch the requests. You can get the min and max IDs of the rows in your table:
SELECT MIN(ID) FROM MYTABLE;
SELECT MAX(ID) FROM MYTABLE;
Now loop from minId to maxId, incrementing by say 10,000 each time. In pseudo-code:
for (int i = minId; i <= maxId; i = i + 10000) {
    int x = i;
    int y = i + 10000;
    SELECT * FROM MYTABLE WHERE ID >= x AND ID < y;
}
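A minimal PHP sketch of that loop using PDO (the DSN, credentials, and the processing step are placeholders):
// Batch over the table by ID range so only one chunk is in memory at a time.
// Connection details and the processing step are hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=logs', 'user', 'password');

$minId = (int) $pdo->query('SELECT MIN(id) FROM mytable')->fetchColumn();
$maxId = (int) $pdo->query('SELECT MAX(id) FROM mytable')->fetchColumn();
$step  = 10000;

$stmt = $pdo->prepare('SELECT * FROM mytable WHERE id >= :lo AND id < :hi');

for ($lo = $minId; $lo <= $maxId; $lo += $step) {
    $stmt->execute([':lo' => $lo, ':hi' => $lo + $step]);
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        // do something with this record
    }
}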
See here:
http://www.tizag.com/mysqlTutorial/
http://www.tizag.com/mysqlTutorial/mysqlfetcharray.php
<?php
// Make a MySQL Connection
$query = "SELECT * FROM example";

$result = mysql_query($query) or die(mysql_error());

while ($row = mysql_fetch_array($result)) {
    echo $row['name'] . " - " . $row['age'];
    echo "<br />";
}
?>
Depending on what you need to do with the resulting rows, you can use a different loop style, whether it's 'while', 'foreach' or 'for x to x'. Most of the time, a simple 'while' iteration will be great, and it is efficient.
Use mysql_fetch_*
$result = mysql_query(...);
while ($row = mysql_fetch_assoc($result)) {
    $curIndex = $row['index'];
}
I think that retrieves results in a "streaming" manner, rather than loading them all into memory at once. I'm not sure exactly what mysql_result does.
Side note: since you're still new, I'd advise getting into good habits right away: skip the mysql_ functions and go for PDO or at least mysqli.
In an ideal world, PHP would generate aggregate queries, send them to MySQL, and only get a small number of rows in return. For instance, if you're counting the number of log items of each severity between two dates:
SELECT COUNT(*), severity
FROM logs
WHERE date < ? AND date > ?
GROUP BY severity
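A minimal PDO sketch of running that aggregate from PHP (connection details and the date values are placeholders):
// Let the database do the aggregation and return only a handful of rows.
// DSN, credentials, and the date range are hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=logs', 'user', 'password');

$stmt = $pdo->prepare(
    'SELECT COUNT(*) AS cnt, severity
     FROM logs
     WHERE date < ? AND date > ?
     GROUP BY severity'
);
$stmt->execute(['2015-01-31', '2015-01-01']);

foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    echo $row['severity'] . ': ' . $row['cnt'] . "\n";
}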
Doing the work on the PHP side is quite unusual. If you find out that you have needs too complex for SQL queries to handle (which, given that you have control over your database structure, leaves you with a lot of freedom), a better option would be to move to a Map-Reduce database engine like CouchDB.
I strongly believe that batch processing with Doctrine, or any kind of iteration with MySQL (PDO or mysqli), is just an illusion.
@dimitri-k provided a nice explanation, especially about the unit of work. The problem is the misleading $query->iterate(), which doesn't really iterate over the data source. It's just a \Traversable wrapper around an already fully fetched data source.
Here is an example demonstrating that, even with the Doctrine abstraction layer removed completely from the picture, we still run into memory issues:
echo 'Starting with memory usage: ' . memory_get_usage(true) / 1024 / 1024 . " MB \n";

$pdo = new \PDO("mysql:dbname=DBNAME;host=HOST", "USER", "PW");
$stmt = $pdo->prepare('SELECT * FROM my_big_table LIMIT 100000');
$stmt->execute();

while ($rawCampaign = $stmt->fetch()) {
    // echo $rawCampaign['id'] . "\n";
}

echo 'Ending with memory usage: ' . memory_get_usage(true) / 1024 / 1024 . " MB \n";
Output:
Starting with memory usage: 6 MB
Ending with memory usage: 109.46875 MB
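By contrast, a sketch of the same loop with MySQL's client-side buffering turned off via PDO::MYSQL_ATTR_USE_BUFFERED_QUERY (same placeholder table as above) should keep memory roughly flat, at the cost of the usual unbuffered-query restrictions:
echo 'Starting with memory usage: ' . memory_get_usage(true) / 1024 / 1024 . " MB \n";

$pdo = new \PDO("mysql:dbname=DBNAME;host=HOST", "USER", "PW");
// Stream rows from the server instead of buffering the whole result set client-side.
$pdo->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$stmt = $pdo->prepare('SELECT * FROM my_big_table LIMIT 100000');
$stmt->execute();

while ($rawCampaign = $stmt->fetch()) {
    // process one row at a time
}

echo 'Ending with memory usage: ' . memory_get_usage(true) / 1024 / 1024 . " MB \n";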
Here, the disappointing getIterator() method:
// Doctrine\DBAL\Driver\Mysqli\MysqliStatement

/**
 * {@inheritdoc}
 */
public function getIterator()
{
    $data = $this->fetchAll();

    return new \ArrayIterator($data);
}
You can use my little library to actually stream heavy tables using PHP Doctrine, DQL, or just pure SQL, however you find appropriate: https://github.com/EnchanterIO/remote-collection-stream
