Yii large SQL queries consume a large amount of memory - php

I am using Yii 1.1.14 with PHP 5.3 on CentOS 6, and I am using CDbCommand to fetch data from a very large table. The result set is ~90,000 records over 10 columns, which I export to a CSV file of about 15MB.
The script always crashed without any error message, and only after some research did I figure out that I need to raise memory_limit in php.ini in order to execute the script successfully.
The only problem is that for a successful execution I had to raise the memory limit to 512MB(!), which is a lot, and if 10 users execute the same script my server will not respond very well...
I was wondering if anyone might know of a way to reduce memory consumption of SQL queries with Yii?
I know I can split the query into multiple queries using limits and offsets, but it just doesn't seem logical that a query returning 15MB of data should consume 512MB.
Here is the code:
set_time_limit(0);
$connection = new CDbConnection($dsn,$username,$password);
$command = $connection->createCommand('SELECT * FROM TEST_DATA');
$result = $command->queryAll(); //this is where the script crashes
print_r($result);
Any ideas would be greatly appreciated!
Thanks,

Instead of using queryAll(), which returns all the rows in a single array (that is the real memory problem), you should simply iterate over the result with a foreach loop (take a look at CDbDataReader), e.g.:
$command = $connection->createCommand('SELECT * FROM TEST_DATA');
$rows = $command->query();
foreach ($rows as $row)
{
    // process $row here, one row at a time (e.g. write it to the CSV)
}
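Since the goal in the question is a CSV export, a natural way to use this reader is to write each row straight to the file as you iterate, so the full result set never has to exist as a PHP array. A minimal sketch building on the $rows reader above (the output path is a placeholder):
$fp = fopen('export.csv', 'w'); // placeholder path
foreach ($rows as $row)
{
    fputcsv($fp, $row); // write each row immediately; nothing accumulates in PHP
}
fclose($fp);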
EDIT: Using LIMIT
$count = Yii::app()->db->createCommand('SELECT COUNT(*) FROM TEST_DATA')->queryScalar();
$maxRows = 1000;
$maxPages = ceil($count / $maxRows);
for ($i = 0; $i < $maxPages; $i++)
{
    $offset = $i * $maxRows;
    $rows = $connection->createCommand("SELECT * FROM TEST_DATA LIMIT $offset,$maxRows")->query();
    foreach ($rows as $row)
    {
        // your code here
    }
}

Related

Overcoming PHP Memory usage issues for array functions

I have the PHP code below to generate a set of unique random 12-digit codes (anywhere from 100,000 to a million of them) and save them in the db. I first fetch the existing codes from the MySQL db (there are already about a million of them), flip the array, and then generate new codes. Afterwards I use array_diff_key and array_keys on $random and $existingRandom to get the new codes which are to be saved back to the db.
// total labels is the number of new codes to generate
//$totalLabels is fetched as user input and could range from 100000 to a million
$codeObject = new Codes();
//fetch existing codes from db
$codesfromDB = $codeObject->getAllCodes();
$existingRandom = $random = array();
$existingRandom = $random = array_flip($codesfromDB);
$existingCount = count($random); //The codes you already have
do {
    $random[mt_rand(100000000000,999999999999)] = 1;
} while ((count($random) - $existingCount) < $totalLabels);
$newCodes = array_diff_key($random,$existingRandom);
$newCodes = array_keys($newCodes);
The issue I am facing is that array_flip runs out of memory and crashes my program with this error:
"Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 72 bytes)"
My questions are below:
1) Can someone help me understand why array_flip is running out of memory? The memory limit in my php.ini file is 256M. Please show me the calculation of the memory used by the function if possible. (Also, even if array_flip gets through, array_diff_key and array_keys then run out of memory.)
2) How do I optimize the code so that the memory used stays under the limit? I even tried to break the array_flip operation into smaller chunks, but even that runs out of memory:
$size = 5000;
$array_chunk = array_chunk($codesfromDB, $size);
foreach ($array_chunk as $values) {
    $existingRandom[] = $random[] = array_flip($values);
}
3) Is what I am doing optimal? Would it be fair to further increase the memory limit in the php.ini file? What are the things to keep in mind while doing that?
Here is my query to fetch the existing codes from the db, if needed:
$sql = "SELECT codes FROM code";
$stmt = $this->db->prepare($sql);
$stmt->execute();
$result = $stmt->fetchAll(PDO::FETCH_COLUMN, 0);
return $result;
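As a rough way to see why the flipped array is so large (question 1), you can measure PHP's per-element array overhead directly. This is a standalone illustrative sketch, not part of the original code; the element count and the mt_rand range simply mirror the question:
$before = memory_get_usage(true);
// build a flipped-style lookup array with 1,000,000 random 12-digit keys
$lookup = array();
for ($i = 0; $i < 1000000; $i++) {
    $lookup[mt_rand(100000000000, 999999999999)] = 1;
}
$after = memory_get_usage(true);
echo 'Approximate bytes per element: ' . (($after - $before) / count($lookup)) . "\n";
Each element costs far more than the key itself (a hashtable bucket plus a zval), so a single million-entry lookup array can easily exceed 100MB on 64-bit PHP 5.x, and the script holds several structures of that size at once.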

Retrieve all rows from table in doctrine

I have a table with 100,000+ rows, and I want to select all of them in Doctrine and do some action with each row. In Symfony2 with Doctrine I try to do it with this query:
$query = $this->getDefaultEntityManager()
->getRepository('AppBundle:Contractor')
->createQueryBuilder('c')
->getQuery()->iterate();
foreach ($query as $contractor) {
// doing something
}
but then I get a memory leak, because I think it loads all the data into memory.
I have more experience with ADOdb; in that library when I do this:
$result = $ADOdbObject->Execute('SELECT * FROM contractors');
while ($arrRow = $result->fetchRow()) {
    // do some action
}
I do not get any memory leak.
So how can I select all data from the table without getting a memory leak, using Doctrine in Symfony2?
Question EDIT
When I remove the foreach and just call iterate(), I also get a memory leak:
$query = $this->getDefaultEntityManager()
->getRepository('AppBundle:Contractor')
->createQueryBuilder('c')
->getQuery()->iterate();
The normal approach is to use iterate().
$q = $this->getDefaultEntityManager()->createQuery('select c from AppBundle:Contractor c');
$iterableResult = $q->iterate();
foreach ($iterableResult as $row) {
// do something
}
However, as the Doctrine documentation says, this can still result in errors:
"Results may be fully buffered by the database client/connection, allocating additional memory not visible to the PHP process. For large sets this may easily kill the process for no apparent reason."
The easiest approach to this would be to simply create smaller queries with offsets and limits.
//get the count of the whole query first
$em = $this->getDefaultEntityManager();
$count = $em->createQueryBuilder()
    ->select('COUNT(c)')
    ->from('AppBundle:Contractor', 'c')
    ->getQuery()
    ->getSingleScalarResult();
//lets say we go in steps of 1000 to avoid the memory problem
$limit = 1000;
$offset = 0;
//loop every 1000 rows > create a query > loop the result > repeat
while ($offset < $count) {
    $result = $em->createQueryBuilder()
        ->select('c')
        ->from('AppBundle:Contractor', 'c')
        ->setMaxResults($limit)
        ->setFirstResult($offset)
        ->getQuery()
        ->getResult();
    foreach ($result as $contractor) {
        // do something
    }
    $offset += $limit;
}
With heavy datasets like this you will most likely also exceed the maximum execution time, which is 30 seconds by default, so make sure to raise it (set_time_limit() in the script, or max_execution_time in php.ini). If you just want to update all records following a known pattern, consider writing one big UPDATE query instead of looping over the result and editing it in PHP.
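One more thing worth noting about the LIMIT/OFFSET loop above: getResult() hydrates entities, and Doctrine's identity map keeps references to all of them until the EntityManager is cleared, so memory can still creep up across batches. A hedged variant of the loop that clears the manager after each batch, reusing the entity and helper names from the question:
$em = $this->getDefaultEntityManager();
while ($offset < $count) {
    $result = $em->createQueryBuilder()
        ->select('c')
        ->from('AppBundle:Contractor', 'c')
        ->setMaxResults($limit)
        ->setFirstResult($offset)
        ->getQuery()
        ->getResult();
    foreach ($result as $contractor) {
        // do something
    }
    $em->clear(); // detach all managed entities so they can be garbage collected
    $offset += $limit;
}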
Try using this approach:
foreach ($query as $contractor) {
    // doing something
    $this->getDefaultEntityManager()->detach($contractor); // remove the entity from the unit of work
    unset($contractor); // tell the gc the object is not in use anymore
}
Hope this helps.
If you really need to get all the records, I'd suggest you use the database_connection directly. Look at its interface and choose a method which won't load all the data into memory (and won't map the records to your entities).
You could use something like this (assuming this code is in a controller):
$db = $this->get('database_connection');
$query = 'select * from <your_table>';
$sth = $db->prepare($query);
$sth->execute();
while ($row = $sth->fetch()) {
    // some stuff
}
It's probably not exactly what you need, because you might want to work with objects after handling the whole collection. But maybe you don't need the objects at all. Either way, think about it.

PHP PDO SQL MultiThreading Possible?

I send a FULL flat (EDI) file to OXFORD every day. I query my database, get the array of results, and use that data array to make a CSV file out of it:
$sql = "SELECT * FROM master_table";
$sth = $apex->prepare($sql);
$sth->execute();
$result = $sth->fetchAll(PDO::FETCH_ASSOC); //this is where I would get my fatal error due to out of memory
$csv = new csv();
$csv->makeCsv($result);
The $result contains an array of millions of records, which I then turn into a CSV.
The main problem I'm having here is a lack of memory and time. However, if I "break up" the SQL query like this:
$years = [2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011]; //etc
foreach ($years as $year) {
    $sql = "SELECT * FROM master_table WHERE year = $year";
    $sth = $apex->prepare($sql);
    $sth->execute();
    $result = $sth->fetchAll(PDO::FETCH_ASSOC);
    $csv = new csv();
    $csv->makeCsv($result);
    unset($result);
}
This works, but it takes an extremely long time because PHP cannot execute the iterations of the foreach loop simultaneously. Is there an option in PDO that lets me execute multithreaded queries?
Even if you could select the data faster from multiple threads (which you most probably can't), you couldn't write the CSV any faster anyway, so threading doesn't make sense here.
The best solution is SELECT ... INTO OUTFILE, which bypasses PHP in the creation of the CSV altogether.
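A minimal sketch of what that could look like from PHP, assuming the MySQL user has the FILE privilege and the server can write to the chosen path (the path and delimiters are placeholders, and the file ends up on the database server, not on the PHP host):
$sql = "SELECT * FROM master_table
        INTO OUTFILE '/tmp/master_table.csv'
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'";
$apex->exec($sql); // $apex is the PDO connection from the question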

Overcoming PHP memory exhausted or execution time error when retrieving MySQL table

I have a big table in my MySQL database. I want to go over one of its columns and pass each value to a function to check whether it exists in another table and, if not, create it there.
However, I always run into either a memory exhaustion or a maximum execution time error.
//Get my table
$records = DB::table($table)->get();
//Check each record against my condition
foreach ($records as $record) {
    Check_for_criteria($record['columnB']);
}
However, when I do that, I get a memory exhausted error.
So I tried with a for statement
//Get min and max id
$min = \DB::table($table)->min('id');
$max = \DB::table($table)->max('id');
//for loop to avoid the memory problem
for ($i = $min; $i <= $max; $i++) {
    $record = \DB::table($table)->where('id', $i)->first();
    //Convert to an array for the purpose of the check_for_criteria function
    $record = get_object_vars($record);
    Check_for_criteria($record['columnB']);
}
But going this way, I got a maximum execution time error.
FYI, the check_for_criteria function is something like:
function check_for_criteria($record) {
    $user = User::where('record', $record)->first();
    if (is_null($user)) {
        $nuser = new User;
        $nuser->number = $record;
        $nuser->save();
    }
}
I know I could ini_set('memory_limit', -1);, but I would rather find a way to limit my memory usage, or at least spread it out somehow.
Should I run these operations in the background when traffic is low? Any other suggestions?
I solved my problem by limiting my request to the distinct values of ColumnB.
//Get the distinct values from my table
$records = DB::table($table)->distinct()->select('ColumnB')->get();
//Check each record against my condition
foreach ($records as $record) {
    Check_for_criteria($record['columnB']);
}
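If this is Laravel's query builder (which the DB:: facade suggests), chunking is another option worth knowing: it pages through the table for you so only a slice of the rows is in memory at any time. A hedged sketch, assuming the table has an id column as in the for-loop attempt above:
DB::table($table)
    ->orderBy('id')
    ->chunk(1000, function ($records) {
        foreach ($records as $record) {
            // each $record is a stdClass row object here
            Check_for_criteria($record->columnB);
        }
    });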

PHP/MYSQL: Iterate over every record in a database

I am new to the whole PHP/MySQL thing. I have a week's worth of server logs (about 300,000 entries) and I need to do some analysis. I am planning on reading them all into a MySQL db and then analysing them with PHP.
The thing I am not sure about is how to iterate through them. In Java, reading a file, I would do something like this:
Scanner s = new Scanner(myfile);
while (s.hasNext()) {
    String line = s.nextLine();
    // Do something with this record
}
How do I iterate through all records in a MySQL db using PHP? I think that something like this will take a stupid amount of memory:
$query = "SELECT * FROM mytable";
$result = mysql_query($query);
$rows = mysql_num_rows($result);
for($j = 0; $j < $rows; ++$j){
$curIndex = mysql_result($result,$j,"index");
$curURL = mysql_result($result,$j,"something");
~~ Do something with this record
}
So I have added a limit to the select statement and I repeat until all records have been cycled through. Is there a more standard way to do this? Is there a built-in that will do this?
while ($startIndex < $numberOfRows) {
    $query = "SELECT * FROM mytable ORDER BY mytable.index LIMIT $startIndex,$endIndex";
    $result = mysql_query($query);
    $rows = mysql_num_rows($result);
    for ($j = 0; $j < $rows; ++$j) {
        $curIndex = mysql_result($result, $j, "index");
        $curURL = mysql_result($result, $j, "something");
        // Do something with this record
    }
    $startIndex = $endIndex + 1;
    $endIndex = $endIndex + 10;
}
You don't want to do a SELECT * FROM MYTABLE if your table is large; you would end up with the whole thing in memory. A trade-off between memory overhead and database calls is to batch the requests. You can get the min and max ids of the rows in your table:
SELECT MIN(ID) FROM MYTABLE;
SELECT MAX(ID) FROM MYTABLE;
Now loop from minId to maxId, incrementing by, say, 10,000 each time. In pseudo-code:
for (int i = minId; i < maxId; i = i + 10000) {
    int x = i;
    int y = i + 10000;
    SELECT * FROM MYTABLE WHERE ID >= x AND ID < y;
}
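In PHP, that pseudo-code might look roughly like this with PDO (the DSN and credentials are placeholders):
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass'); // placeholder credentials
$minId = (int) $pdo->query('SELECT MIN(ID) FROM MYTABLE')->fetchColumn();
$maxId = (int) $pdo->query('SELECT MAX(ID) FROM MYTABLE')->fetchColumn();
$batch = 10000;
$stmt = $pdo->prepare('SELECT * FROM MYTABLE WHERE ID >= :x AND ID < :y');
for ($i = $minId; $i <= $maxId; $i += $batch) {
    $stmt->execute(array(':x' => $i, ':y' => $i + $batch));
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        // Do something with this record
    }
}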
See here:
http://www.tizag.com/mysqlTutorial/
http://www.tizag.com/mysqlTutorial/mysqlfetcharray.php
<?php
// Make a MySQL Connection
$query = "SELECT * FROM example";
$result = mysql_query($query) or die(mysql_error());
while ($row = mysql_fetch_array($result)) {
    echo $row['name'] . " - " . $row['age'];
    echo "<br />";
}
?>
Depending on what you need to do with the resulting rows, you can use a different loop style, whether it's 'while', 'foreach' or 'for x to x'. Most of the time, a simple 'while' iteration will be great, and it is efficient.
Use mysql_fetch_*
$result = mysql_query(...);
while ($row = mysql_fetch_assoc($result)) {
    $curIndex = $row['index'];
}
I think that retrieves results in a "streaming" manner, rather than loading them all into memory at once. I'm not sure what exactly mysql_result does.
Side note: since you're still new, I'd advise you to get into good habits right away and skip the mysql_ functions entirely in favour of PDO or at least mysqli.
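Following that side note, here is a hedged mysqli version of the same loop; passing MYSQLI_USE_RESULT asks for an unbuffered result set, so rows are streamed from the server as you fetch them instead of being copied into client memory first (host and credentials are placeholders):
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb'); // placeholder credentials
$result = $mysqli->query('SELECT * FROM mytable', MYSQLI_USE_RESULT); // unbuffered
while ($row = $result->fetch_assoc()) {
    $curIndex = $row['index'];
    // Do something with this record
}
$result->free();
Note that with an unbuffered result you have to read (or free) all rows before issuing another query on the same connection.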
In an ideal world, PHP would generate aggregate queries, send them to MySQL, and only get a small number of rows in return. For instance, if you're counting the number of log items of each severity between two dates:
SELECT COUNT(*), severity
FROM logs
WHERE date < ? AND date > ?
GROUP BY severity
Doing the work on the PHP side is quite unusual. If you find out that you have needs too complex for SQL queries to handle (which, given that you have control over your database structure, leaves you with a lot of freedom), a better option would be to move to a Map-Reduce database engine like CouchDB.
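For completeness, running that aggregate from PHP with bound dates could look like this, assuming an existing PDO connection in $pdo (the date range is a placeholder):
$stmt = $pdo->prepare(
    'SELECT COUNT(*) AS cnt, severity
     FROM logs
     WHERE date < ? AND date > ?
     GROUP BY severity'
);
$stmt->execute(array('2013-02-01', '2013-01-01')); // placeholder date range
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    echo $row['severity'] . ': ' . $row['cnt'] . "\n";
}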
I strongly believe that batch processing with Doctrine, or any kind of iteration with MySQL (PDO or mysqli), is just an illusion.
@dimitri-k provided a nice explanation, especially about the unit of work. The problem is the misleading "$query->iterate()", which doesn't really iterate over the data source. It is just a \Traversable wrapper around an already fully fetched data source.
Here is an example demonstrating that, even with the Doctrine abstraction layer removed from the picture completely, we still run into memory issues:
echo 'Starting with memory usage: ' . memory_get_usage(true) / 1024 / 1024 . " MB \n";
$pdo = new \PDO("mysql:dbname=DBNAME;host=HOST", "USER", "PW");
$stmt = $pdo->prepare('SELECT * FROM my_big_table LIMIT 100000');
$stmt->execute();
while ($rawCampaign = $stmt->fetch()) {
// echo $rawCampaign['id'] . "\n";
}
echo 'Ending with memory usage: ' . memory_get_usage(true) / 1024 / 1024 . " MB \n";
Output:
Starting with memory usage: 6 MB
Ending with memory usage: 109.46875 MB
Here is the disappointing getIterator() method:
namespace Doctrine\DBAL\Driver\Mysqli\MysqliStatement

/**
 * {@inheritdoc}
 */
public function getIterator()
{
    $data = $this->fetchAll();
    return new \ArrayIterator($data);
}
You can use my little library to actually stream heavy tables using PHP Doctrine, DQL, or just pure SQL, however you find appropriate: https://github.com/EnchanterIO/remote-collection-stream
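For what it's worth, the PDO demonstration above behaves very differently if the MySQL driver is told not to buffer the result set. A hedged sketch of the same loop with PDO::MYSQL_ATTR_USE_BUFFERED_QUERY disabled; whether this is acceptable depends on your case, because no other query can run on the connection until the result has been fully read:
$pdo = new \PDO("mysql:dbname=DBNAME;host=HOST", "USER", "PW");
// stream rows from the server instead of buffering the whole result client-side
$pdo->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
$stmt = $pdo->prepare('SELECT * FROM my_big_table LIMIT 100000');
$stmt->execute();
while ($rawCampaign = $stmt->fetch()) {
    // process one row at a time
}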
