PHP array inserting / manipulation degrading over iterations - php

I am in the process of transferring data from one database to another. They are different database systems (MSSQL to MySQL), so I can't run direct queries and am using PHP as an intermediary. Consider the following code. For some reason, each pass through the while loop takes twice as long as the one before.
$continue = true;
$limit = 20000;
while($continue){
$i = 0;
$imp->endTimer();
$imp->startTimer("Fetching Apps");
$qry = "THIS IS A BASIC SELECT QUERY";
$data = $imp->src->dbQuery($qry, array(), PDO::FETCH_ASSOC);
$inserts = array();
$continue = (count($data) == $limit);
$imp->endTimer();
$imp->startTimer("Processing Apps " . memory_get_usage() );
if($data == false){
$continue = false;
}
else{
foreach($data AS $row){
// THERE IS SOME EXTREMELY BASIC IF STATEMENTS HERE
$inserts[] = array(
"paymentID"=>$paymentID,
"ticketID"=>$ticketID,
"applicationLink"=>$row['ApplicationID'],
"paymentLink"=>(int)($paymentLink),
"ticketLink"=>(int)($ticketLink),
"dateApplied"=>$row['AddDate'],
"appliedBy"=>$adderID,
"appliedAmount"=>$amount,
"officeID"=>$imp->officeID,
"customerID"=>-1,
"taxCollected"=>0
);
$i++;
$minID = $row['ApplicationID'];
}
}
$imp->endTimer();
$imp->startTimer("Inserting $i Apps");
if(count($inserts) > 0){
$imp->dest->dbBulkInsert("appliedPayments", $inserts);
}
unset($data);
unset($inserts);
echo "Inserted $i Apps<BR>";
}
It doesn't matter what I set the limit to, the processing portion takes twice as long each time. I am logging each portion of the loop and selecting the data from the old database and inserting it into the new one take no time at all. The "processing portion" is doubling every time. Why? Here are the logs, if you do some quick math on the timestamps, each step labeled "Processing Apps" takes twice as long as the one before... (I stopped it a little early on this one, but it was taking a significantly longer time on the final iteration)

Well, I don't know why this works, but if I move everything inside the while loop into a separate function, it DRAMATICALLY increases performance. I'm guessing it's a garbage collection / memory management issue, and that having a function call end helps the garbage collector know it can release the memory. Now when I log the memory usage, it stays constant between calls instead of growing... Dirty PHP...
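A minimal sketch of that restructuring, with the database calls replaced by plain arrays so it runs standalone. The names processBatch and $batches, the sample rows and the office ID 7 are illustrative assumptions, not part of the original code. It only shows the shape of the change: each batch's rows and inserts live in function-local variables that go out of scope when the call returns. It does not by itself prove where the slowdown came from.

```php
<?php
// Hypothetical stand-in for the body of the while loop. Everything built
// here is local to the function and can be freed when the call returns.
function processBatch(array $rows, int $officeID): array
{
    $inserts = [];
    $lastID = null;
    foreach ($rows as $row) {
        $inserts[] = [
            'applicationLink' => $row['ApplicationID'],
            'dateApplied'     => $row['AddDate'],
            'officeID'        => $officeID,
        ];
        $lastID = $row['ApplicationID'];
    }
    // The real code would hand $inserts to $imp->dest->dbBulkInsert() here.
    return ['count' => count($inserts), 'lastID' => $lastID];
}

// Fake source data (an assumption replacing the SELECT): two batches.
$batches = [
    [
        ['ApplicationID' => 1, 'AddDate' => '2015-01-01'],
        ['ApplicationID' => 2, 'AddDate' => '2015-01-02'],
    ],
    [
        ['ApplicationID' => 3, 'AddDate' => '2015-01-03'],
    ],
];

$total = 0;
$minID = null;
foreach ($batches as $rows) {
    // Only the small summary array survives the call.
    $result = processBatch($rows, 7);
    $total += $result['count'];
    $minID = $result['lastID'];
    echo "Inserted {$result['count']} Apps\n";
}
```

Logging memory_get_usage() before and after each call, as the original loop already does, is the easiest way to check whether memory really stays flat in your own setup.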

Related

Maximum time execution CodeIgniter 3 issue

As far as I can tell, the only way to avoid the maximum execution time error in CodeIgniter 3 is to raise the limit, for example from 30 to 300 seconds.
I'm using CodeIgniter in a news website. I'm loading only the 20 latest news items on the news section page, which I don't think is enough to make the server run out of execution time. (Note that the news table has more than 1,400 rows and the seen table has more than 150,000 log entries.)
It isn't reasonable for the user to wait more than 50 seconds for the response and for the page to load.
Is there any way to load the page as fast as possible without hitting the maximum execution time?
My Code in the model:
public function get_section_news($id_section = 0, $length = 0, $id_sub_section = 0, $id_news_lessthan = 0) {
$arr = [];
//
if (intval($id_section) > 0 and intval($length) > 0) {
//
$where = [];
$where['sections.activity'] = 1;
$where['news.deleted'] = 0;
$where['news.id_section'] = $id_section;
$query = $this->db;
$query
->from("news")
->join("sections", "news.id_section = sections.id_section", "inner")
->order_by("news.id_news", "desc")
->limit($length);
//
if (intval($id_sub_section) > 0) {
$where['news.id_section_sub'] = $id_sub_section;
}
if ($id_news_lessthan > 0) {
$where['news.id_news <'] = $id_news_lessthan;
}
//
$get = $query->where($where)->get();
$num = $get->num_rows();
if ($num > 0) {
//
foreach ($get->result() as $key => $value) {
$arr['row'][] = $value;
}
}
$arr['is_there_more'] = ($length > $num and $num > 0) ? true : false;
}
return $arr;
}
This usually has nothing to do with the framework. Run the following command in your MySQL client and check whether there are any sleeping queries on your database.
SHOW FULL PROCESSLIST
Most likely you have sleeping queries, since you are not freeing the result set with
$get->free_result();
Another problem may be slow queries. For those I recommend the following:
1) Make sure you are using the same storage engine on all tables. I recommend InnoDB, as some engines lock the whole table during a write, which is undesirable. You should already have noticed this when you ran SHOW FULL PROCESSLIST.
2) Run your queries in a MySQL client and observe how long they take to execute. If they take too long, it may be the result of unindexed tables. You can EXPLAIN your query to identify unindexed tables. You may follow these 1,2,3 tutorials on indexing your tables, or you can do it easily with tools like Navicat.

Retrieve all rows from table in doctrine

I have a table with 100,000+ rows, and I want to select all of them with Doctrine and perform some actions on each row. In Symfony2 with Doctrine I tried this query:
$query = $this->getDefaultEntityManager()
->getRepository('AppBundle:Contractor')
->createQueryBuilder('c')
->getQuery()->iterate();
foreach ($query as $contractor) {
// doing something
}
but then I get a memory leak, I think because it loads all the data into memory.
I have more experience with ADOdb; in that library, when I do this:
$result = $ADOdbObject->Execute('SELECT * FROM contractors');
while ($arrRow = $result->fetchRow()) {
// do some action
}
I do not get any memory leak.
So how do I select all the data from the table without getting a memory leak with Doctrine in Symfony2?
Question EDIT
When I remove the foreach and just call iterate(), I still get a memory leak:
$query = $this->getDefaultEntityManager()
->getRepository('AppBundle:Contractor')
->createQueryBuilder('c')
->getQuery()->iterate();
The normal approach is to use iterate().
$q = $this->getDefaultEntityManager()->createQuery('select c from AppBundle:Contractor c');
$iterableResult = $q->iterate();
foreach ($iterableResult as $row) {
// do something
}
However, as the Doctrine documentation says, this can still result in errors.
Results may be fully buffered by the database client/ connection allocating additional memory not visible to the PHP process. For large sets this may easily kill the process for no apparant reason.
The easiest approach to this would be to simply create smaller queries with offsets and limits.
//get the count of the whole query first
$em = $this->getDefaultEntityManager();
$count = $em->createQueryBuilder()
->select('COUNT(c)')
->from('AppBundle:Contractor', 'c')
->getQuery()->getSingleScalarResult();
//let's say we go in steps of 1000 to keep memory bounded
$limit = 1000;
$offset = 0;
//loop every 1000 > create a query > loop the result > repeat
while ($offset < $count){
//build a fresh query each time so from() is not added twice
$result = $em->createQueryBuilder()
->select('c')
->from('AppBundle:Contractor', 'c')
->setFirstResult($offset)
->setMaxResults($limit)
->getQuery()->getResult();
foreach ($result as $contractor) {
// do something
}
$em->clear(); //detach the processed entities so they can be freed
$offset += $limit;
}
With datasets this large, the script will most likely exceed the maximum execution time, which is 30 seconds by default. So make sure to raise max_execution_time in your php.ini, or call set_time_limit() in the script. If you just want to update all rows following a known pattern, you should consider writing one big update query instead of looping and editing the result in PHP.
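The offset/limit loop can also be wrapped in a generator so the calling code just sees one row at a time. This is only a sketch: iterateInPages and $fetchPage are made-up names, and the in-memory array stands in for the limited query so the example runs without a database. In real code the callable would run the query with setFirstResult() and setMaxResults().

```php
<?php
// Yield rows one page at a time. $fetchPage is a hypothetical callable
// (offset, limit) => array standing in for the limited query.
function iterateInPages(callable $fetchPage, int $limit): Generator
{
    $offset = 0;
    do {
        $page = $fetchPage($offset, $limit);
        foreach ($page as $row) {
            yield $row;
        }
        $offset += $limit;
    } while (count($page) === $limit); // a short page means we reached the end
}

// Demo with an in-memory "table" of 2500 ids instead of a database.
$table = range(1, 2500);
$fetchPage = function (int $offset, int $limit) use ($table): array {
    return array_slice($table, $offset, $limit);
};

$seen = 0;
foreach (iterateInPages($fetchPage, 1000) as $id) {
    $seen++; // do something with each row here
}
echo "Processed $seen rows\n";
```

Stopping on the first short page means no separate COUNT query is needed, at the cost of one extra empty fetch when the row count is an exact multiple of the page size.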
Try using this approach:
foreach ($query as $row) {
$contractor = $row[0]; // iterate() wraps each entity in an array
// doing something
$this->getDefaultEntityManager()->detach($contractor); // stop the EntityManager tracking it
unset($contractor, $row); // tell the gc the object is not in use anymore
}
Hope this helps
If you really need to get all the records, I'd suggest using database_connection directly. Look at its interface and choose a method that won't load all the data into memory (and won't map the records to your entities).
You could use something like this (assuming this code is in controller):
$db = $this->get('database_connection');
$query = 'select * from <your_table>';
$sth = $db->prepare($query);
$sth->execute();
while($row = $sth->fetch()) {
// some stuff
}
Probably it's not what you need because you might want to have objects after handling all the collection. But maybe you don't need the objects. Anyway think about this.

Overcoming PHP memory exhausted or execution time error when retrieving MySQL table

I have a big table in my MySQL database. I want to go over one of its columns and pass each value to a function that checks whether it exists in another table and, if not, creates it there.
However, I always face either a memory exhausted or execution time error.
//Get my table
$records = DB::table($table)->get();
//Check to see if it's fit my condition
foreach($records as $record){
Check_for_criteria($record['columnB']);
}
However, when I do that, I get a memory exhausted error.
So I tried with a for statement
//Get min and max id
$min = \DB::table($table)->min('id');
$max = \DB::table($table)->max('id');
//for loop to avoid memory problem
for($i = $min; $i<=$max; $i++){
$record = \DB::table($table)->where('id',$i)->first();
//To convert in array for the purpose of the check_for_criteria function
$record= get_object_vars($record);
Check_for_criteria($record['columnB']);
}
But going this way, I got a maximum execution time error.
FYI the check_for_criteria function is something like
function check_for_criteria($record){
$user = User::where('record', $record)->first();
if(is_null($user)){
$nuser = new User;
$nuser->number = $record;
$nuser->save();
}
}
I know I could ini_set('memory_limit', -1); but I would rather find a way to limit my memory usage in some way or at least spreading it some way.
Should I run these operations in background when traffic is low? Any other suggestion?
I solved my problem by limiting my request to distinct values in ColumnB.
//Get my table
$records = DB::table($table)->distinct()->select('ColumnB')->get();
//Check to see if it's fit my condition
foreach($records as $record){
Check_for_criteria($record['columnB']);
}

how to cleanup / free database query memory in zend?

After executing this simple code (for a MySQL database), about 1kB more memory is used on each loop iteration, so after the 1000th iteration about 1MB is used.
If I have to loop in a long-running script (about 1,000,000 iterations), I will run out of memory quickly.
$_db = Zend_Db_Table::getDefaultAdapter();
$start_memory = memory_get_usage();
for ($i=0; $i<1000; $i++) {
$update_query = "UPDATE table SET field='value'";
$_db->query($update_query);
}
echo 'memory used: '.(memory_get_usage()-$start_memory);
Is there a way to free memory used by database query?
I tried putting the update query in a function, so that after leaving the function scope the resources it used would be freed automatically:
function update($_db) {
$sql = "UPDATE table SET field='value'";
$_db->query($sql);
}
...
for ($i=0; $i<1000; $i++) {
update($_db);
}
but they are not!
I'm not interested in advice like 'try updating multiple rows in one go' ;)
Most probably you have the Zend_Db_Profiler enabled.
The database profiler stores each executed query, which is very useful for debugging and optimisation but leads to rather fast memory exhaustion if you execute a huge number of queries.
In the example you gave, disabling the profiler should do the trick:
$_db = Zend_Db_Table::getDefaultAdapter();
$_db->getProfiler()->setEnabled(false);
$start_memory = memory_get_usage();
for ($i=0; $i<1000; $i++) {
$update_query = "UPDATE table SET field='value'";
$_db->query($update_query);
}
echo 'memory used: '.(memory_get_usage()-$start_memory);
When executing the same query multiple times the best way to save memory is to implement prepared statements. Your adapter is going to be using prepared statements, but since you are calling the query() method inside the loop, it's getting prepared every time. Move that outside of the loop:
$_db = Zend_Db_Table::getDefaultAdapter();
$_stm = $_db->prepare("UPDATE table SET field=?");
for ($i=0; $i<1000; $i++) {
$_stm->execute(array($fieldValue));
}

This code needs to loop over 3.5 million rows, how can I make it more efficient?

I have a csv file that has 3.5 million codes in it.
I should point out that this is only EVER going to be run this once.
The csv looks like
age9tlg,
rigfh34,
...
Here is my code:
ini_set('max_execution_time', 600);
ini_set("memory_limit", "512M");
$file_handle = fopen("Weekly.csv", "r");
while (!feof($file_handle)) {
$line_of_text = fgetcsv($file_handle);
if (is_array($line_of_text))
foreach ($line_of_text as $col) {
if (!empty($col)) {
mysql_query("insert into `action_6_weekly` Values('$col', '')") or die(mysql_error());
}
} else {
if (!empty($line_of_text)) {
mysql_query("insert into `action_6_weekly` Values('$line_of_text', '')") or die(mysql_error());
}
}
}
fclose($file_handle);
Is this code going to die part way through on me?
Will my memory and max execution time be high enough?
NB:
This code will be run on my localhost, and the database is on the same PC, so latency is not an issue.
Update:
here is another possible implementation.
This one does it in bulk inserts of 2000 records
$file_handle = fopen("Weekly.csv", "r");
$i = 0;
$vals = array();
while (!feof($file_handle)) {
$line_of_text = fgetcsv($file_handle);
if (is_array($line_of_text))
foreach ($line_of_text as $col) {
if (!empty($col)) {
if ($i < 2000) {
$vals[] = "('$col', '')";
$i++;
} else {
$vals = implode(', ', $vals);
mysql_query("insert into `action_6_weekly` Values $vals") or die(mysql_error());
$vals = array();
$i = 0;
}
}
} else {
if (!empty($line_of_text)) {
if ($i < 2000) {
$vals[] = "('$line_of_text', '')";
$i++;
} else {
$vals = implode(', ', $vals);
mysql_query("insert into `action_6_weekly` Values $vals") or die(mysql_error());
$vals = array();
$i = 0;
}
}
}
}
fclose($file_handle);
If I were to use this method, what is the highest number of rows I could set it to insert at once?
Update 2
So, I've found I can use
LOAD DATA LOCAL INFILE 'C:\\xampp\\htdocs\\weekly.csv' INTO TABLE `action_6_weekly` FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY ','(`code`)
but the issue now is that I was wrong about the csv format.
It is actually 4 codes and then a line break,
so
fhroflg,qporlfg,vcalpfx,rplfigc,
vapworf,flofigx,apqoeei,clxosrc,
...
So I need to be able to specify two LINES TERMINATED BY values.
This question has been branched out to Here.
Update 3
Setting it to do bulk inserts of 20k rows, using
while (!feof($file_handle)) {
$val[] = fgetcsv($file_handle);
$i++;
if($i == 20000) {
//do insert
//set $i = 0;
//$val = array();
}
}
//do insert(for last few rows that dont reach 20k
but it dies at this point because for some reason $val contains 75k rows. Any idea why?
Note the above code is simplified.
I doubt this will be the popular answer, but I would have your php application run mysqlimport on the csv file. Surely it is optimized far beyond what you will do in php.
"Is this code going to die part way through on me? Will my memory and max execution time be high enough?"
Why don't you try and find out?
You can adjust both the memory (memory_limit) and execution time (max_execution_time) limits, so if you really have to use that, it shouldn't be a problem.
Note that MySQL supports delayed and multiple row insertion:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
http://dev.mysql.com/doc/refman/5.1/en/insert.html
Make sure there are no indexes on your table, as indexes will slow down inserts (add the indexes after you've done all the inserts).
Rather than creating a new SQL statement on each pass of the loop, prepare the SQL statement outside of the loop and execute that prepared statement with parameters inside the loop. Depending on the database this can be heaps faster.
I've done the above when importing a large Access database into Postgres using perl and got the insert time down to 30 seconds. I would have used an importer tool, but I wanted perl to enforce some rules when inserting.
You should accumulate the values and insert them into the database all at once at the end, or in batches every x records. Doing a single query for each row means 3.5 million SQL queries, each carrying quite some overhead.
Also, you should run this on the command line, where you won't need to worry about execution time limits.
The real answer though is evilclown's answer, importing to MySQL from CSV is already a solved problem.
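If you do go the batching route, a rough sketch of "insert every x records" could look like the following. The table name is the one from the question and the `code` column comes from the LOAD DATA statement there; buildBatchInsert, the sample codes and the batch size of 2 are made up for the demo, and the sketch assumes any other column has a default. It only builds the statements, so it runs without a database.

```php
<?php
// Build one multi-row INSERT with placeholders for a batch of codes.
// buildBatchInsert is a hypothetical helper, not an existing API.
function buildBatchInsert(array $codes): array
{
    $placeholders = implode(', ', array_fill(0, count($codes), '(?)'));
    $sql = "INSERT INTO `action_6_weekly` (`code`) VALUES $placeholders";
    return [$sql, array_values($codes)];
}

// Sample codes standing in for the values read from the csv file.
$codes = ['age9tlg', 'rigfh34', 'fhroflg', 'qporlfg', 'vcalpfx'];

$statements = [];
// array_chunk() also returns the final short batch, so no rows are dropped.
foreach (array_chunk($codes, 2) as $batch) {
    [$sql, $params] = buildBatchInsert($batch);
    $statements[] = [$sql, $params];
    // With a real connection this would be something like:
    // $pdo->prepare($sql)->execute($params);
}
echo count($statements) . " statements built\n";
```

Using placeholders rather than interpolating the values keeps quoting problems out of the generated SQL. How large a batch you can send depends on the server's max_allowed_packet setting.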
I hope there is not a web client waiting for a response on this. Other than calling the import utility already referenced, I would start this as a job and return feedback to the client almost immediately. Have the insert loop update a percentage-complete somewhere so the end user can check the status, if you absolutely must do it this way.
2 possible ways.
1) Batch the process, then have a scheduled job import the file while updating a status. This way, you can have a page that keeps checking the status and refreshes itself if the status is not yet 100%. Users will have a live update of how much has been done. But for this you need access to the OS to be able to set up the scheduled task. And the task will be running idle when there is nothing to import.
2) Have the page handle 1000 rows (or any N number of rows... you decide), then send some JavaScript to the browser to refresh itself with a new parameter that tells the script to handle the next 1000 rows. You can also display a status to the user while this is happening. The only problem is that if the page somehow does not refresh, the import stops.
