Optimizing Execution Time with PHP

I'm processing roughly 25,000 records, but the script somehow exceeds the maximum execution time. I am using CodeIgniter 3.0.
The records are text data extracted from a PDF by a library I wrote. The script stays within the execution time if I only display the data, but once things get more complicated, such as writing it to the database (MySQL), it exceeds the 300-second execution time (which I had already raised).
To illustrate:
function process() {
    $data = processThePDF(); // outputs the records / 25,000 records
    if ($data) {
        foreach ($data as $dt) {
            $info = $this->just_another_model->view($dt['id']); // get the old record
            if ($info) {
                // Update
                $this->just_another_model->update([$params]);
                // Log update
                $this->just_another_model->log([$params]);
            } else {
                // Register
                $this->just_another_model->update([$params]);
                // Log register
                $this->just_another_model->log([$params]);
            }
        }
    }
}
So my questions are:
1. Is there a better way to optimize this?
2. Is it sensible to write the data to a JSON or text file before processing it?

Store your data in arrays and update/insert them in batches:
function process() {
    $data = processThePDF(); // outputs the records / 25,000 records
    $update_data = array();
    $insert_data = array();

    if ($data) {
        foreach ($data as $dt) {
            $info = $this->just_another_model->view($dt['id']); // get the old record
            if ($info) {
                $update_data[] = array($params);
            } else {
                $insert_data[] = array($params);
            }
        }

        if (count($update_data) > 0) {
            $this->just_another_model->update_batch($update_data);
        }
        if (count($insert_data) > 0) {
            $this->just_another_model->insert_batch($insert_data);
        }
    }
}
Furthermore, you can fetch the old records in one batch before the loop and index them by ID, so you can access each one as $old_records[$dt['id']].
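For example, here is a minimal sketch of that approach. It assumes the model exposes CodeIgniter-style insert_batch()/update_batch() wrappers around $this->db, plus a hypothetical get_by_ids() helper that fetches all existing rows for a list of IDs in one query; adjust the names to your model.
function process() {
    $data = processThePDF(); // 25,000 parsed records
    if (!$data) {
        return;
    }

    // One query instead of 25,000: fetch the existing rows and key them by id.
    // get_by_ids() is a hypothetical model method (e.g. WHERE id IN (...)).
    $old_records = array();
    foreach ($this->just_another_model->get_by_ids(array_column($data, 'id')) as $row) {
        $old_records[$row['id']] = $row;
    }

    $to_update = array();
    $to_insert = array();
    foreach ($data as $dt) {
        if (isset($old_records[$dt['id']])) {
            $to_update[] = $dt;
        } else {
            $to_insert[] = $dt;
        }
    }

    if (count($to_update) > 0) {
        $this->just_another_model->update_batch($to_update); // wraps $this->db->update_batch()
    }
    if (count($to_insert) > 0) {
        $this->just_another_model->insert_batch($to_insert); // wraps $this->db->insert_batch()
    }
}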

How can I keep track of record number across multiple json objects in PHP

I have an export of customer records that needs to be split into chunks of 500 records. I grab each chunk through a REST request and save it to my server:
public function createImportFile($json)
{
    $filePath = storage_path().'/import/'.$this->getImportFileName($this->import->chunkNumber);
    $importFile = fopen($filePath, 'w');
    $array = json_decode($json);
    fwrite($importFile, $json);
    fclose($importFile);
    return $filePath;
}
Then after grabbing all of the chunks, I import all of the records. I'm wondering what the best way would be to find the Nth record among all the chunks?
Currently, I divide the record number I'm looking for by the number of records per chunk to find out which chunk the record is in. Then I take the total record count of the previous chunks and subtract it from the record number to get the record's position within that chunk.
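With a fixed chunk size, that arithmetic reduces to an integer division and a modulo; a minimal sketch, assuming 1-based record numbers and 500 records per chunk:
// Assumes 1-based record numbers and a fixed chunk size of 500.
$chunkSize = 500;

$zeroBased     = $recordNumber - 1;
$chunkNumber   = intdiv($zeroBased, $chunkSize); // which chunk file holds the record (PHP 7+)
$chunkPosition = $zeroBased % $chunkSize;        // zero-based index inside that chunk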
while ($this->recordNumber <= $this->totalRecords) {
    $item = $this->getRecord($this->recordNumber);
    if (empty($item)) {
        $this->recordNumber++;
        continue;
    }
    $results = $this->translateItem($item);
    $this->recordNumber++;
}
public function getRecord($recordNumber)
{
    if ($this->import->isChunkedImport()) {
        $chunkNumber = (integer) $this->returnChunkFromRecordNumber($recordNumber);
        $countInPrevChunks = intval($this->returnRecordCountForPrevChunks($chunkNumber));
        $chunkPosition = intval($this->getChunkPosition($recordNumber, $countInPrevChunks));
        $jsonObj = $this->getJsonObjectForChunkNumer($chunkNumber);
        return $jsonObj[$chunkPosition];
    }
    else {
        $chunkPosition = $this->getChunkPosition($recordNumber, 0);
        $filePath = storage_path().'/import/'.$this->getImportFileName();
        return (array) json_decode(file_get_contents($filePath))[$chunkPosition];
    }
}
private function &getJsonObjectForChunkNumer($chunkNumber)
{
    if ($this->currentFileArray == null || ($chunkNumber != $this->lastChunkNumber)) {
        $filePath = storage_path().'/import/'.$this->getImportFileName($chunkNumber);
        $this->currentFileArray = json_decode(file_get_contents($filePath), true);
        $this->lastChunkNumber = $chunkNumber;
    }
    return $this->currentFileArray;
}
public function getChunkCount()
{
    // Note: despite its name, this returns the number of records in one chunk file
    // (i.e. the chunk size), since it counts the entries of the decoded file.
    $filePath = storage_path().'/import/'.$this->getImportFileName();
    return count(json_decode(file_get_contents($filePath)));
}
public function returnChunkFromRecordNumber($recordNumber)
{
    if ($recordNumber >= $this->getChunkCount()) {
        if (is_int($recordNumber/$this->getChunkCount())) {
            if (($recordNumber/$this->getChunkCount()) == 1) {
                return intval(1);
            }
            return intval(($recordNumber/$this->getChunkCount())-1);
        }
        else {
            return intval($recordNumber/$this->getChunkCount());
        }
    }
    else {
        return intval(0);
    }
}
public function getChunkPosition($recordNumber, $countInPrevChunks)
{
    $positionInChunk = $recordNumber - $countInPrevChunks;
    if ($positionInChunk == 0) {
        return $positionInChunk;
    }
    return $positionInChunk - 1;
}
public function returnRecordCountForPrevChunks($chunkNumber)
{
    if ($chunkNumber == 0) {
        return 0;
    }
    else {
        return $this->getChunkCount() * $chunkNumber;
    }
}
I try to account for the fact that both chunk numbers and record positions within a chunk start at 0, but I'm still missing the last record of the import. It also seems like I might be making this more complicated than it needs to be. I was wondering if anyone had advice or a simpler way to grab the Nth record. I thought about just numbering the records as I bring them in with the REST request; then I could find the chunk containing the record number as an array key and return that record:
public function createImportFile($json)
{
    $filePath = storage_path().'/import/'.$this->getImportFileName($this->import->chunkNumber);
    $importFile = fopen($filePath, 'w');
    if ($this->import->chunkNumber == 0 && $this->recordNumber == 0) {
        $this->recordNumber = 1;
    }
    $array = json_decode($json);
    $ordered_array = [];
    foreach ($array as $record) {
        $ordered_array[$this->recordNumber] = $record;
        $this->recordNumber++;
    }
    fwrite($importFile, json_encode($ordered_array));
    fclose($importFile);
    return $filePath;
}
But I wasn't sure if that was the best approach.
With a lot of records, you could use a database table. MySQL would easily handle tens of thousands of records. You wouldn't even need to store the whole records. Perhaps just:
record_no | chunk_no | position_in_chunk
record_no: Primary key. Unique identifier for this record
chunk_no: Which chunk contains the record
position_in_chunk: Where within the chunk is the record located
Put a UNIQUE(chunk_no, position_in_chunk) index on the table.
Then as you pull records, assign them a number, build up the DB table, and save the table as you write records to disk. In the future, to get a specific record, all you'll need is its number.
If you don't want to use a database, you can also store this data as a JSON file, though retrieval performance will suffer from having to open and parse a big JSON file each time.
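If you do go the database route, here is a rough sketch of the lookup table and the two queries around it, using PDO. The table and column names mirror the layout above; everything else (connection details, variable names) is illustrative.
// Create the lookup table once (schema mirrors the layout above).
$pdo = new PDO('mysql:host=localhost;dbname=import', 'user', 'pass');
$pdo->exec('CREATE TABLE IF NOT EXISTS record_index (
    record_no INT UNSIGNED NOT NULL PRIMARY KEY,
    chunk_no INT UNSIGNED NOT NULL,
    position_in_chunk INT UNSIGNED NOT NULL,
    UNIQUE KEY chunk_pos (chunk_no, position_in_chunk)
)');

// While writing chunks to disk, record where each record landed.
$insert = $pdo->prepare(
    'INSERT INTO record_index (record_no, chunk_no, position_in_chunk) VALUES (?, ?, ?)'
);
$insert->execute(array($recordNumber, $chunkNumber, $positionInChunk));

// Later, to fetch the Nth record, look up its location first,
// then open only that chunk file and read that position.
$lookup = $pdo->prepare(
    'SELECT chunk_no, position_in_chunk FROM record_index WHERE record_no = ?'
);
$lookup->execute(array($n));
$location = $lookup->fetch(PDO::FETCH_ASSOC);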

Parallel execution instead of foreach

I currently do file uploads sequentially with a foreach loop. Each upload is processed one after the other.
<?php
foreach ($files_array as $file) {
    // each image is processed and uploaded here, one after the other
}
?>
However, processing multiple images at once instead of one at a time would be more efficient, as the user would wait a lot less. How can I process multiple files at once in PHP instead of doing it sequentially with a foreach?
Option 1 - fork your process:
You can fork your process and spread the work over a few child processes. Here is my example:
<?php
declare(ticks = 1);

$filesArray = [
    'file-0',
    'file-1',
    'file-2',
    'file-3',
    'file-4',
    'file-5',
    'file-6',
    'file-7',
    'file-8',
    'file-9',
];

$maxThreads = 3;
$child = 0;

// Reap finished children so the $child counter stays accurate.
pcntl_signal(SIGCHLD, function ($signo) {
    global $child;
    if ($signo === SIGCHLD) {
        while (($pid = pcntl_wait($status, WNOHANG)) > 0) {
            $exitCode = pcntl_wexitstatus($status);
            $child--;
        }
    }
});

foreach ($filesArray as $item) {
    // Throttle: wait until a slot is free.
    while ($child >= $maxThreads) {
        sleep(1);
    }
    $child++;
    $pid = pcntl_fork();
    if ($pid) {
        // Parent: continue with the next file.
    } else {
        // Child: process one file here.
        sleep(2);
        echo posix_getpid()." - $item \n";
        exit(0);
    }
}

// Wait for the remaining children to finish.
while ($child != 0) {
    sleep(3);
}
Option 2 - use work queues:
You can also use a queue (for example RabbitMQ, or something else).
In your script you put each job into the queue and reply to the client that the job has been added and will be processed soon. The RabbitMQ tutorials show in detail how to do this with PHP.
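A minimal producer sketch with php-amqplib is shown below; the queue name and message payload are illustrative, and a separate worker script would consume the queue and do the actual image processing.
<?php
require __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// Durable queue so queued jobs survive a broker restart.
$channel->queue_declare('image_uploads', false, true, false, false);

foreach ($files_array as $file) {
    $msg = new AMQPMessage(
        json_encode(array('path' => $file)),
        array('delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT)
    );
    $channel->basic_publish($msg, '', 'image_uploads');
}

$channel->close();
$connection->close();
// Respond to the client right away; the worker drains 'image_uploads' in the background.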
Do you want to upload multiple images at a time? If so, you can use the explode() function to split the list of images and upload them.

How to handle pcntl_fork(): Error 35?

I have a PHP 7 CLI daemon which serially parses JSON files larger than 50 MB. I'm trying to save every 1,000 entries of parsed data to MySQL using a separate process spawned with pcntl_fork(), and for ~200k rows it works fine.
Then I get pcntl_fork(): Error 35.
I assume this is happening because the MySQL inserts become slower than the parsing, which causes more and more forks to be generated until CentOS 6.3 can't handle them any more.
Is there a way to catch this error and fall back to single-process parsing and saving? Or is there a way to check the child process count?
Here is the solution I came up with, based on Sander Visser's comment. The key part is checking the number of existing child processes and falling back to the current process if there are too many of them:
class serialJsonReader
{
    const MAX_CHILD_PROCESSES = 50;

    private $child_processes = []; // will store alive child PIDs

    private function flushCachedDataToStore()
    {
        // Resort to a single process if too many children are already running.
        if (count($this->child_processes) > self::MAX_CHILD_PROCESSES) {
            $this->checkChildProcesses();
            $this->storeCollectedData(); // main work here
        }
        // Otherwise fork and let a child do the work.
        else {
            $pid = pcntl_fork();
            if (!$pid) {
                $this->storeCollectedData(); // main work here
                exit();
            } elseif ($pid == -1) {
                die('could not fork');
            } else {
                $this->child_processes[] = $pid;
                $this->checkChildProcesses();
            }
        }
    }

    private function checkChildProcesses()
    {
        if (count($this->child_processes) > self::MAX_CHILD_PROCESSES) {
            foreach ($this->child_processes as $key => $pid) {
                $res = pcntl_waitpid($pid, $status, WNOHANG);
                // If the process has already exited
                if ($res == -1 || $res > 0) {
                    unset($this->child_processes[$key]);
                }
            }
        }
    }
}
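As a small extension of the same idea (not part of the original answer), the fork-failure branch itself can fall back to in-process saving instead of dying, which directly answers the "resort to single-process" part of the question: pcntl_fork() returns -1 when the fork fails with the error reported above. A sketch of that branch:
$pid = pcntl_fork();
if ($pid === -1) {
    // Fork failed (this is when "pcntl_fork(): Error 35" is raised): do the work here instead.
    $this->storeCollectedData();
} elseif ($pid === 0) {
    // Child process.
    $this->storeCollectedData();
    exit(0);
} else {
    // Parent process.
    $this->child_processes[] = $pid;
    $this->checkChildProcesses();
}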

Large processes in PhalconPHP

I have a web app that is a logging application, and I need backup/restore/import/export features in it. I did this successfully with Laravel, but I have some complications with Phalcon. I don't see native functions in Phalcon that would split the execution of large PHP scripts into chunks.
The logs will be backed up and restored, as well as imported by users, in ADIF format (adif.org). I have a parser for that format which converts the file into an array of arrays; then every record has to be searched against another table containing 2,000 regular expressions, find 3-10 matches there, and connect the imported records in one table to those in the other table (a hasMany model relation). That means every imported record needs quite some processing time. Laravel somehow handled an import of 3,500 records; I don't know how it will cope with more. The average import will contain 10,000 records, and each of them needs to be checked against the 2,000 regular expressions.
The main issue is how to split this huge amount of processing into smaller chunks so I don't get timeouts.
Here is the function that did the job flawlessly, adding 3,862 records to one table and, as a result of processing every record, 8,119 records to another table:
public function restoreAction()
{
    $this->view->disable();
    $user = Users::findFirst($this->session->auth['id']);

    if ($this->request->isPost()) {
        if ($this->request->isAjax()) {
            $frontCache = new CacheData(array(
                "lifetime" => 21600
            ));
            $cache = new CacheFile($frontCache, array(
                "cacheDir" => "../plc/app/cache/"
            ));
            $cacheKey = $this->request->getPost('fileName').'.cache';
            $records = $cache->get($cacheKey);

            $rowsPerChunk = 50; // chunk size, needed even when the records come from cache
            if ($records === null) {
                $adifSource = AdifHelper::parseFile(BASE_URL.'/uploads/'.$user->getUsername().'/'.$this->request->getPost('fileName'));
                $records = array_chunk($adifSource, $rowsPerChunk);
            }
            $size = count($records); // number of chunks

            for ($i = 0; $i < $size; $i++) {
                if (!isset($records[$i])) {
                    break;
                }
                set_time_limit(50);

                for ($j = 0; $j < $rowsPerChunk; $j++) {
                    $result = $records[$i][$j];
                    if (!isset($result)) {
                        break;
                    }
                    if (isset($result['call'])) {
                        $p = new PrefixHelper($result['call']);
                    }
                    $bandId = (isset($result['band']) && (strlen($result['band']) > 2)) ? Bands::findFirstByName($result['band'])->getId() : null;
                    $infos = (isset($p)) ? $p->prefixInfo() : null;

                    if (is_array($infos)) {
                        if (isset($result['qsl_sent']) && ($result['qsl_sent'] == 'q')) {
                            $qsl_rcvd = 'R';
                        } else if (isset($result['eqsl_qsl_sent']) && ($result['eqsl_qsl_sent'] == 'c')) {
                            $qsl_rcvd = 'y';
                        } else if (isset($result['qsl_rcvd'])) {
                            $qsl_rcvd = $result['qsl_rcvd'];
                        } else {
                            $qsl_rcvd = 'i';
                        }

                        $logRow = new Logs();
                        $logRow->setCall($result['call']);
                        $logRow->setDatetime(date('Y-m-d H:i:s', strtotime($result['qso_date'].' '.$result['time_on'])));
                        $logRow->setFreq(isset($result['freq']) ? $result['freq'] : 0);
                        $logRow->setRst($result['rst_sent']);
                        $logRow->setQslnote(isset($result['qslmsg']) ? $result['qslmsg'] : '');
                        $logRow->setComment(isset($result['comment']) ? $result['comment'] : '');
                        $logRow->setQslRcvd($qsl_rcvd);
                        $logRow->setQslVia(isset($result['qsl_sent_via']) ? $result['qsl_sent_via'] : 'e');
                        $logRow->band_id = $bandId;
                        $logRow->user_id = $this->session->auth['id'];
                        $success = $logRow->save();

                        if ($success) {
                            foreach ($infos as $info) {
                                if (is_object($info)) {
                                    $inf = new Infos();
                                    $inf->setLat($info->lat);
                                    $inf->setLon($info->lon);
                                    $inf->setCq($info->cq);
                                    $inf->setItu($info->itu);
                                    if (isset($result['iota'])) {
                                        $inf->setIota($result['iota']);
                                    }
                                    if (isset($result['pfx'])) {
                                        $inf->setPfx($result['pfx']);
                                    }
                                    if (isset($result['gridsquare'])) {
                                        $inf->setGrid($result['gridsquare']);
                                    } else if (isset($result['grid'])) {
                                        $inf->setGrid($result['grid']);
                                    }
                                    $inf->qso_id = $logRow->getId();
                                    $inf->prefix_id = $info->id;
                                    $infSuccess[] = $inf->save();
                                }
                            }
                        }
                    }
                }
                sleep(1);
            }
        }
    }
}
I know the script needs a lot of improvement, but for now the task was just to make it work.
I think that good practice for large processing tasks in PHP is console applications, which don't have execution-time restrictions and can be set up with more memory.
As for Phalcon, it has a built-in mechanism for running and processing CLI tasks: Command Line Applications (this link always points to the documentation of the latest Phalcon version).
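A minimal sketch of such a CLI task is below. The class and action names are illustrative, and the invocation assumes the standard Phalcon CLI bootstrap (e.g. php app/cli.php restore run <fileName>); the per-record processing would be the same as in restoreAction().
<?php

use Phalcon\Cli\Task;

class RestoreTask extends Task
{
    // Invoked as: php app/cli.php restore run <fileName>
    public function runAction(array $params)
    {
        $fileName = isset($params[0]) ? $params[0] : null;
        if ($fileName === null) {
            echo 'Usage: restore run <fileName>' . PHP_EOL;
            return;
        }

        // No web-server timeout applies here; raise the memory limit if needed.
        ini_set('memory_limit', '512M');

        $records = AdifHelper::parseFile(BASE_URL . '/uploads/' . $fileName);
        foreach (array_chunk($records, 50) as $chunk) {
            foreach ($chunk as $result) {
                // ... same per-record processing as in restoreAction() ...
            }
        }
    }
}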

PHP Script runs slow, but server CPU is idle

I have a PHP script to import various data from text files.
The import is very complex and my test file has 32,000 entries. These entries have to be parsed and inserted into a MySQL database.
If I run my script it needs 30 minutes to finish, and during that time my server CPU is 99% idle.
Is there a way to optimize PHP and MySQL so that they use more of the machine's power?
code:
if ($handle = @fopen($filename, "r")) {
    $canceled = false;
    $item = null;
    $result = null;

    while (!feof($handle)) {
        $buffer = fgets($handle, 4096);
        $line = $buffer;

        if (substr($line, 0, 2) == '00') {
            continue;
        }
        else if (substr($line, 0, 2) == '99') {
            continue;
        }
        else if (strlen($line) < 75) {
            continue;
        }

        Reread:
        if ($canceled) {
            break;
        }
        if ($item == null) {
            $item = new DA11_Item();
            $item->boq_id = $boq_id;
        }

        $result = $this->add_line_into_item($item, $line);

        if ($result == self::RESULT_CLOSED) {
            $this->add_item($item);
            $item = null;
        }
        else if ($result == self::RESULT_REREAD) {
            $this->add_item($item);
            $item = null;
            goto Reread;
        }
        else if ($result == self::RESULT_IGNORD) {
            if (count($item->details()) > 0) {
                $this->add_item($item);
            }
            $item = null;
        }
    }

    if ($item !== NULL) {
        $this->add_item($item);
    }
    fclose($handle);
}
add_item will perform a $item->save() and saves it to the database.
thx and kind regards,
viperneo
One problem you have is that every single insert is a separate request to your DB server, including its response. With 32,000 records you can imagine that this adds up to quite a lot of overhead. Use bulk inserts of, let's say, 1,000 records at once:
INSERT INTO foo (col1, col2) VALUES
    (1, '2'),
    (3, '4'),
    -- ... 997 additional rows ...
    (1999, '2000');
Additionally, transactions may help.
Update, since you mentioned Active Record: I recommend avoiding any additional abstraction layer for such mass-import tasks.
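A rough sketch of what that can look like with plain PDO, batching 1,000 rows per INSERT statement inside a transaction (table and column names are placeholders for your own schema):
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', array(
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
));

$batch = array();
$flush = function () use ($pdo, &$batch) {
    if (empty($batch)) {
        return;
    }
    // Build one multi-row INSERT: (?, ?), (?, ?), ...
    $placeholders = implode(',', array_fill(0, count($batch), '(?, ?)'));
    $stmt = $pdo->prepare("INSERT INTO foo (col1, col2) VALUES $placeholders");
    $params = array();
    foreach ($batch as $row) {
        $params[] = $row[0];
        $params[] = $row[1];
    }
    $stmt->execute($params);
    $batch = array();
};

$pdo->beginTransaction();
foreach ($parsedLines as $line) {   // $parsedLines: whatever your parser produces per row
    $batch[] = array($line['col1'], $line['col2']);
    if (count($batch) >= 1000) {
        $flush();
    }
}
$flush();                           // flush the remaining rows
$pdo->commit();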
As Florent says in the comment, the MySQL engine is too slow to handle the requests. Moving the MySQL database to an SSD (instead of an HDD) will improve speed significantly.
There are some things that you can optimize.
But I think most of the time is taken by your MySQL database. Sometimes adding indexes on your joined tables brings a lot of speed. You should check that first.
If you are on a Linux system, you can use mysqltuner to help optimize your database.
The second approach is to optimize your code and cache database results you have already looked up before.
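For the caching point, a minimal sketch is a memoized lookup: keep results in a static array so each distinct key is queried only once per run (the table, column, and function names here are placeholders):
function cachedLookupId(PDO $pdo, $name)
{
    static $cache = array();
    if (!array_key_exists($name, $cache)) {
        // One query per distinct value instead of one per imported line.
        $stmt = $pdo->prepare('SELECT id FROM reference_table WHERE name = ?');
        $stmt->execute(array($name));
        $id = $stmt->fetchColumn();
        $cache[$name] = ($id !== false) ? $id : null;
    }
    return $cache[$name];
}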
