I have a PHP script that imports various data from text files.
The import is quite complex and my test file has 32,000 entries. These entries have to be parsed and inserted into a MySQL database.
When I run my script it takes 30 minutes to finish, and during that time my server CPU is 99% idle.
Is there a way to optimize PHP and MySQL so that they use more of the machine's power?
code:
if ($handle = @fopen($filename, "r")) {
    $canceled = false;
    $item = null;
    $result = null;
    while (!feof($handle)) {
        $buffer = fgets($handle, 4096);
        $line = $buffer;
        if (substr($line, 0, 2) == '00') {
            continue;
        }
        else if (substr($line, 0, 2) == '99') {
            continue;
        }
        else if (strlen($line) < 75) {
            continue;
        }
Reread:
        if ($canceled) {
            break;
        }
        if ($item == null) {
            $item = new DA11_Item();
            $item->boq_id = $boq_id;
        }
        $result = $this->add_line_into_item($item, $line);
        if ($result == self::RESULT_CLOSED) {
            $this->add_item($item);
            $item = null;
        }
        else if ($result == self::RESULT_REREAD) {
            $this->add_item($item);
            $item = null;
            goto Reread;
        }
        else if ($result == self::RESULT_IGNORD) {
            if (count($item->details()) > 0) {
                $this->add_item($item);
            }
            $item = null;
        }
    }
    if ($item !== NULL) {
        $this->add_item($item);
    }
    fclose($handle);
}
add_item() calls $item->save(), which saves the item to the database.
Thanks and kind regards,
viperneo
One problem you have is that every single insert is a separate request to your database server, including its response. With 32,000 records you can imagine that this adds up to quite a lot of overhead. Use bulk inserts for (let's say) 1,000 records at once:
INSERT INTO foo (col1, col2) VALUES
  (1, '2'),
  (3, '4'),
  -- 997 additional rows
  (1999, '2000');
Additionally, wrapping the whole import in a transaction may help.
Update, since you mentioned Active Record: I recommend avoiding any additional abstraction layer for mass-import tasks like this.
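A rough sketch of how that could look with mysqli, wrapping the batched inserts in one transaction; the connection details, the items table and its columns, and $parsedLines are placeholders, not taken from the question:

<?php
// Sketch only: batch parsed rows into multi-row INSERTs inside one transaction.
// Table "items" and columns "col1"/"col2" are placeholders.
$mysqli = new mysqli('localhost', 'user', 'pass', 'import_db');
$mysqli->begin_transaction();

$batch = [];
$batchSize = 1000;

function flush_batch(mysqli $mysqli, array &$batch) {
    if (empty($batch)) {
        return;
    }
    $sql = 'INSERT INTO items (col1, col2) VALUES ' . implode(',', $batch);
    $mysqli->query($sql);
    $batch = [];
}

foreach ($parsedLines as $row) { // $parsedLines: whatever your parser produced
    $batch[] = sprintf(
        "('%s','%s')",
        $mysqli->real_escape_string($row[0]),
        $mysqli->real_escape_string($row[1])
    );
    if (count($batch) >= $batchSize) {
        flush_batch($mysqli, $batch);
    }
}
flush_batch($mysqli, $batch); // remaining rows
$mysqli->commit();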
As Florent says in the comment, the MySQL engine is too slow to handle the requests. Moving the MySQL database to an SSD (instead of an HDD) will improve speed significantly.
There are some things that you can optimize.
But I think most of the time is being spent in your MySQL database. Often, adding indexes to your joined tables brings a lot of speed; you should check that first.
If you are on a Linux system, you could use mysqltuner to tune your database.
The second approach is to optimize your code and cache results from your database when you have already run the same query before.
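As a sketch of that kind of caching (the table, column and function names here are made up for illustration), you can memoize repeated lookups in a plain array so each distinct value is only queried once per run:

<?php
// Sketch: memoize a repeated lookup so identical queries are not re-sent.
// "lookup_table" and its columns are placeholders.
$lookupCache = [];

function cachedLookup(mysqli $mysqli, $name, array &$cache) {
    if (array_key_exists($name, $cache)) {
        return $cache[$name]; // already fetched earlier in this run
    }
    $stmt = $mysqli->prepare('SELECT id FROM lookup_table WHERE name = ?');
    $stmt->bind_param('s', $name);
    $stmt->execute();
    $row = $stmt->get_result()->fetch_assoc(); // requires mysqlnd
    $cache[$name] = $row ? (int) $row['id'] : null;
    return $cache[$name];
}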
I've run the following PHP function on two different machines and consistently seen little difference in runtime, e.g. 40 sec vs 40.5 sec. Why is this?
The Two Different Builds:
2.3 GHz 8-core Intel Core i9, 16 GB RAM
3.8 GHz 12-core Ryzen 9 3900X, 32 GB RAM
I know that PHP runs scripts on a single core, but I'm unsure as to why the difference in clock speeds between the two processors accounted for no difference in runtime.
The Function:
This function takes a parent product and looks through its components for either a part or an assembly: if it is a part, it recurses; if not, it adds the component to a global variable. I know it's bad practice to use global variables, but in my application this makes sense for me.
<?php
$masterPickList = [];

function listBuilder($itemNo, $multiplier) {
    global $masterPickList;
    include "bin/inventoryConn.php";
    $assemblies = [];
    $AssemblySelect = "SELECT products.PRD_COMP_ITEM_NO, products.PRD_STR_QTY_PER_PAR,
                              itemInfo.ITEM_DESC1, itemInfo.ITEM_DESC2, itemInfo.ITEM_P_AND_IC_CD,
                              itemInfo.ITEM_USER_DEF_CD, itemInfo.ITEM_NO, itemInfo.ITEM_PUR_UOM
                       FROM products
                       INNER JOIN itemInfo ON products.PRD_COMP_ITEM_NO = itemInfo.ITEM_NO
                       WHERE products.PRD_STR_PAR_ITEM_NO = '{$itemNo}'
                       ORDER BY PRD_COMP_ITEM_NO ASC;";
    $AssemblyResult = mysqli_query($conn, $AssemblySelect) or die("Bad Query: $AssemblySelect");
    while ($row = mysqli_fetch_assoc($AssemblyResult)) {
        if ($row['ITEM_P_AND_IC_CD'] !== '11' && $row['ITEM_P_AND_IC_CD'] !== '11A' && $row['ITEM_P_AND_IC_CD'] !== '90') {
            if (strpos($row['ITEM_NO'], '50-') !== false) {
                $masterPickList[$itemNo]['SCD'][$row['ITEM_NO']]['ITEM_NO'] = $row['ITEM_NO'];
                $masterPickList[$itemNo]['SCD'][$row['ITEM_NO']]['ITEM_DESC1'] = $row['ITEM_DESC1'];
                $masterPickList[$itemNo]['SCD'][$row['ITEM_NO']]['ITEM_DESC2'] = $row['ITEM_DESC2'];
                $masterPickList[$itemNo]['SCD'][$row['ITEM_NO']]['QTY'] = intval($multiplier) * intval($row['PRD_STR_QTY_PER_PAR']);
                if ($row['ITEM_USER_DEF_CD'] !== 'P') {
                    $masterPickList[$itemNo]['SCD'][$row['ITEM_NO']]['PICK'] = '*Do Not Pick*';
                } elseif ($row['ITEM_USER_DEF_CD'] == 'P') {
                    $masterPickList[$itemNo]['SCD'][$row['ITEM_NO']]['PICK'] = '';
                }
                $masterPickList[$itemNo]['SCD'][$row['ITEM_NO']]['UM'] = $row['ITEM_PUR_UOM'];
            } else {
                $masterPickList[$itemNo]['COMP'][$row['ITEM_NO']]['ITEM_NO'] = $row['ITEM_NO'];
                $masterPickList[$itemNo]['COMP'][$row['ITEM_NO']]['ITEM_DESC1'] = $row['ITEM_DESC1'];
                $masterPickList[$itemNo]['COMP'][$row['ITEM_NO']]['ITEM_DESC2'] = $row['ITEM_DESC2'];
                $masterPickList[$itemNo]['COMP'][$row['ITEM_NO']]['QTY'] = intval($multiplier) * intval($row['PRD_STR_QTY_PER_PAR']);
                if ($row['ITEM_USER_DEF_CD'] !== 'P') {
                    $masterPickList[$itemNo]['COMP'][$row['ITEM_NO']]['PICK'] = '*Do Not Pick*';
                } elseif ($row['ITEM_USER_DEF_CD'] == 'P') {
                    $masterPickList[$itemNo]['COMP'][$row['ITEM_NO']]['PICK'] = '';
                }
                $masterPickList[$itemNo]['COMP'][$row['ITEM_NO']]['UM'] = $row['ITEM_PUR_UOM'];
            }
        } else if ($row['ITEM_P_AND_IC_CD'] == '11' || $row['ITEM_P_AND_IC_CD'] == '11A' || $row['ITEM_P_AND_IC_CD'] == '90') {
            $masterPickList[$itemNo]['COMP'][$row['ITEM_NO']]['ITEM_NO'] = $row['ITEM_NO'];
            $masterPickList[$itemNo]['COMP'][$row['ITEM_NO']]['ITEM_DESC1'] = $row['ITEM_DESC1'];
            $masterPickList[$itemNo]['COMP'][$row['ITEM_NO']]['ITEM_DESC2'] = $row['ITEM_DESC2'];
            $masterPickList[$itemNo]['COMP'][$row['ITEM_NO']]['QTY'] = intval($multiplier) * intval($row['PRD_STR_QTY_PER_PAR']);
            if ($row['ITEM_USER_DEF_CD'] !== 'P') {
                $masterPickList[$itemNo]['COMP'][$row['ITEM_NO']]['PICK'] = '*Do Not Pick*';
            } elseif ($row['ITEM_USER_DEF_CD'] == 'P') {
                $masterPickList[$itemNo]['COMP'][$row['ITEM_NO']]['PICK'] = '';
            }
            $masterPickList[$itemNo]['COMP'][$row['ITEM_NO']]['UM'] = $row['ITEM_PUR_UOM'];
            $assemblies[] = $row['ITEM_NO'];
        }
    }
    foreach ($assemblies as $item) {
        listBuilder($item, $multiplier);
    }
}
CPU speeds have not changed much since about the year 2000. Before that, clock speed was a useful indicator of performance.
MySQL spends some of its time transferring data between its own processes and to/from the clients. The performance of that, too, does not differ much between two current servers, or between an old server and a new one.
Almost everything is done serially, so multiple cores, drives, etc. don't help when measuring a single task.
If everything is cached in RAM, the size of the RAM does not matter.
The only significant hardware improvement in the past decade is switching from HDD to SSD. Everything else is noise.
I'm processing roughly 25,000 records or more. However, it somehow exceeds the maximum execution time. I am using CodeIgniter 3.0.
The records are text data from a PDF that was processed by a library I made. It does not exceed the execution time if I only display the data, but when things get more complicated, like writing it to the database (MySQL), it exceeds the 300-second execution time (which I had already raised).
To illustrate:
function process() {
    $data = processThePDF(); // outputs the records / 25,000 records
    if ($data) {
        foreach ($data as $dt) {
            $info = $this->just_another_model->view($dt['id']); // get the old record
            if ($info) {
                // Update
                $this->just_another_model->update([$params]);
                // Log update
                $this->just_another_model->log([$params]);
            } else {
                // Register
                $this->just_another_model->update([$params]);
                // Log register
                $this->just_another_model->log([$params]);
            }
        }
    }
}
So my questions are:
1. Is there a better way to optimize this?
2. Is it convenient to write a JSON file or a text file before processing it?
Store your data in arrays and update/insert it in batches:
function process() {
    $data = processThePDF(); // outputs the records / 25,000 records
    $update_data = array();
    $update_data_2 = array();
    if ($data) {
        foreach ($data as $dt) {
            $info = $this->just_another_model->view($dt['id']); // get the old record
            if ($info) {
                $update_data[] = array($params);
            } else {
                $update_data_2[] = array($params);
            }
        }
        if (count($update_data) > 0) {
            $this->just_another_model->update_batch($update_data);
        }
        if (count($update_data_2) > 0) {
            $this->just_another_model_2->update_batch($update_data_2);
        }
    }
}
Furthermore, you can fetch your old records in one batch before the loop and format them into an array keyed by id, so that you can access each one as $old_records[$dt['id']].
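A sketch of that pre-fetch; the get_by_ids() method is hypothetical and would need to exist on the model (for example as a WHERE id IN (...) query):

// Hypothetical pre-fetch: load all existing rows in one query and index them
// by id, so the check inside the loop is an array lookup instead of a query.
$ids = array_column($data, 'id');
$old_records = [];
foreach ($this->just_another_model->get_by_ids($ids) as $row) { // hypothetical method
    $old_records[$row['id']] = $row;
}

foreach ($data as $dt) {
    if (isset($old_records[$dt['id']])) {
        $update_data[] = array($params);   // existing record -> update batch
    } else {
        $update_data_2[] = array($params); // new record -> insert batch
    }
}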
I have a PHP 7 CLI daemon which serially parses JSON files over 50 MB in size. I'm trying to save every 1,000 entries of parsed data to MySQL using a separate process via pcntl_fork(), and for ~200k rows it works fine.
Then I get pcntl_fork(): Error 35.
I assume this is happening because MySQL insertion becomes slower than parsing, which causes more and more forks to be generated until CentOS 6.3 can't handle it any more.
Is there a way to catch this error so I can fall back to single-process parsing and saving? Or is there a way to check the child process count?
Here is the solution I ended up with, based on Sander Visser's comment. The key part is checking the existing child processes and falling back to the same process if there are too many of them:
class serialJsonReader {

    const MAX_CHILD_PROCESSES = 50;

    private $child_processes = []; // will store alive child PIDs

    private function flushCachedDataToStore() {
        // Resort to a single process
        if (count($this->child_processes) > self::MAX_CHILD_PROCESSES) {
            $this->checkChildProcesses();
            $this->storeCollectedData(); // main work here
        }
        // Use as many processes as possible
        else {
            $pid = pcntl_fork();
            if (!$pid) {
                $this->storeCollectedData(); // main work here
                exit();
            } elseif ($pid == -1) {
                die('could not fork');
            } else {
                $this->child_processes[] = $pid;
                $this->checkChildProcesses();
            }
        }
    }

    private function checkChildProcesses() {
        if (count($this->child_processes) > self::MAX_CHILD_PROCESSES) {
            foreach ($this->child_processes as $key => $pid) {
                $res = pcntl_waitpid($pid, $status, WNOHANG);
                // If the process has already exited
                if ($res == -1 || $res > 0) {
                    unset($this->child_processes[$key]);
                }
            }
        }
    }
}
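One thing this class does not show is reaping the remaining children at the end of the run. A possible addition (the method name is mine, not part of the original class) could look like this:

// Hypothetical addition: block until all remaining children have exited, so
// the parent process does not quit while inserts are still running.
private function waitForAllChildren() {
    foreach ($this->child_processes as $key => $pid) {
        pcntl_waitpid($pid, $status); // blocking wait
        unset($this->child_processes[$key]);
    }
}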
I have a web app that is a logging application, and I need backup/restore/import/export features there. I did this successfully with Laravel but have some complications with Phalcon. I don't see native functions in Phalcon that would split the execution of large PHP scripts into chunks.
The thing is that logs will be backed up and restored, as well as imported by users, in ADIF format (adif.org). I have a parser for that format which converts the file into an array of arrays; then every record has to be searched against another table containing 2,000 regular expressions, find 3-10 matches there, and connect the imported records in one table to those in another table (a hasMany model relation). That means every imported record needs quite some processing time. Laravel somehow managed with 3,500 imported records; I don't know how it will handle more. The average import will contain 10,000 records, and each of them needs to be checked against 2,000 regular expressions.
The main issue is how to split this huge amount of processing into smaller chunks so I don't get timeouts.
Here is the function that could do the job flawlessly, adding 3,862 records to one table and, as a result of processing every record, 8,119 records to another table:
public function restoreAction()
{
    $this->view->disable();
    $user = Users::findFirst($this->session->auth['id']);
    if ($this->request->isPost()) {
        if ($this->request->isAjax()) {
            $frontCache = new CacheData(array(
                "lifetime" => 21600
            ));
            $cache = new CacheFile($frontCache, array(
                "cacheDir" => "../plc/app/cache/"
            ));
            $cacheKey = $this->request->getPost('fileName').'.cache';
            $rowsPerChunk = 50;
            $records = $cache->get($cacheKey);
            if ($records === null) {
                $adifSource = AdifHelper::parseFile(BASE_URL.'/uploads/'.$user->getUsername().'/'.$this->request->getPost('fileName'));
                $records = array_chunk($adifSource, $rowsPerChunk);
            }
            $size = count($records);
            for ($i = 0; $i < $size; $i++) {
                if (!isset($records[$i])) {
                    break;
                }
                set_time_limit(50);
                for ($j = 0; $j < $rowsPerChunk; $j++) {
                    $result = $records[$i][$j];
                    if (!isset($result)) {
                        break;
                    }
                    if (isset($result['call'])) {
                        $p = new PrefixHelper($result['call']);
                    }
                    $bandId = (isset($result['band']) && (strlen($result['band']) > 2)) ? Bands::findFirstByName($result['band'])->getId() : null;
                    $infos = (isset($p)) ? $p->prefixInfo() : null;
                    if (is_array($infos)) {
                        if (isset($result['qsl_sent']) && ($result['qsl_sent'] == 'q')) {
                            $qsl_rcvd = 'R';
                        } else if (isset($result['eqsl_qsl_sent']) && ($result['eqsl_qsl_sent'] == 'c')) {
                            $qsl_rcvd = 'y';
                        } else if (isset($result['qsl_rcvd'])) {
                            $qsl_rcvd = $result['qsl_rcvd'];
                        } else {
                            $qsl_rcvd = 'i';
                        }
                        $logRow = new Logs();
                        $logRow->setCall($result['call']);
                        $logRow->setDatetime(date('Y-m-d H:i:s', strtotime($result['qso_date'].' '.$result['time_on'])));
                        $logRow->setFreq(isset($result['freq']) ? $result['freq'] : 0);
                        $logRow->setRst($result['rst_sent']);
                        $logRow->setQslnote(isset($result['qslmsg']) ? $result['qslmsg'] : '');
                        $logRow->setComment(isset($result['comment']) ? $result['comment'] : '');
                        $logRow->setQslRcvd($qsl_rcvd);
                        $logRow->setQslVia(isset($result['qsl_sent_via']) ? $result['qsl_sent_via'] : 'e');
                        $logRow->band_id = $bandId;
                        $logRow->user_id = $this->session->auth['id'];
                        $success = $logRow->save();
                        if ($success) {
                            foreach ($infos as $info) {
                                if (is_object($info)) {
                                    $inf = new Infos();
                                    $inf->setLat($info->lat);
                                    $inf->setLon($info->lon);
                                    $inf->setCq($info->cq);
                                    $inf->setItu($info->itu);
                                    if (isset($result['iota'])) {
                                        $inf->setIota($result['iota']);
                                    }
                                    if (isset($result['pfx'])) {
                                        $inf->setPfx($result['pfx']);
                                    }
                                    if (isset($result['gridsquare'])) {
                                        $inf->setGrid($result['gridsquare']);
                                    } else if (isset($result['grid'])) {
                                        $inf->setGrid($result['grid']);
                                    }
                                    $inf->qso_id = $logRow->getId();
                                    $inf->prefix_id = $info->id;
                                    $infSuccess[] = $inf->save();
                                }
                            }
                        }
                    }
                }
                sleep(1);
            }
        }
    }
}
I know the script needs a lot of improvement, but for now the task was just to make it work.
I think that good practice for large processing tasks in PHP is console applications, which don't have execution-time restrictions and can be set up with more memory.
As for Phalcon, it has a built-in mechanism for running and processing CLI tasks: Command Line Applications (this link will always point to the documentation of the latest Phalcon version).
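To make the idea concrete, a minimal Phalcon CLI skeleton could look roughly like the following; the file names, task name and parameter handling are assumptions, and registering an autoloader for the tasks directory is omitted:

<?php
// app/tasks/RestoreTask.php - hypothetical task that would hold the chunked
// import loop from restoreAction(), free of web-server execution-time limits.
class RestoreTask extends \Phalcon\Cli\Task
{
    public function mainAction(array $params = [])
    {
        $fileName = isset($params[0]) ? $params[0] : null;
        echo "Restoring from {$fileName}\n";
        // ... parse the ADIF file and save the Logs/Infos records here ...
    }
}

// cli.php - minimal command-line bootstrap (sketch)
use Phalcon\Di\FactoryDefault\Cli as CliDi;
use Phalcon\Cli\Console;

$di = new CliDi();
$console = new Console($di);

$arguments = [];
foreach ($argv as $k => $arg) {
    if ($k === 1) {
        $arguments['task'] = $arg;
    } elseif ($k === 2) {
        $arguments['action'] = $arg;
    } elseif ($k >= 3) {
        $arguments['params'][] = $arg;
    }
}
$console->handle($arguments);

It could then be started with something like php cli.php restore main backup.adi (task, action, then parameters), for example from cron or kicked off in the background by the web app.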
I have a function that takes an id and uses it to look up related information spread across 3 tables in the database. It then compares this to a CSV file, which is usually CPU intensive. Running this once with one id takes roughly 8 to 10 seconds at most, but I have been asked to have it run automatically across a varying number of ids in the database. To do this I created an array of the ids that currently match the criteria in the database and then loop to repeat the function for each element in the array, but it gets as far as maybe 4 of them and I get the following error:
Server error!
The server encountered an internal error and was unable to complete
your request. Either the server is overloaded or there was an error in
a CGI script.
If you think this is a server error, please contact the webmaster.
Error 500
I'll admit that my code could be much, much cleaner as I'm still learning as I go, but the real bottleneck appears to be reading the CSV, which is a report whose size changes each day. I have tried different combinations and the best result is (please don't chastise me for this, I know it is stupid, but the other ways haven't worked yet) to run the code as follows:
$eventArray = eventArray($venueId);
$totalEvents = count($eventArray);
for ($i = 0; $i < $totalEvents; $i++) {
    $eventId = $eventArray[$i];
    echo $eventId;
    echo $datename = getEventDetails($eventId, $zone);
    // Date of event
    echo $eventDate = $datename['eventDate'];
    // Vs team
    echo $eventName = $datename['eventName'];

    $file_handle = fopen("data/rohm/sales/today.csv", "r");
    while (!feof($file_handle)) {
        $line_of_text = fgetcsv($file_handle, 200);
        include('finance_logic.php');
    }
    fclose($file_handle);
}
Yes, it is re-reading the CSV every time, but I couldn't get it to work at all any other way, so if this is the issue I would really appreciate some guidance on handling the CSV better. In case it is relevant, the code in 'finance_logic.php' is listed below:
if ($line_of_text[0] == "Event: $eventName ") {
    $f = 1;
    $ticketTotalSet = "no";
    $killSet = 'no';
    // Default totals to zero
    $totalHolds = 0;
    $totalKills = 0;
    $ticketSold = 0;
    $ticketPrice = 0;
    $totalCap = 0;
}
if ($f == 1 && $line_of_text[0] == "$eventDate") {
    $f = 2;
}
if ($f == 2 && $line_of_text[0] == "Holds") {
    $f = 3;
    $col = 0; // start scanning columns from the beginning of the row
    while ($line_of_text[$col] !== "Face Value Amt") {
        $col++;
    }
}
if ($f == 3 && $line_of_text[0] !== "Face Value Amt") {
    if ($f == 3 && $line_of_text[0] == "*: Kill") {
        $totalKills = $line_of_text[$col];
    }
    $holdsArray[] = $line_of_text[$col];
}
if ($f == 3 && $line_of_text[0] == "--") {
    $f = 4;
}
if ($f == 4 && $line_of_text[0] == "Capacity") {
    $totalCap = $line_of_text[$col];
    $f = 5;
}
if ($f == 5 && $line_of_text[0] == "Abbreviated Performance Totals") {
    $f = 6;
}
if ($f == 6 && $line_of_text[0] == "$eventName") {
    // Change when 1 ticket exists
    $ticketTotalSet = "yes";
    // Set season tickets
    include("financial/seasontickets/$orgShortName.php");
    // All non-season tickets are single tickets
    if (!isset($category)) {
        $category = 'single';
    }
    $ticketName = $line_of_text[2];
    $ticketSold = $line_of_text[3];
    $ticketPrice = $line_of_text[4];
    addTicketType($eventId, $ticketName, $category, $ticketSold, $ticketPrice);
    unset($category);
}
if ($f == 6 && $ticketTotalSet == "yes" && $line_of_text[0] !== "$eventName") {
    $totalHolds = (array_sum($holdsArray) - $totalKills);
    // Add cap, holds and kills
    addKillsHoldsCap($eventId, $totalCap, $eventId, $totalHolds, $totalKills);
    // Reset everything
    $f = 0;
    $ticketTotalSet = "no";
    echo "$eventName updated!";
}
Thanks in advance!
P.S. The reason the report is read each time is so that 'eventName' and 'eventDate' can be searched for within 'finance_logic.php'. Obviously, if all the event names and dates were set up beforehand, it would take one pass through the report to find them all, but I'm not sure how I could do this dynamically. Any suggestions would be welcome, as I'm sure there is something out there that I just haven't learnt yet.
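For illustration, that single pass could look roughly like this (a sketch, not taken from the original code; it assumes the report marks each section with an "Event: ..." row, as the parsing above implies):

// Sketch: read the report once and group its rows by event name, so the
// per-event loop works on in-memory arrays instead of re-reading the file.
$reportByEvent = [];
$currentEvent = null;

$fh = fopen("data/rohm/sales/today.csv", "r");
while (($row = fgetcsv($fh, 200)) !== false) {
    if (isset($row[0]) && strpos($row[0], 'Event: ') === 0) {
        $currentEvent = trim(substr($row[0], strlen('Event: ')));
    }
    if ($currentEvent !== null) {
        $reportByEvent[$currentEvent][] = $row;
    }
}
fclose($fh);

// Then, for each event from the database:
// foreach ((isset($reportByEvent[$eventName]) ? $reportByEvent[$eventName] : array()) as $line_of_text) { ...finance logic... }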
I have some heavy scripts I use on localhost sometimes, and if I don't add anything they will just time out.
A simple solution is to limit the number of executions of your function per request, then reload the page and restart where you stopped.
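A sketch of that pattern applied to the loop above; the ?offset= parameter and batch size are made up, and the redirect assumes nothing has been echoed yet in the request:

// Sketch: process only a slice of the events per request, then redirect to
// the next slice until everything has been handled.
$batchSize = 2; // events per request (placeholder)
$offset = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;

$eventArray = eventArray($venueId);
$slice = array_slice($eventArray, $offset, $batchSize);

foreach ($slice as $eventId) {
    // ... run the per-event work from the question here ...
}

if ($offset + $batchSize < count($eventArray)) {
    header('Location: ?offset=' . ($offset + $batchSize)); // not finished yet
    exit;
}
echo 'All events processed';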