Export large data in Yii stalls the server - php

I am working in Yii and want to export large data, approximately 200,000 (2 lakh) records, at a time. The problem is that when I try to export the data, the server stops responding and all processes in the system hang. I have to kill all services and restart the server. Can anyone tell me an appropriate way to export the data to a CSV file?
$count = Yii::app()->db->createCommand('SELECT COUNT(*) FROM TEST_DATA')->queryScalar();
$maxRows = 1000;
$maxPages = ceil($count / $maxRows);
for ($i = 0; $i < $maxPages; $i++)
{
    $offset = $i * $maxRows;
    $rows = Yii::app()->db->createCommand("SELECT * FROM TEST_DATA LIMIT $offset, $maxRows")->query();
    foreach ($rows as $row)
    {
        // Here your code
    }
}
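To produce the CSV itself without exhausting memory, each page of rows can be written straight to a stream with fputcsv() instead of being accumulated in an array. A minimal, self-contained sketch (the column names and sample rows are made up; in a real export each chunk would come from one LIMIT/OFFSET page and the stream would be php://output):

```php
<?php
// Streaming rows to CSV with fputcsv() keeps memory flat: each chunk is
// written out as soon as it is fetched. 'php://temp' stands in for
// 'php://output' here so the sketch is self-contained.
function writeCsvChunk($stream, array $rows): void
{
    foreach ($rows as $row) {
        fputcsv($stream, $row, ',', '"', '\\');
    }
}

$stream = fopen('php://temp', 'r+');
fputcsv($stream, ['id', 'name'], ',', '"', '\\'); // header row, once, before the page loop
writeCsvChunk($stream, [                          // in the real export: one LIMIT/OFFSET page
    [1, 'alpha'],
    [2, 'beta'],
]);

rewind($stream);
echo stream_get_contents($stream);
```

Passing the separator, enclosure, and escape arguments explicitly keeps the call warning-free across PHP versions.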

It may be because you are processing the code without closing the session. When you start the process and do not close the session, you cannot load any page of the site (in the same browser) while the code is running, because the session is locked (busy). This can look like the server is hanging, but the server is running as it should. You can check this by loading the site in a different browser: if it loads, the process is running as it should.
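If the session lock is indeed the culprit, calling session_write_close() before the long loop releases the lock so other requests from the same browser can proceed. A sketch, assuming the default file-based session handler:

```php
<?php
// Release the session lock before a long-running job, so other requests
// from the same browser are not blocked while it runs.
session_start();                  // acquires (locks) the session
$_SESSION['export'] = 'running';  // store anything needed before the loop
session_write_close();            // saves the data and releases the lock

// ... the long export loop would run here without holding the lock ...

echo session_status() === PHP_SESSION_NONE ? "lock released\n" : "lock still held\n";
```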
In my experience, I used a table to save processing state (the successfully processed offset and the last iteration time) so I could see the current state of the processing. For example, a table "processing_data" with columns 'id' (int), 'stop_request' (tinyint; if 1, stop the iteration), 'offset' (int), and 'last_iterated_time' (datetime). Add only one record to this table, and on every iteration check the 'stop_request' column; if it has the value 1, break out of the iteration. On every iteration also save the current offset and the current datetime. This way you can stop the processing and continue later.
And you can use a while loop (to reduce memory usage) to iterate without counting:
set_time_limit(0);
$offset = 0;
$connection = Yii::app()->db;
$nextRow = $connection->createCommand("SELECT * FROM TEST_DATA LIMIT $offset, 1")->queryRow();
while ($nextRow) {
    // Here your code
    $processingData = ProcessingData::model()->findByPk(1);
    $processingData->offset = $offset;
    $processingData->last_iterated_time = new CDbExpression('NOW()');
    $processingData->save();
    if ($processingData->stop_request == 1) {
        break;
    }
    $offset++;
    $nextRow = $connection->createCommand("SELECT * FROM TEST_DATA LIMIT $offset, 1")->queryRow();
}

Related

Which Method is More Practical or Orthodox When Importing Large Data?

I have a file whose function is to import data into a SQL database from an API. A problem I encountered is that the API can only return a maximum of 1000 records per call, even though I sometimes need to retrieve much larger amounts of data, ranging from 10 to 200,000 records. My first thought was to create a while loop that keeps calling the API until all of the data has been retrieved, after which I can insert it into the database.
$moreDataToImport = true;
$lastId = null;
$query = '';
while ($moreDataToImport) {
    $result = json_decode(callToApi($lastId));
    $query .= formatResult($result);
    $moreDataToImport = !empty($result['dataNotExported']);
    $lastId = getLastId($result['customers']);
}
mysqli_multi_query($con, $query);
The issue I encountered with this is that I quickly reached memory limits. The easy solution is to simply increase the memory limit until it is sufficient. How much memory I need, however, is unknown, because there is always the possibility that I need to import a very large dataset, so I can theoretically always run out of memory. I don't want to set an unlimited memory limit, as the problems with that are unimaginable.
My second solution was, instead of looping through the imported data, to send each batch to my database and then do a page refresh, with a GET request specifying the last ID I left off on.
if (isset($_GET['lastId'])) {
    $lastId = $_GET['lastId'];
} else {
    $lastId = null;
}
$result = json_decode(callToApi($lastId));
$query = formatResult($result);
mysqli_multi_query($con, $query);
if (!empty($result['dataNotExported'])) {
    header('Location: ./page.php?lastId=' . getLastId($result['customers']));
}
This solution solves my memory limit issue; however, it creates another one: after about 20 redirects (the exact number depends on the browser), browsers will automatically stop the program to prevent a potential redirect loop. The workaround would be to stop the program yourself at the 20th redirect and trigger a plain page refresh, continuing the process.
if (isset($_GET['redirects'])) {
    $redirects = $_GET['redirects'];
    if ($redirects == '20') {
        if ($lastId == null) {
            header("Location: ./page.php?redirects=2");
        } else {
            header("Location: ./page.php?lastId=$lastId&redirects=2");
        }
        exit;
    }
} else {
    $redirects = '1';
}
Though this solves my issues, I am afraid it is more impractical than other solutions; there must be a better way to do this. Are these two approaches, with the risk of running out of memory, my only choices? And if so, is one more efficient/orthodox than the other?
Do the insert query inside the loop that fetches each page from the API, rather than concatenating all the queries.
$moreDataToImport = true;
$lastId = null;
while ($moreDataToImport) {
    $result = json_decode(callToApi($lastId));
    $query = formatResult($result);
    mysqli_query($con, $query);
    $moreDataToImport = !empty($result['dataNotExported']);
    $lastId = getLastId($result['customers']);
}
Page your work. Break it up into smaller chunks that will be below your memory limit.
If the API only returns 1000 at a time, then only process 1000 at a time in a loop. In each iteration of the loop you'll query the API, process the data, and store it. Then, on the next iteration, you'll be using the same variables so your memory won't skyrocket.
A couple things to consider:
If this becomes a long running script, you may hit the default script running time limit - so you'll have to extend that with set_time_limit().
Some browsers will consider scripts that run too long to have timed out and will show the appropriate error message.
For processing upwards of 200,000 pieces of data from an API, I think the best solution is to not make this work dependent on a page load. If possible, I'd put this in a cron job to be run by the server on a regular schedule.
If the dataset is dependent on the request (for example, if you're processing temperatures from one of thousands of weather stations, with the specific station ID set by the user), then consider creating a secondary script that does the work. Calling and forking the secondary script from your primary script will let the primary script finish execution while the secondary script runs in the background on your server. Something like:
exec('php path/to/secondary-script.php > /dev/null &');

Does the PHP sleep() function free the main thread for a specified number of seconds?

I have a web application that allows users to upload DBF files, and the app will store their contents in an SQL database. The row count ranges from a few thousand to about 80,000 rows, and I have the following code:
if ($file) {
    $totalRows = dbase_numrecords($file);
    for ($i = 1; $i <= $totalRows; $i++) {
        $row = dbase_get_record_with_names($file, $i);
        //echo $row["BILL_NO"]." ";
        if (!empty(trim($row["STATUS"]))) { // save to database if column is not empty
            $data = [
                // array data from the row
            ];
            $db->table("item_menu")->replace($data);
        }
        if ($i % 1000 == 0) { // sleep call here every 1000 rows
            sleep(1);
        }
    }
    echo "done";
}
This function, once done, will be called once per day and ideally just run in the background. However, when I leave out the sleep call, the server doesn't serve any pages until the loop completes, which can mean from a few seconds to about a minute of unresponsiveness; with the sleep call added, the server keeps serving pages to other users.
My question is: does the sleep function help free up the current thread and let other requests be processed during the sleep period?
If you use the sleep() function you are still executing everything in a single thread, so the whole process simply pauses; sleep() does not hand that time over to other work in the same script. For that kind of process handling you could look at PHP 8.1 and its Fibers API.
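To be clear about what sleep() does: it blocks the calling process for the whole interval, and nothing else in that script runs meanwhile. A quick demonstration:

```php
<?php
// sleep() blocks only the calling process; the script makes no progress
// during the interval. Measure the wall-clock time to confirm.
$start = microtime(true);
sleep(1);
$elapsed = microtime(true) - $start;

echo $elapsed >= 1 ? "blocked for at least 1s\n" : "returned early\n";
```

Whether other visitors are served during that second depends on the server model (e.g. separate PHP-FPM workers), not on sleep() itself.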

MySQL server has gone away - Zend

I have a very strange problem. I'm working on a shop based on Zend Framework and creating an integration with an auction service (allegro.pl). I need to download all items via SOAP. After my function finishes its job I get "MySQL server has gone away".
Here is my code:
private function getProducts()
{
    $items = [];
    $filterOptions = /* doesn't matter for this question */;
    $allegroItems = $this->allegro->getCore()->doGetItemsList(0, 1, $filterOptions, 3, null);
    $itemsCount = $allegroItems->itemsCount;
    $perPage = 1000;
    $maxPage = ceil(round($itemsCount, 0) / $perPage);
    for ($i = 0; $i < $maxPage; $i++) {
        $allegroItems = $this->allegro->getCore()->doGetItemsList($perPage * $i, $perPage, $filterOptions, 3, null)->itemsList->item;
        if (!is_array($allegroItems)) {
            $allegroItems = [$allegroItems];
        }
        foreach ($allegroItems as $item) {
            $items[(string)$item->itemId] = $item;
        }
    }
    return $items;
}
There are ~3000 items currently. I get the error when I download over 2500-3000 items (I didn't calculate the exact number). It doesn't matter whether I set $perPage to 1, 100 or 1000, and it doesn't depend on execution time - I can add sleep(100) and download 1000 products with no error. Just before the last line of this function I can run any DB query with no problems, but then, when the framework's built-in function tries to update the tasks table, I get the error.
The error seems to depend on nothing... Not execution time (it is ~30 sec and works fine with sleep(100)), not the memory limit (I can unset variables each loop or set the memory limit to 1 GB; it didn't help), not the execution time of the SOAP calls (downloading 2000 items one by one works fine, although it takes a few minutes). And the strangest part: DB queries work inside the function just before the last line, as I said.
I'm not using plain Zend Framework but "Shoper", which is based on Zend.
Any ideas?
The problem was with BLOB data. Increasing max_allowed_packet solved it. However, I have no idea what was in that BLOB or how my function could affect it, because I wrote it as a completely independent function :-)
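For reference, max_allowed_packet is raised in the MySQL server configuration (the value below is only an example; pick one large enough for your biggest BLOB):

```ini
# my.cnf / my.ini
[mysqld]
max_allowed_packet = 64M
```

It can also be changed at runtime, until the next server restart, with SET GLOBAL max_allowed_packet = 67108864; issued by a privileged user.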

Running multiple PHP scripts at the same time (database loop issue)

I am running 10 PHP scripts at the same time and it processing at the background on Linux.
For Example:
$i = 1;
while ($i <= 10) {
    exec("/usr/bin/php-cli run-process.php > /dev/null 2>&1 & echo $!");
    sleep(10);
    $i++;
}
In run-process.php, I am having a problem with a database loop. One of the processes might have already updated the status field to 1, but the other PHP processes do not seem to see it. For example:
$SQL = "SELECT * FROM data WHERE status = 0";
$query = $db->prepare($SQL);
$query->execute();
while ($row = $query->fetch(PDO::FETCH_ASSOC)) {
    $SQL2 = "SELECT status FROM data WHERE number = " . $row['number'];
    $qCheckAgain = $db->prepare($SQL2);
    $qCheckAgain->execute();
    $tempRow = $qCheckAgain->fetch(PDO::FETCH_ASSOC);
    // already updated by another process?
    if ($tempRow['status'] == 1) {
        continue;
    }
    doCheck($row);
    sleep(2);
}
How do I ensure the processes are not re-doing the same data?
When you have multiple processes, you need each process to take "ownership" of a certain set of records. Usually you do this with an UPDATE that has a LIMIT clause, then SELECT the records the script just "claimed".
For example, have a field that specifies whether the record is available for processing (say, a value of 0 means it is available). Your UPDATE would set that field to the script's process ID, or some other number unique to the process. Then you SELECT on the process ID. When you're done processing, you can set the field to a "finished" value. Update, select, update, repeat.
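The update-then-select claim pattern might look like the following sketch. It uses an in-memory SQLite database so it is runnable as-is; the real code would use a MySQL DSN, and the table and column names are illustrative:

```php
<?php
// Claim-by-update sketch: each worker stamps a batch of unclaimed rows
// with its own process ID, then works only on the rows it stamped.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE data (number INTEGER PRIMARY KEY, status INTEGER DEFAULT 0)');
$db->exec('INSERT INTO data (number) VALUES (1), (2), (3), (4), (5)');

$pid = getmypid(); // any value unique to this worker would do

// Atomically take ownership of up to 2 unprocessed rows.
$claim = $db->prepare(
    'UPDATE data SET status = :pid
     WHERE status = 0
       AND number IN (SELECT number FROM data WHERE status = 0 LIMIT 2)'
);
$claim->execute([':pid' => $pid]);

// Select only the rows this worker owns.
$stmt = $db->prepare('SELECT number FROM data WHERE status = :pid');
$stmt->execute([':pid' => $pid]);
$owned = $stmt->fetchAll(PDO::FETCH_COLUMN);

echo count($owned), " rows claimed\n";

// ... process the owned rows, then mark them finished ...
$db->prepare('UPDATE data SET status = -1 WHERE status = :pid')->execute([':pid' => $pid]);
```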
The reason your script executes the same query multiple times is the parallelisation you are creating: process 1 reads from the database, process 2 reads from the database, and both start to process the same data.
Databases provide transactions to get rid of such race conditions. Have a look at what PDO provides for handling database transactions.
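A minimal PDO transaction sketch (using an in-memory SQLite database for illustration; the schema is made up):

```php
<?php
// Wrap the claim + processing in a PDO transaction so a failure rolls
// everything back instead of leaving a half-processed row.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE data (number INTEGER PRIMARY KEY, status INTEGER DEFAULT 0)');
$db->exec('INSERT INTO data (number) VALUES (1)');

$db->beginTransaction();
try {
    $db->prepare('UPDATE data SET status = 1 WHERE number = ? AND status = 0')
       ->execute([1]);
    // ... per-row processing would happen here ...
    $db->commit();    // make the status change visible to other processes
} catch (Exception $e) {
    $db->rollBack();  // undo the status change on failure
    throw $e;
}

echo 'final status: ', $db->query('SELECT status FROM data WHERE number = 1')->fetchColumn(), "\n";
```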
I am not entirely sure how/what you are processing.
You could introduce a LIMIT clause and pass the offset as a parameter, so the first process does the first 10, the second does the next 10, and so on.
You need a lock such as SELECT ... FOR UPDATE.
InnoDB supports row-level locking.
See http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html for details.

MySql Query lag time / deadlock?

When there are multiple PHP scripts running in parallel, each making an UPDATE query to the same record in the same table repeatedly, is it possible for there to be a 'lag time' before the table is updated with each query?
I have basically 5-6 instances of a PHP script running in parallel, having been launched via cron. Each script gets all the records in the items table, and then loops through them and processes them.
However, to avoid processing the same item more than once, I store the id of the last item being processed in a separate table. So this is how my code works:
function getCurrentItem()
{
    $sql = "SELECT currentItemId FROM settings";
    $result = $this->db->query($sql);
    return $result->get('currentItemId');
}

function setCurrentItem($id)
{
    $sql = "UPDATE settings SET currentItemId='$id'";
    $this->db->query($sql);
}

$currentItem = $this->getCurrentItem();

$sql = "SELECT * FROM items WHERE status='pending' AND id > $currentItem";
$result = $this->db->query($sql);
$items = $result->getAll();

foreach ($items as $i)
{
    // Check if $i has been processed by a different instance of the script,
    // and if so, leave it untouched.
    if ($this->getCurrentItem() > $i->id)
        continue;

    $this->setCurrentItem($i->id);

    // Process the item here
}
But despite all the precautions, most items are being processed more than once, which makes me think there is some lag between the update queries run by the PHP script and when the database actually updates the record.
Is that true? And if so, what other mechanism should I use to ensure the PHP scripts always get the latest currentItemId even when multiple scripts run in parallel? Would using a text file instead of the DB help?
If this is run in parallel, there is little here to prevent race conditions:
script 1:
getCurrentItem() yields id 1234
...context switch to script 2 before script 1 gets to run its UPDATE statement.
script 2:
getCurrentItem() yields id 1234
And both scripts process id 1234.
You'd want updating and checking the status of the item to be an all-or-nothing operation. You don't need the settings table; you'd do something like this (pseudo code):
$items = SELECT * FROM items WHERE status='pending' AND id > $currentItem;
foreach ($items as $i) {
    $rows = update items set status='processing' where id = $i->id and status='pending';
    if ($rows == 0) // someone beat us to it and is already processing the item
        continue;
    // process item...
    update items set status='done' where id = $i->id;
}
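The "rows == 0" check in the pseudo code maps onto PDOStatement::rowCount() after an UPDATE. A runnable sketch against an in-memory SQLite database (the schema is illustrative; the real code would target MySQL):

```php
<?php
// 0 affected rows on the claiming UPDATE means another worker already
// took the item, so this worker skips it.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT DEFAULT 'pending')");
$db->exec('INSERT INTO items (id) VALUES (1)');

$claim = $db->prepare(
    "UPDATE items SET status = 'processing' WHERE id = ? AND status = 'pending'"
);

$claim->execute([1]);
echo 'first attempt affected ', $claim->rowCount(), " row(s)\n";  // we won the race

$claim->execute([1]);                                             // simulates a second worker
echo 'second attempt affected ', $claim->rowCount(), " row(s)\n"; // lost: skip the item
```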
What you need is for any thread to be able to:
find a pending item
record that that item is now being worked on (in the settings table)
And it needs to do both of those in one go, without any other thread interfering half-way through.
I recommend putting the whole SQL in a stored procedure; that will be able to run the entire thing as a single transaction, which makes it safe from competing threads.
