I am having some trouble working with MongoDB in PHP at the moment.
I am pulling records of financial data from a CSV file that is almost a gigabyte. I am looping through the file fine, parsing each row and outputting the resulting array.
Inside the while loop I am also trying to insert the data into MongoDB:
// Increase timeout on php script
ini_set('max_execution_time', 600);

while (($data = fgetcsv($file, 0, ",")) !== FALSE) {
    $parsedData['name']    = $data[0];
    $parsedData['email']   = $data[1];
    $parsedData['phone']   = $data[2];
    $parsedData['address'] = $data[3];
    $parsedData['gender']  = $data[4];

    $collection->insert($parsedData);
}
The problem is that it inserts only one of the records, or sometimes a few; I can't really say, it seems quite random.
Any help here would be great.
Tests Completed
Running the same loop against MySQL instead completed successfully.
print_r($parsedData) displays the desired values.
Wrapping $collection->insert() in an if statement returns true.
Okay, I managed to resolve this issue after reading more of the MongoDB documentation.
I wrapped the procedure in a try/catch so that exceptions are surfaced.
I added fsync and safe to the options array sent with the write.
The final piece was adding "new MongoId", because MongoDB was reporting a duplicate _id (as far as I can tell this was the only strictly necessary step).
while (($data = fgetcsv($file, 0, ",")) !== FALSE) {
    try {
        // Add a MongoId explicitly; without this it was returning a
        // duplicate key error in the catch.
        $parsedData['_id']     = new MongoId();
        $parsedData['name']    = $data[0];
        $parsedData['email']   = $data[1];
        $parsedData['phone']   = $data[2];
        $parsedData['address'] = $data[3];
        $parsedData['gender']  = $data[4];

        // Submitted "safe" and "fsync" with the options array; as far as
        // I can see MongoDB then waits until the data is written before
        // returning, instead of continuing as soon as the call is made.
        $collection->save($parsedData, array('safe' => true, 'fsync' => true));
    } catch (MongoCursorException $e) {
        // This is where I caught the duplicate id
        print_r($e->doc['err']);
        // Kill the procedure
        die();
    }
}
If anyone can add to this it would be great, as I thought Mongo generated its own _id values and that the insert would only return true once the data was actually written; or maybe I'm just expecting it to behave like the MySQL drivers.
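To add to this: a likely explanation, based on the documented behaviour of the legacy mongo extension (an inference on my part, not something confirmed in this thread), is that MongoCollection::insert() writes the generated _id back into the array you pass it. Since $parsedData is reused across loop iterations, every insert after the first carries the first row's _id and is rejected as a duplicate key; without 'safe' (or its successor, the 'w' write-concern option) those rejections are never reported. A minimal sketch of the alternative fix, rebuilding the array each pass instead of assigning a MongoId manually:

while (($data = fgetcsv($file, 0, ",")) !== FALSE) {
    // A fresh array each pass, so the _id assigned by the previous
    // insert() cannot leak into the next document.
    $parsedData = array(
        'name'    => $data[0],
        'email'   => $data[1],
        'phone'   => $data[2],
        'address' => $data[3],
        'gender'  => $data[4],
    );

    // 'w' => 1 requests an acknowledged write (the replacement for
    // 'safe' in driver 1.3+), so failures throw instead of being
    // silently dropped.
    $collection->insert($parsedData, array('w' => 1));
}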
I am using league/csv to parse a CSV file and then later dump that data into the database.
The structure looks like this:
$csv = Reader::createFromPath($csv_file_path, 'r');
$csv->setOutputBOM(Reader::BOM_UTF8);
$csv->addStreamFilter('convert.iconv.ISO-8859-15/UTF-8');
$csv->setHeaderOffset(0);
$csv_header = $csv->getHeader();

$loop = true;
while ($loop) {
    $stmt = (new Statement())
        ->offset($offset)
        ->limit($limit)
    ;
    $records = $stmt->process($csv);

    foreach ($records as $record) {
        $rec_arr[] = array_values($record);
    }

    $records_arr = $service->trimArray($rec_arr);

    if (count($records_arr) > 0) {
        foreach ($records_arr as $ck => $cv) {
            // map data and insert into database
        }
    } else {
        $loop = false;
    }
}
Currently, I am running this logic inside a Laravel queue job. It successfully inserts the whole data set, but it never halts: the job stays stuck on "processing". If I remove the while loop, the job finishes with "processed".
So I think I have some bad logic in there somewhere.
Looking for an idea on how to tackle this.
if(count($records_arr)>0)
This line probably always evaluates to true, so your code never reaches the $loop = false; end condition.
@stuart thanks for your comment. It was because this loop previously worked across multiple AJAX requests, so I had declared $records and $rec_arr outside the loop. Moving the array initialization inside the while loop makes it work perfectly fine.
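For completeness, a sketch of what that fix could look like; it assumes the offset is also advanced after each batch (not shown in the original snippet) and reuses $limit, $service and the insert logic from the question:

$offset = 0;
$loop = true;
while ($loop) {
    $rec_arr = [];                                  // reset the buffer on every pass
    $stmt = (new Statement())->offset($offset)->limit($limit);

    foreach ($stmt->process($csv) as $record) {
        $rec_arr[] = array_values($record);
    }
    $records_arr = $service->trimArray($rec_arr);

    if (count($records_arr) > 0) {
        foreach ($records_arr as $ck => $cv) {
            // map data and insert into database
        }
        $offset += $limit;                          // move on to the next batch
    } else {
        $loop = false;                              // empty batch: stop the loop
    }
}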
I am developing an application which has to read a large CSV file and process the data. It will definitely not be possible to do it in one request, because processing the data also takes time; it is not just about reading.
What I have tried so far, and what has been working well, is the following:
// Open file
$handle = fopen($file, 'r');

// Move pointer to the place where it stopped last time
fseek($handle, $offset);

// Read a limited number of lines and process them
for ($i = 0; $i < $limit; $i++) {
    // Get length of line for offset purposes
    $newlength = strlen(fgets($handle));

    // Move pointer back: fgets advanced it, so rewind for fgetcsv to read that line again
    fseek($handle, $offset);
    $line = fgetcsv($handle, 0, $csv_delimiter);

    // Process data here

    // Save offset
    $offset += $newlength;
}
The problem is on this line:
$newlength = strlen(fgets($handle));
It fails when a CSV column contains line breaks.
I also tried $newlength = strlen(implode(';', fgetcsv($handle, 0, $csv_delimiter)));, but this does not always work either; it is usually off by a few characters, probably because quoting and the end-of-line characters are not accounted for.
All I need is the length of a CSV record: not just a single physical line, but a record which might contain line breaks within quotes.
Does anybody have a better solution?
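One possible alternative, not taken from the question: let fgetcsv() consume the whole record, including any line breaks inside quoted fields, and read the new byte position back with ftell() instead of measuring the line length yourself. A minimal sketch along the lines of the code above:

$handle = fopen($file, 'r');
fseek($handle, $offset);                    // resume where we stopped last time

for ($i = 0; $i < $limit; $i++) {
    // fgetcsv() reads one CSV record, even if it spans several lines
    $line = fgetcsv($handle, 0, $csv_delimiter);
    if ($line === false) {
        break;                              // end of file
    }

    // Process data here

    $offset = ftell($handle);               // byte position right after the record
}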
Do one thing: create a MySQL table named "my_csv_data" to use as a temporary staging area, with a column for every field in the CSV file, plus an extra "is_processed" column of type ENUM('0','1') with a default value of '0'.
Now import all your CSV data into that table; plain inserts like that will not take long.
Then create a function/file which fetches 10 or 100 records from my_csv_data where is_processed = '0', processes them, and on success updates the "is_processed" field to '1'.
Finally, create a cron job which hits that file/function periodically.
This way the data is inserted quietly in the background without disturbing any admin or front-end user.
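A rough sketch of that staging-table idea using PDO; the table and flag names come from the answer above, while the connection details and the data columns (name, email) are placeholders standing in for the real CSV columns:

$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// One column per CSV field, plus the is_processed flag.
$pdo->exec("
    CREATE TABLE IF NOT EXISTS my_csv_data (
        id INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(255),
        email VARCHAR(255),
        is_processed ENUM('0', '1') NOT NULL DEFAULT '0'
    )
");

// Cron-driven worker: take a small batch of unprocessed rows...
$rows = $pdo->query(
    "SELECT * FROM my_csv_data WHERE is_processed = '0' LIMIT 100"
)->fetchAll(PDO::FETCH_ASSOC);

$mark = $pdo->prepare("UPDATE my_csv_data SET is_processed = '1' WHERE id = ?");

foreach ($rows as $row) {
    // ...process $row here...
    $mark->execute([$row['id']]);   // ...and flag it once it succeeded
}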
I have CodeIgniter code where I upload the CSV file data and insert it into a MySQL database. Hope this helps you:
if ($_FILES["file"]["size"] > 0)
{
    $file = fopen($filename, "r");

    // Load the model once, outside the loop.
    $this->load->model('currency_model');

    while (($emapData = fgetcsv($file, 10000, ",")) !== FALSE)
    {
        $data = array(
            'reedumption_code' => $emapData[0],
            'jb_note_id'       => $jbmoney_id,
            'jbmoney'          => $jbamount,
            'add_date'         => time(),
            'modify_date'      => time(),
            'user_id'          => 0,
            'status'           => 1,
            'assign_date'      => 0,
            'del_status'       => 1,
            'store_status'     => 1
        );
        $insertId = $this->currency_model->insertCSV($data);
    }
    fclose($file);

    redirect('currency/add_currency?msg=Data Imported Successfully');
}
I'm developing an app where users upload an Excel [.xlsx] file for dumping data into a MySQL database. I have programmed it so that a LOG file is created for each import, so the user can see whether any errors occurred. My script was working perfectly before I implemented the log system.
After implementing the log system I can see duplicate rows inserted into the database, and the die() command is not working:
it just keeps looping continuously!
I have written sample code below. Please tell me what's wrong with my logging method.
Note: if I remove the logging [writing into the file], the script works correctly.
$file = fopen("20131105.txt", "a");
fwrite($file, "LOG CREATED".PHP_EOL);
foreach($hdr as $k => $v) {
$username = $v['un'];
$address = $v['adr'];
$message = $v['msg'];
if($username == '') {
fwrite($file, 'Error: Missing User Name'.PHP_EOL);
continue;
} else {
// insert into database
}
}
fwrite($file, PHP_EOL."LOG CLOSED");
fclose($file);
echo 1;
die();
First, your die statement is after your loop; it needs to be inside the loop if you want it to end the loop early.
Second, you're looping over $hdr, but it isn't defined in your snippet. It has to be an array; what does it contain?
var_dump($hdr);
The documentation for foreach in the PHP manual highlights:
"Reference of a $value and the last array element remain even after the foreach loop. It is recommended to destroy it by unset()." [1]
Try unsetting the value after the foreach using unset($value). This might be the reason for the duplicate values.
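For illustration, a minimal standalone demonstration of the pitfall the manual describes (this is not the question's code):

$rows = array('a', 'b', 'c');

foreach ($rows as &$value) {
    // ...modify $value...
}
// unset($value);   // without this, $value still points at $rows[2]

foreach ($rows as $value) {
    // every assignment here also overwrites $rows[2]
}

print_r($rows);      // array('a', 'b', 'b') -- the last element got duplicated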
I want to read very big CSV files and insert them into a database. That already works:
if (($handleF = fopen($path."\\".$file, 'r')) !== false) {
    $i = 1;
    // loop through the file line-by-line
    while (($dataRow = fgetcsv($handleF, 0, ";")) !== false) {
        // Only start at the startRow, otherwise skip the row.
        if ($i >= $startRow) {
            // Check if to use headers
            if ($lookAtHeaders == 1 && $i == $startRow) {
                $this->createUberschriften(array_map(array($this, "convert"), $dataRow));
            } else {
                $dataRow = array_map(array($this, "convert"), $dataRow);
                $data = $this->changeMapping($dataRow, $startCol);
                $this->executeInsert($data, $tableFields);
            }
            unset($dataRow);
        }
        $i++;
    }
    fclose($handleF);
}
My problem with this solution is that it's very slow, but the files are too big to load into memory all at once... So I want to ask: is there a possibility to read, for example, 10 lines at a time into the $dataRow array, instead of only one or all of them?
I want to get a better balance between memory and performance.
Do you understand what I mean? Thanks for the help.
Greetz
V
EDIT:
OK, I still had to find a solution that works with the MSSQL database. My solution was to stack the data and then do a multi-row MSSQL insert:
while (($dataRow = fgetcsv($handleF, 0, ";")) !== false) {
    // Only start at the startRow, otherwise skip the row.
    if ($i >= $startRow) {
        // Check if to use headers
        if ($lookAtHeaders == 1 && $i == $startRow) {
            $this->createUberschriften(array_map(array($this, "convert"), $dataRow));
        } else {
            $dataRow = array_map(array($this, "convert"), $dataRow);
            $data = $this->changeMapping($dataRow, $startCol);
            $this->setCurrentRow($i);

            if (count($dataStack) > 210) {
                array_push($dataStack, $data);
                #echo '<pre>', print_r($dataStack), '</pre>';
                $this->executeInsert($dataStack, $tableFields, true);
                // reset the stack
                unset($dataStack);
                $dataStack = array();
            } else {
                array_push($dataStack, $data);
            }
            unset($data);
        }
        $i++;
        unset($dataRow);
    }
}
Finally I loop over the stack inside the "executeInsert" method and build a multi-row insert, to create a query like this:
INSERT INTO [myTable] (field1, field2) VALUES ('data1', 'data2'), ('data2', 'data3'), ...
That works much better. I still have to find the best batch size, but for that I only need to change the value '210' in the code above. I hope that helps everybody with a similar problem.
Attention: don't forget to call "executeInsert" once more after reading the complete file, because there may still be data left in the stack, and inside the loop the method is only executed when the stack reaches a size of 210.
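In case it helps, here is a hypothetical helper illustrating that multi-row insert; the function name, the [myTable] bracket style and the sqlsrv calls are assumptions on my part, not the original executeInsert():

function buildMultiInsert($table, array $fields, array $rows) {
    $groups = array();
    $params = array();

    foreach ($rows as $row) {
        // one "(?, ?, ...)" group per stacked row
        $groups[] = '(' . implode(', ', array_fill(0, count($fields), '?')) . ')';
        foreach ($row as $value) {
            $params[] = $value;
        }
    }

    $sql = sprintf(
        'INSERT INTO [%s] (%s) VALUES %s',
        $table,
        implode(', ', $fields),
        implode(', ', $groups)
    );

    return array($sql, $params);
}

// e.g. with the sqlsrv extension:
// list($sql, $params) = buildMultiInsert('myTable', array('field1', 'field2'), $dataStack);
// sqlsrv_query($conn, $sql, $params);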
Greetz
V
I think your bottleneck is not reading the file, which is just a text file; your bottleneck is the INSERT into the SQL table.
Try something: just comment out the line that actually does the insert and you will see the difference.
I had this same issue in the past, where I did exactly what you are doing: reading a CSV with 5+ million lines and inserting it into a MySQL table. The execution time was 60 hours, which is unrealistic.
My solution was to switch to another database technology. I selected MongoDB and the execution time was reduced to 5 minutes. MongoDB performs really fast in these scenarios, and it also has a tool called mongoimport that lets you import a CSV file directly from the command line.
Give it a try if the database technology is not a limitation on your side.
Another solution would be splitting the huge CSV file into chunks and then running the same PHP script multiple times in parallel, each instance taking care of the chunks with a specific prefix or suffix in the filename.
I don't know which OS you are using, but on Unix/Linux there is a command-line tool called split that will do that for you and will also add any prefix or suffix you want to the filenames of the chunks.
My website allows users to upload a csv file with a list of books. The script then reads this file and checks the isbn number against Amazon, using the PEAR Services_Amazon class, returning enhanced book data. However, whenever I run the script on a list of books the amount of memory consumed steadily increases until I get a fatal error. At the moment, with 32 MB allocated, I can only read 370 records of the CSV file before it crashes.
I have a user with a 4500 record file to import and a virtual server with 256 MB of RAM, so increasing the memory limit is not a solution.
Here is a simplified version of the CSV import:
$handle = fopen($filename, "r");
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    $isbn = $data[6];
    checkIsbn($isbn);
}
Here is a trimmed version of the function:
function checkIsbn($isbn) {
    $amazon = &new Services_Amazon(ACCESS_KEY_ID, SECRET_KEY, ASSOC_ID);
    // -- $options array filled with $isbn, other requested info --
    $products = $amazon->ItemSearch('Books', $options);
    // -- Then I create an array from the first result --
    $product = $products['Item'][0];
    $title = $product['ItemAttributes']['Title'];
    // -- etc... various attributes are pulled from the $product array --
    mysql_query($sql); // -- put attributes into our DB

    unset($product);
    unset($products);

    usleep(1800000); // maximum of 2000 calls to Amazon per hour as per their API
    return $book_id;
}
What I've tried: unsetting the arrays as well as setting them to NULL, both in the function and in the CSV import code. I have increased all my timeouts to ensure that's not an issue. I installed xdebug and ran some tests, but all I found was that the script just kept increasing in memory each time the Amazon class is accessed (I'm no xdebug expert). I'm thinking that maybe the variables in the Services_Amazon class are not being cleared each time it's run, but have no idea where to go from here. I'd hoped unsetting the two arrays would do it, but no luck.
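One simple way to narrow this down (a diagnostic sketch, not code from the post): log memory_get_usage() every N rows, once with the Amazon call enabled and once with it commented out, and see which part the growth tracks:

$handle = fopen($filename, "r");
$row = 0;

while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    checkIsbn($data[6]);

    if (++$row % 50 === 0) {
        // true = real size of the memory allocated from the system
        error_log(sprintf("row %d: %.2f MB", $row, memory_get_usage(true) / 1048576));
    }
}
fclose($handle);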
Edit: I've decided that this may be a problem in the PEAR class (and looking at some of the questions here relating to PEAR, that does seem possible). Anyway, my OOP skills are limited at the moment, so I found a way to do this by reloading the page multiple times; see my answer below for details.
First of all, this is not a memory leak but bad programming...
Second, unset() won't free the used memory; it just removes the reference to the variable from the current scope.
Also, try not to copy the data here; make $product and $title references into $products instead of copies:
$product = &$products['Item'][0];
$title = &$product['ItemAttributes']['Title'];
Then, instead of only unset(), do:
$products = NULL;
unset($products);
This will free the memory, not immediately, but when the PHP garbage collector runs the next time...
Also, why do you create a new instance of Services_Amazon each time the function is called? What about a class member that holds the instance, created when constructing your object:
class myService
{
    protected $_service;

    public function __construct()
    {
        $this->_service = new Services_Amazon(ACCESS_KEY_ID, SECRET_KEY, ASSOC_ID);
    }

    public function checkIsbn($isbn)
    {
        //...
        $this->_service->ItemSearch('Books', $options);
        //...
    }
}

$myService = new myService;

$handle = fopen($filename, "r");
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    $bookId = $myService->checkIsbn($data[6]);
}
Furthermore, you assume that all your users use the same CSV format, which is very unlikely... so it is better to use a real CSV parser which can handle all the possible CSV notations...
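For example, league/csv (the parser used in another question on this page) handles quoting, embedded newlines and encodings for you; a minimal sketch, assuming league/csv 9.x, a header row, and an 'isbn' column name:

use League\Csv\Reader;

$csv = Reader::createFromPath($filename, 'r');
$csv->setHeaderOffset(0);                    // first row holds the column names

foreach ($csv->getRecords() as $record) {
    // $record is an associative array keyed by the header row
    $bookId = $myService->checkIsbn($record['isbn']);
}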
How about creating only a single instance of the $amazon object and passing it in to your checkIsbn function? Then you wouldn't need to create 4500 instances.
$amazon = &new Services_Amazon(ACCESS_KEY_ID, SECRET_KEY, ASSOC_ID);

$handle = fopen($filename, "r");
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    $isbn = $data[6];
    checkIsbn($amazon, $isbn);
}
unset($amazon);
I think you should also look into how you are connecting to the database - are you creating fresh connections each time checkIsbn is called? That could also be part of the problem.
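A sketch of how those shared resources could be wired up; the extra checkIsbn() parameters are an assumed signature change, and the mysql_connect() details are placeholders:

$amazon = new Services_Amazon(ACCESS_KEY_ID, SECRET_KEY, ASSOC_ID);

// One database connection, opened once and reused for every insert.
$db = mysql_connect('localhost', 'user', 'pass');
mysql_select_db('books', $db);

$handle = fopen($filename, "r");
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    checkIsbn($amazon, $db, $data[6]);   // pass both shared handles in
}
fclose($handle);
mysql_close($db);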