Load big file into database - php

I have a big file of about 11 MB. It is a CSV file and I need to load its contents into a Postgres database.
I use a PHP script to do this job, but it always stops at some point.
I increased the PHP memory limit and other settings, and that let me load more data, but still not all of it.
How can I solve this? Is there some cache I need to clear? Is there a trick to handling big files in PHP?
Thanks in advance.
UPDATE: Added some code
$handler = fopen($fileName, "r");
$dbHandler = pg_connect($databaseConfig);

while (($line = fgetcsv($handler, 0, ";")) !== false) {
    // Algorithms to transform the data
    // SQL statements are accumulated in a variable
    // I am using a "batch" idea: every 5000 lines read, the accumulated SQL is executed
    $results = pg_query($dbHandler, $sql);
}

fclose($handler);

If you have direct access to the server (and you are not locked into deploying through some version-control workflow), Postgres has a far better option that is far less demanding in terms of resources. Keep in mind that PHP is a comparatively slow and resource-hungry language:
COPY my_table_name FROM '/home/myfile.csv' DELIMITERS ',' CSV
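If the CSV does not live on the database server itself, you can get much the same effect from PHP by streaming the file through COPY ... FROM STDIN with pg_put_line() and pg_end_copy(), so only one line is ever held in memory. The sketch below reuses the variable names from the question; the table name, the CSV options and the trailing end-of-data marker are assumptions, so check them against your Postgres version.

<?php
// Rough sketch: stream the CSV into Postgres with COPY ... FROM STDIN.
// "my_table_name" and the ';' delimiter are assumptions taken from the question.
$dbHandler = pg_connect($databaseConfig);
pg_query($dbHandler, "COPY my_table_name FROM STDIN WITH (FORMAT csv, DELIMITER ';')");

$handler = fopen($fileName, "r");
while (($line = fgets($handler)) !== false) {
    pg_put_line($dbHandler, $line);   // forward each raw CSV line to the backend
}
fclose($handler);

pg_put_line($dbHandler, "\\.\n");     // end-of-data marker, as in the pg_put_line docs
pg_end_copy($dbHandler);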

Related

How can I prevent a race condition when using a file based counter in php?

I have a small web page that delivers different content to a user based on a counter modulo 3 (%3). The counter is read in from a file with PHP, incremented, and written back into the file over the old value.
I am trying to get an even distribution of the content in question, which is why I have this in place.
I am worried that if two users access the page at around the same time, they could either both be served the same content, or one might fail to increment the counter because the other is currently writing to the file.
I am fairly new to web dev, so I am not sure how to approach this without mutexes. I considered having only one open/close and doing all of the operations inside it, but I am trying to minimize the time during which a user could fail to access the file (hence why the read and write are in separate opens).
What would be the best way to implement a sort of mutual exclusion so that only one request accesses the file at a time, and overlapping requests queue up for access? The primary goal is to preserve the ratio of the content being shown to users, which means keeping the counter consistent.
The code is as follows :
<?php
session_start();

$counterName = "<path/to/file>";

$file = fopen($counterName, "r");
$currCount = fread($file, filesize($counterName));
fclose($file);

$newCount = $currCount + 1;

$file = fopen($counterName, 'w');
if (fwrite($file, $newCount) === FALSE) {
    echo "Could not write to the file";
}
fclose($file);
?>
Just in case anyone finds themselves with the same issue: I was able to fix the problem by adding
flock($fp, LOCK_EX | LOCK_NB) before writing to the file, as per the documentation for PHP's flock() function. I have read that it is not the most reliable, but for what I am doing it was enough.
Documentation here for convenience.
https://www.php.net/manual/en/function.flock.php
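For anyone who wants the single-open variant considered in the question, here is a minimal sketch (not the original poster's code) that holds a blocking exclusive lock across the read, increment, and write, so overlapping requests queue up instead of clobbering each other. The counter path is a placeholder.

<?php
// Minimal sketch: one open, one blocking exclusive lock around the read-modify-write.
$counterName = "/path/to/counter";   // placeholder path

$newCount = 0;
$file = fopen($counterName, "c+");                 // create if missing, don't truncate
if ($file !== false && flock($file, LOCK_EX)) {    // blocks until the lock is ours
    $currCount = (int) stream_get_contents($file);
    $newCount  = $currCount + 1;

    rewind($file);
    ftruncate($file, 0);                           // replace the old value
    fwrite($file, (string) $newCount);

    fflush($file);                                 // flush before releasing the lock
    flock($file, LOCK_UN);
    fclose($file);
}

$variant = $newCount % 3;                          // pick one of the three content variants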

Codeigniter Allowed memory size exhausted while processing large files

I'm posting this in case someone else is looking for the same solution, seeing as I just wasted two days on this bullshit.
I have a cron job that updates the database using a very large file once a day, using the following code:
if (($handle = fopen(dirname(__FILE__) . '/uncompressed', "r")) !== FALSE)
{
    while (($data = fgets($handle)) !== FALSE)
    {
        $thisline = json_decode($data, true);
        $this->regen($thisline);
    }
    fclose($handle);
}
This is in a Codeigniter controller that's only used for cron jobs. The $this->regen function runs through a bunch of different checks and stores the right information from the line in the database. The file itself is over 300MB of JSONs separated by newlines.
The problem: it would only process about 20,000 lines before the whole thing ran out of memory.
I spent hours troubleshooting this and got nothing obvious. I'm using fgets, I have $query->free_result() in the right places. It didn't help. So then I started checking a loop of about 100 lines, and watched the output of memory_get_usage(). I finally narrowed it down to the Codeigniter Active Record class - every call to the class caused the memory usage to increase by a tiny amount.
Then I found a thread on the EllisLab forums and got the answer: CI Active Record saves queries so that, if you want to, you can build a query across multiple functions. (I am not even going to go into how dumb it is to have that switched on by default.)
Go to /config/database.php and add
$db['default']['save_queries'] = FALSE;
to the end of the file. Then make sure you build and execute queries using Active Record in a single function. If you need to switch it off just for one case, use
$this->db->save_queries = FALSE;
in the constructor or wherever you need to put it.
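Put together, a cron-only controller might look roughly like the sketch below. The class and method names are invented for illustration; only the save_queries flag and the line-by-line fgets() loop come from the answer above.

<?php
// Illustrative sketch only: Cron and regen_all are made-up names.
class Cron extends CI_Controller
{
    public function __construct()
    {
        parent::__construct();
        // Stop Active Record from keeping a copy of every executed query in memory.
        $this->db->save_queries = FALSE;
    }

    public function regen_all()
    {
        if (($handle = fopen(dirname(__FILE__) . '/uncompressed', 'r')) !== FALSE) {
            while (($data = fgets($handle)) !== FALSE) {
                // regen() is the method from the question that stores one line in the DB.
                $this->regen(json_decode($data, true));
            }
            fclose($handle);
        }
    }
}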

PHP server file download cutoff unexpectedly

I have a web interface that I built into the admin section of a WordPress site. It scrapes a few tables in my database and just displays a big list of data row by row. There are about 30,000 rows of this data, displayed with a basic echo in a for loop. Displaying all 30,000 rows on a page works fine.
Additionally, I include an option to download a CSV file of the complete rows of data. I use fopen and then fputcsv to build the CSV file for download from the result of the data query. This feature used to work, but now that the dataset is at 30,000 rows, the CSV no longer generates correctly: only the first 200-1,000 rows are written to the file, leaving out the majority of the data. I estimate the complete CSV would be about 10 MB. The file then downloads with just those first 200-1,000 rows as though everything had worked correctly.
Here is the code:
// This gets a huge list of data from a SP I built. This data is well formed
$data = $this->run_stats_stored_procedure($job_to_report);

// This is where the data is converted into a csv file. This part is broken
// the file may already exist at that location burn it down if it does
if (file_exists(ABSPATH . "some/path/to/my/file/csv_export.csv")) {
    unlink(ABSPATH . "some/path/to/my/file/csv_export.csv");
}

$csv_file_handler = fopen(ABSPATH . "some/path/to/my/file/candidate_export.csv", 'w');
if (!empty($csv_file_handler)) {
    $title_array = array(
        "ID",
        "other_feild"
    );
    fputcsv($csv_file_handler, $title_array, ",");

    if (!empty($data)) {
        foreach ($data as $data_piece) {
            $array_as_csv_line = array();
            foreach ($data_piece as $object_property) {
                $array_as_csv_line[] = (string) $object_property;
            }
            fputcsv($csv_file_handler, $array_as_csv_line, ",");
            unset($array_as_csv_line);
        }
    } else {
        fputcsv($csv_file_handler, array("empty"), ",");
    }

    // pros clean everything up when they are done
    fclose($csv_file_handler);
}
I'm not sure what I need to change to get the entire CSV file to download. I believe this could be a configuration issue, but I'm not sure. I am led to believe this because the function used to work with even 20,000 CSV rows; it only started breaking at 30,000. Please let me know if additional info would help. Has anyone bumped into issues with huge CSV files before? Thank you to anyone who can help.
Is the "download" taking more than say a minute, two minutes, or three minutes? If so, the webserver could be closing the connection. For example, if you're using the Apache FCGI module, it has this directive:
FcgidBusyTimeout
which defaults to 300 seconds.
This is the maximum time limit for request handling. If a FastCGI request does not complete within FcgidBusyTimeout seconds, it will be subject to termination.
Hope this helps you solve your problem.
The answer that I am currently implementing is to allow the script to use more time. To do this, I am simply running the following code before the script runs:
set_time_limit(3600);
I am doing further research because this is not a sustainable solution. Any further advice would be greatly appreciated.
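For what it's worth, this is roughly where that call sits relative to the export loop. The 3600-second value comes from the answer above; resetting the limit inside the loop is an extra assumption (set_time_limit() restarts PHP's timer each time it is called), and none of this helps if the cutoff is enforced by the web server rather than by PHP.

<?php
// Sketch only: reuses names from the question; the periodic reset is an assumption.
set_time_limit(3600);

$csv_file_handler = fopen(ABSPATH . "some/path/to/my/file/candidate_export.csv", 'w');
foreach ($data as $i => $data_piece) {
    fputcsv($csv_file_handler, array_map('strval', (array) $data_piece), ",");
    if ($i % 1000 === 0) {
        set_time_limit(3600);   // restart PHP's execution clock every 1000 rows
    }
}
fclose($csv_file_handler);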

PHP write to included file

I need to include a PHP file and execute a function from it.
After execution, at the end of the script, I want to append something to that included file.
But I'm unable to open the file. Is it possible to "close" an included file (or anything similar) so that I can append to the PHP file?
include 'something.php';
echo $somethingFromIncludedFile;
//Few hundred lines later
$fh = fopen('something.php', 'a') or die('Unable to open file');
$log = "\n".'$usr[\''.$key.'\'] = \''.$val.'\';';
fwrite($fh, $log);
fclose($fh);
How to achieve that?
In general, you should never modify your PHP code using PHP itself. It's bad practice, first of all from a security standpoint. I am sure you can achieve what you need in another way.
As Alex says, self-modifying code is very, VERY dangerous. And NOT separating data from code is just dumb. On top of both these warnings is the fact that PHP arrays are relatively slow and do not scale well (so you could file_put_contents('data.ser', serialize($usr)) / $usr = unserialize(file_get_contents('data.ser')), but it's only going to work for small numbers of users).
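Spelled out, that serialize()/unserialize() alternative looks something like the sketch below. The 'data.ser' file name comes from the paragraph above; $key and $val are the variables from the question's code.

<?php
// Sketch of the serialize()/unserialize() alternative mentioned above.

// Load the existing user map, or start empty on the first run.
$usr = is_file('data.ser')
    ? unserialize(file_get_contents('data.ser'))
    : array();

// ... use $usr, then record the new entry ...
$usr[$key] = $val;

// Persist the whole array as data instead of appending PHP source code.
file_put_contents('data.ser', serialize($usr), LOCK_EX);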
Then you've got the problem of using conventional files to store data in a multi-user context. This is possible, but you need to build sophisticated locking and queue management. That usually entails using a daemon to manage the queue / mutex and is invariably more effort than it's worth.
Use a database to store data.
As you already know, this approach is not one of the good ones. If you REALLY want to include your file and then append something to it, you can do it the following way.
Be aware that using eval() is risky if you cannot be 100% sure that the content of the file does not contain harmful code.
// This part is a replacement for you include
$fileContent = file_get_contents("something.php");
eval($fileContent);
// your echo goes here
// billion lines of code ;)
// file append mechanics
$fp = fopen("something.php", "a") or die ("Unexpected file open error!");
fputs($fp, "\n".'$usr[\''.$key.'\'] = \''.$val.'\';');
fclose($fp);

PHP queue file implementation

For a project I was working on, I need a queue that will be too large to hold in normal memory. I had been implementing it as a simple file: read the whole file, take the first few (~100) lines, process them, then write back the updated queue with new instructions added and the old ones removed. However, since the queue became too large to hold in memory like this, I need something different. Ideally, someone can tell me a way to peel off just the first few lines of a file without having to look at the rest of the data. I had thought about using a database (MySQL, probably, with sorted insert timestamps), but I would heavily prefer to do without one for load and bandwidth reasons (several servers would all have to be sending and receiving a lot of data from the DB). The language I'm working in is PHP, but really this question is more about Unix files, I suppose. Any help would be appreciated.
Sucking out the first line of a file is pretty trivial (fopen() followed by an fgets()). Re-writing the file to remove completed jobs would be very painful, especially if you've got multiple concurrent servers working off the same queue file.
One alternative would be to use a separate file for each job. If you have some concurrency-safe method of generating an incrementing ID for these files, then it'd be a simple matter of picking out the file with the lowest ID for the oldest job and generating a new ID for each new job. You'd still have to figure out some file locking, though, to keep two or more servers from grabbing the same file at the same time. A rough sketch of this idea follows.
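In the sketch below, the directory layout, the timestamp-plus-random naming scheme, and the rename()-as-claim trick are my own assumptions rather than anything from the question.

<?php
// Rough sketch: one file per job, claimed atomically via rename().

// Enqueue: write each job to its own file, named by a high-resolution timestamp
// plus a random suffix so two servers are unlikely to collide.
function enqueue(string $queueDir, string $payload): void
{
    $name = sprintf('%s/%.6f-%s.job', $queueDir, microtime(true), bin2hex(random_bytes(4)));
    file_put_contents($name, $payload, LOCK_EX);
}

// Dequeue: glob() returns names sorted alphabetically, so the first .job file is the
// oldest; claim it by renaming it, since a failed rename() means another worker won.
function dequeue(string $queueDir): ?string
{
    foreach (glob($queueDir . '/*.job') as $candidate) {
        $claimed = $candidate . '.' . getmypid() . '.claimed';
        if (@rename($candidate, $claimed)) {
            $payload = file_get_contents($claimed);
            unlink($claimed);
            return $payload;
        }
    }
    return null;   // queue is empty, or every pending job was claimed by someone else
}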
I had the same problems while working on the enqueue/fs transport. I failed to find a way to modify a small portion at the beginning of the file without copying it into memory and saving it back. It is, however, possible to do that with the end of the file: you can read a portion and then truncate it. That's not really a queue but a stack, so if you rely on message ordering this would not be a solution. In my case, I lock the file while it is being read; once the frame has been read, the lock is released.
This is how you could write messages to a queue file:
<?php
$rawMessage = 'this is your message to put into the queue as a string';

$queueFile = fopen('/path/to/queue/file', 'a+');

// Pad the message with spaces so its length is a multiple of the frame size (64);
// that makes it easier to read messages back out of the file frame by frame.
$rawMessage = str_repeat(' ', 64 - (strlen($rawMessage) % 64)).$rawMessage;

flock($queueFile, LOCK_EX);      // lock the file
fwrite($queueFile, $rawMessage);
flock($queueFile, LOCK_UN);      // release the lock

fclose($queueFile);
This is how you could read messages from a queue file:
<?php
$queueFile = fopen('/path/to/queue/file', 'c+');

flock($queueFile, LOCK_EX);      // lock the file
$frame = readFrame($queueFile, 1);

// Drop the frame we just read off the end of the file.
ftruncate($queueFile, fstat($queueFile)['size'] - strlen($frame));
rewind($queueFile);

$rawMessage = substr(trim($frame), 1);
flock($queueFile, LOCK_UN);      // release the lock
function readFrame($file, $frameNumber)
{
    $frameSize = 64;
    $offset = $frameNumber * $frameSize;

    fseek($file, -$offset, SEEK_END);
    $frame = fread($file, $frameSize);

    if ('' == $frame) {
        return '';
    }

    if (false !== strpos($frame, '|{')) {
        return $frame;
    }

    return readFrame($file, $frameNumber + 1).$frame;
}
For the locking, I'd suggest using the Symfony LockHandler, or simply use enqueue/fs as-is.
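For reference, the (since deprecated) Symfony LockHandler mentioned above was typically used roughly like this; treat the exact class and method names as an assumption and check them against the Symfony version you have.

<?php
// Assumed API of Symfony's LockHandler (symfony/filesystem, deprecated in 3.4).
use Symfony\Component\Filesystem\LockHandler;

$lock = new LockHandler('queue-file.lock');
if ($lock->lock()) {             // non-blocking by default; pass true to block instead
    // ... read and truncate the queue file as shown above ...
    $lock->release();
}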
