I have an issue with my code.
This script writes variables to a CSV file.
I'm getting the parameters through HTTP GET; the problem is that each record comes in one by one, very slowly.
It should be able to take a batch of thousands of records.
I also noticed the file is incomplete: it's missing about half the records when compared to the full report downloaded from my vendor.
Here is the script:
<?php
error_reporting(E_ALL ^ E_NOTICE);
// setting the default timezone to use.
date_default_timezone_set('America/New_York');
// setting the CSV file
$fileDate = date("m_d_Y");
$filename = "./csv_archive/" . $fileDate . "_SmsReport.csv";
//Creating handle
$handle = fopen($filename, "a");
//$handle = fopen($directory.$filename, 'a')
// These are the main data fields
$item1 = $_REQUEST['item1'];
$item2 = $_REQUEST['item2'];
$item3 = $_REQUEST['item3'];
$mydate = date("Y-m-d H:i:s");
$csvRow = $item2 . "," . $item1 . "," . $item3 . "," . $mydate . "\n";
//writing to csv file
// just making sure the function could write to it
if (!$handle = fopen($filename, 'a')) {
    echo "Cannot open file ($filename)";
    exit;
}
//writing the data
if (fwrite($handle, $csvRow) === FALSE) {
    echo "Cannot write to file ($filename)";
    exit;
}
fclose($handle);
?>
I rewrote it twice, but the issue still persists. This goes beyond the scope of my knowledge, so I'm hoping someone can tell me a better approach.
My boss is blaming PHP; help me prove him wrong!
I think there's a better way of doing this. Try putting all your data into an array first, with each row in the CSV file being an array in itself, and then outputting it. Here's an example of some code I wrote a while back:
class CSV_Output {
    public $data = array();
    public $deliminator;

    function __construct($data, $deliminator = ",") {
        if (!is_array($data)) {
            throw new Exception('CSV_Output only accepts data as arrays');
        }
        $this->data = $data;
        $this->deliminator = $deliminator;
    }

    public function output() {
        foreach ($this->data as $row) {
            $quoted_data = array_map(array($this, 'add_quotes'), $row);
            echo sprintf("%s\n", implode($this->deliminator, $quoted_data));
        }
    }

    public function headers($name) {
        header('Content-Type: application/csv');
        header("Content-disposition: attachment; filename={$name}.csv");
    }

    private function add_quotes($data) {
        // Escape embedded quotes by doubling them, then wrap the field in quotes.
        $data = str_replace('"', '""', $data);
        return sprintf('"%s"', $data);
    }
}
// CONSTRUCT OUTPUT ARRAY
$CSV_Data = array(array(
    "Item 1",
    "Item 2",
    "Item 3",
    "Date"
));

// Needs to loop through all your data..
for ($i = 1; $i < (ARGUMENT_TO_STOP_LOOP); $i++) {
    $CSV_Data[] = array($_REQUEST['item1'], $_REQUEST['item2'], $_REQUEST['item3'], $_REQUEST['itemdate']);
}

$b = new CSV_Output($CSV_Data);
// Send the headers before any output is echoed.
$b->headers("NAME_YOUR_FILE_HERE");
$b->output();
As requests come in to your server from BulkSMS, each request is trying to open and write to the same file.
These requests are not queued, and do not wait for the previous one to finish before starting another, meaning many will fail as the server finds the file is already in use by the previous request.
For this application, you'd be much better off storing the data from each request in a database such as SQLite and writing a separate script to generate the CSV file on demand.
I'm not particularly familiar with SQLite, but I understand it's fairly easy to implement and seems to be well documented.
Because multiple requests arrive at the same time, concurrent requests will try to access the same output file and block each other.
As I pointed out in my comment, you should be using a decent database. PostgreSQL or MySQL are open-source databases and have good support for PHP.
In my experience, PostgreSQL is a more solid database and performs better with many simultaneous users (especially when writing to the database), although it is harder to learn (it's more 'strict').
MySQL is easier to learn and may be sufficient, depending on the total number of request/traffic.
PostgreSQL:
http://www.postgresql.org
MySQL:
http://www.mysql.com
Do not use SQLite as a database for this, because SQLite is a file-based database designed as a single-user database, not for client/server purposes. Trying to use it for multiple requests at the same time will give you the same kind of problems you're currently having.
http://www.sqlite.org/whentouse.html
How Scalable is SQLite?
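As a rough illustration of the database approach, here is a minimal sketch using PDO with MySQL. The database name, table name (sms_report) and column names are made up for the example, so adjust them to your setup:
<?php
// Minimal sketch only: assumes a MySQL database "sms" with a table created as
//   CREATE TABLE sms_report (item1 VARCHAR(255), item2 VARCHAR(255),
//                            item3 VARCHAR(255), received_at DATETIME);
$pdo = new PDO('mysql:host=localhost;dbname=sms;charset=utf8', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// A prepared statement is safe against injection and handles concurrent requests,
// since the database serializes the writes for you.
$stmt = $pdo->prepare('INSERT INTO sms_report (item1, item2, item3, received_at) VALUES (?, ?, ?, NOW())');
$stmt->execute(array(
    isset($_GET['item1']) ? $_GET['item1'] : '',
    isset($_GET['item2']) ? $_GET['item2'] : '',
    isset($_GET['item3']) ? $_GET['item3'] : '',
));
?>
A separate script can then SELECT the rows for a given day and write them out with fputcsv() to rebuild the daily CSV report on demand.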
In a scheduled task of my Laravel application I'm reading several large gzipped CSV files, ranging from 80 MB to 4 GB, on an external FTP server. They contain products which I store in my database based on a product attribute.
I loop through a list of product feeds that I want to import, but each time a fatal error is returned: 'Allowed memory size of 536870912 bytes exhausted'. I can bump up the length parameter of the fgetcsv function from 1000 to 100000, which solves the problem for the smaller files (< 500 MB), but for the larger files it still returns the fatal error.
Is there a solution that allows me to either download or unzip the .csv.gz files, read the lines (in batches or one by one) and insert the products into my database without running out of memory?
$feeds = [
    "feed_baby-mother-child.csv.gz",
    "feed_computer-games.csv.gz",
    "feed_general-books.csv.gz",
    "feed_toys.csv.gz",
];
foreach ($feeds as $feed) {
    $importedProducts = array();
    $importedFeedProducts = 0;

    $csvfile = 'compress.zlib://ftp://' . config('app.ftp_username') . ':' . config('app.ftp_password') . '@' . config('app.ftp_host') . '/' . $feed;

    if (($handle = fopen($csvfile, "r")) !== FALSE) {
        $row = 1;
        $header = fgetcsv($handle, 1, "|");

        while (($data = fgetcsv($handle, 1000, "|")) !== FALSE) {
            if ($row == 1 || array(null) !== $data) { $row++; continue; }

            $product = array_combine($header, $data);
            $importedProducts[] = $product;
        }

        fclose($handle);
    } else {
        echo 'Failed to open: ' . $feed . PHP_EOL;
        continue;
    }

    // start inserting products into the database below here
}
The problem is probably not the gzip file itself.
Of course you can download it and process it afterwards, but that will keep the same issue:
you are loading all products into a single array (in memory):
$importedProducts[] = $product;
You could comment this line out and see if it prevents you from hitting your memory limit.
Usually I would create a method like addProduct($product) to handle this in a memory-safe way.
You can then decide on a maximum number of products to buffer before doing a bulk insert, to achieve optimal speed; I usually use something between 1000 and 5000 rows.
For example
class ProductBatchInserter
{
    private $maxRecords = 1000;
    private $records = [];

    public function addProduct($record) {
        $this->records[] = $record;
        // Flush as soon as the buffer is full, so memory stays bounded.
        if (count($this->records) >= $this->maxRecords) {
            $this->flush();
        }
    }

    // Call once after the loop to insert any remaining buffered rows.
    public function flush() {
        if (!empty($this->records)) {
            EloquentModel::insert($this->records);
            $this->records = [];
        }
    }
}
However, I usually don't implement it as a single class; in my projects I integrate it as a BulkInsertable trait that can be used on any Eloquent model.
But this should give you a direction for how to avoid memory limits.
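For instance, the read loop from the question could hand each row to the inserter instead of accumulating everything. This is only a sketch, reusing $csvfile from the question; the flush() call assumes the small addition to the class above:
$inserter = new ProductBatchInserter();

if (($handle = fopen($csvfile, "r")) !== FALSE) {
    $header = fgetcsv($handle, 0, "|"); // 0 = no line-length limit
    while (($data = fgetcsv($handle, 0, "|")) !== FALSE) {
        // Only up to $maxRecords rows are ever buffered in memory at once.
        $inserter->addProduct(array_combine($header, $data));
    }
    $inserter->flush(); // insert whatever is left in the buffer
    fclose($handle);
}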
Or, the easier but significantly slower option: just insert each row at the point where you currently append it to the array.
But that will put a ridiculous load on your database and will be really very slow.
If the GZIP stream is the bottleneck:
I don't expect this to be the issue, but if it were, you could use gzopen()
https://www.php.net/manual/en/function.gzopen.php
and pass the gzopen handle to fgetcsv.
But I expect the stream handler you are using is already doing this for you.
If not, I mean like this:
$input = gzopen('input.csv.gz', 'r');
while (($row = fgetcsv($input)) !== false) {
// do something memory safe, like suggested above
}
If you need to download it anyway, there are many ways to do it, but make sure you use something memory-safe, like fopen()/fgets() or a Guzzle stream, and don't use something like file_get_contents(), which loads the whole file into memory.
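As an illustration of a memory-safe download, here is a minimal sketch that copies the remote file to disk in small chunks; the URL and local path are placeholders:
$remote = fopen('ftp://user:password@ftp.example.com/feed_toys.csv.gz', 'r');
$local  = fopen('/tmp/feed_toys.csv.gz', 'w');

// Copy 8 KB at a time, so only one chunk is ever held in memory.
while (!feof($remote)) {
    fwrite($local, fread($remote, 8192));
}

fclose($remote);
fclose($local);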
I have a CSV file containing millions of email addresses which I want to upload quickly into a MySQL database with PHP.
Right now I'm using a single-threaded program which takes too much time.
//get the csv file
$file = $_FILES['csv']['tmp_name'];
$handle = fopen($file,"r");
//loop through the csv file and insert into database
do {
if ($data[0]) {
$expression = "/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/";
if (preg_match($expression, $data[0])) {
$query=mysql_query("SELECT * FROM `postfix`.`recipient_access` where recipient='".$data[0]."'");
mysql_query("SET NAMES utf8");
$fetch=mysql_fetch_array($query);
if($fetch['recipient']!=$data[0]){
$query=mysql_query("INSERT INTO `postfix`.`recipient_access`(`recipient`, `note`) VALUES('".addslashes($data[0])."','".$_POST['note']."')");
}
}
}
} while ($data = fgetcsv($handle,1000,",","'"));
First of all, I can't stress this enough: fix your indentation - it will make life easier for everyone.
Secondly, the answer depends a lot on the actual bottlenecks you are encountering:
Regular expressions are very slow, especially when they're in a loop.
Databases tend to work well either for WRITES or for READS, but not both: try to decrease the number of queries beforehand.
It stands to reason that the less PHP code in your loop, the faster it will work. Consider reducing the number of conditions, for instance.
For the record, your code is not safe against MySQL injection: filter $_POST beforehand [*].
[*] Speaking of which, it's faster to access a plain variable than an array index like $_POST.
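One way to cut the query count in half, for example, is to let the database enforce uniqueness instead of doing a SELECT before every INSERT. This is only a rough sketch, assuming you are able to add a unique index on recipient; it reuses $expression and $handle from the question:
// One-time schema change (run once):
//   ALTER TABLE `postfix`.`recipient_access` ADD UNIQUE (`recipient`);

$note = mysql_real_escape_string($_POST['note']);

while (($data = fgetcsv($handle, 1000, ",", "'")) !== FALSE) {
    if (!isset($data[0]) || !preg_match($expression, $data[0])) {
        continue;
    }
    $recipient = mysql_real_escape_string($data[0]);
    // INSERT IGNORE silently skips recipients that already exist, replacing
    // the SELECT + conditional INSERT pair with a single query per row.
    mysql_query("INSERT IGNORE INTO `postfix`.`recipient_access` (`recipient`, `note`) VALUES ('$recipient', '$note')");
}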
You can simulate multithreading by having your main program split the huge CSV file into smaller chunks and run each chunk in a different process.
common.php
class FileLineFinder {
    protected $handle, $length, $curpos;

    public function __construct($file){
        // Assign to the object's properties (the original assigned to local variables).
        $this->handle = fopen($file, 'r');
        $this->length = strlen(PHP_EOL);
        $this->curpos = 0;
    }

    public function next_line(){
        while(!feof($this->handle)){
            $b = fread($this->handle, $this->length);
            $this->curpos += $this->length;
            if ($b == PHP_EOL) return $this->curpos;
        }
        return false;
    }

    public function skip_lines($count){
        for($i = 0; $i < $count; $i++)
            $this->next_line();
    }

    public function __destruct(){
        fclose($this->handle);
    }
}
function exec_async($cmd, $outfile, $pidfile){
    exec(sprintf("%s > %s 2>&1 & echo $! >> %s", $cmd, $outfile, $pidfile));
}
main.php
require('common.php');

$maxlines = 200; // maximum lines subtask will be processing at a time
$note = $_POST['note'];
$file = $_FILES['csv']['tmp_name'];
$outdir = dirname(__FILE__) . DIRECTORY_SEPARATOR . 'out' . DIRECTORY_SEPARATOR;

// make sure our output directory exists
if(!is_dir($outdir))
    if(!mkdir($outdir, 0755, true))
        die('Cannot create output directory: '.$outdir);

// run a task for each chunk of lines in the csv file
$i = 0; $pos = 0;
$l = new FileLineFinder($file);
do {
    $i++;
    exec_async(
        'php -f sub.php -- '.$pos.' '.$maxlines.' '.escapeshellarg($file).' '.escapeshellarg($note),
        $outdir.'proc'.$i.'.log',
        $outdir.'proc'.$i.'.pid'
    );
    $l->skip_lines($maxlines);
} while($pos = $l->next_line());

// wait for each task to finish
do {
    $tasks = count(glob($outdir.'proc*.pid'));
    echo 'Remaining Tasks: '.$tasks.PHP_EOL;
} while ($tasks > 0);

echo 'Finished!'.PHP_EOL;
sub.php
require('common.php');

$start = (int)$argv[1];
$count = (int)$argv[2];
$file = $argv[3];
$note = mysql_real_escape_string($argv[4]);
$lines = 0;

$handle = fopen($file, 'r');
fseek($handle, $start, SEEK_SET);

$expression = "/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/";
mysql_query('SET NAMES utf8');

// loop through the csv file and insert into database
do {
    $lines++;
    if ($data[0]) {
        if (preg_match($expression, $data[0])) {
            $query = mysql_query('SELECT * FROM `postfix`.`recipient_access` where recipient="'.$data[0].'"');
            $fetch = mysql_fetch_array($query);
            if($fetch['recipient'] != $data[0]){
                $query = mysql_query('INSERT INTO `postfix`.`recipient_access`(`recipient`, `note`) VALUES("'.$data[0].'","'.$note.'")');
            }
        }
    }
} while (($data = fgetcsv($handle, 1000, ',', '\'')) && ($lines < $count));
Credits
https://stackoverflow.com/a/2162528/314056
https://stackoverflow.com/a/45966/314056
The most pressing thing to do is to make sure your database is properly indexed so the lookup query you do for every row is as fast as possible.
Other than that, there simply isn't that much you can do. For a multithreaded solution, you'll have to go outside PHP.
You could also just import the CSV file into MySQL and then weed out the superfluous data using your PHP script - that is likely to be the fastest way.
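A rough sketch of that route, assuming a staging table recipient_staging exists and LOCAL INFILE is enabled on your server; the file path and table name are placeholders:
// Load the raw CSV into a staging table in one statement, then clean it up
// (deduplicate, validate addresses) with ordinary SQL or PHP afterwards.
$sql = 'LOAD DATA LOCAL INFILE "/tmp/emails.csv"
        INTO TABLE `postfix`.`recipient_staging`
        FIELDS TERMINATED BY ","
        LINES TERMINATED BY "\n"
        (recipient)';
mysql_query($sql) or die(mysql_error());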
Just a general suggestion: the key to speeding up any program is to know which part takes most of the time.
Then figure out how to reduce it. Sometimes you will be very surprised by the actual result.
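For example, a crude way to find the hot spot is to time the pieces of the loop with microtime() and compare the totals. This sketch reuses the question's table and $handle, and drops the note column for brevity:
$selectTime = 0.0;
$insertTime = 0.0;

while (($data = fgetcsv($handle, 1000, ",", "'")) !== FALSE) {
    $t = microtime(true);
    $query = mysql_query("SELECT * FROM `postfix`.`recipient_access` WHERE recipient='" . addslashes($data[0]) . "'");
    $selectTime += microtime(true) - $t;

    if (mysql_num_rows($query) == 0) {
        $t = microtime(true);
        mysql_query("INSERT INTO `postfix`.`recipient_access` (`recipient`) VALUES ('" . addslashes($data[0]) . "')");
        $insertTime += microtime(true) - $t;
    }
}

// Compare the two totals to see where the time actually goes.
error_log(sprintf("SELECTs: %.2fs, INSERTs: %.2fs", $selectTime, $insertTime));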
By the way, I don't think multithreading would solve your problem.
Put the whole loop inside an SQL transaction. That will speed things up by an order of magnitude.
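A minimal sketch of that idea using the mysql_* functions from the question (this assumes the table uses a transactional engine such as InnoDB; the commit interval is arbitrary):
mysql_query('START TRANSACTION');

$rows = 0;
while (($data = fgetcsv($handle, 1000, ",", "'")) !== FALSE) {
    // ... validate and INSERT exactly as before ...

    if (++$rows % 10000 === 0) {
        // Commit in chunks so one huge transaction doesn't pile up.
        mysql_query('COMMIT');
        mysql_query('START TRANSACTION');
    }
}

mysql_query('COMMIT');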
I'm trying to make this save a file and it creates the file, but it's always empty. This is the code for it:
<?php
$code = htmlentities($_POST['code']);
$i = 0;
$path = 'files/';
$file_name = '';
while(true) {
    if (file_exists($path . strval($i) . '.txt')) {
        $i++;
    } else {
        $name = strval($i);
        $file_name = $path . $name . '.txt';
        break;
    }
}
fopen($file_name, 'w');
fwrite($file_name, $code);
fclose($file_name);
header("location: index.php?file=$i");
?>
I echoed out $code to make sure it wasn't empty, and it wasn't. I also tried replacing
fwrite($file_name, $code);
with this:
fwrite($file_name, 'Test');
and it was still empty. I have written to files a couple of times before in PHP, but I'm still really new to PHP and I have no idea what's wrong. Could someone tell me what I'm doing wrong or how to fix this? Thanks.
Reading/Writing to/from a file or stream requires a resource handle:
$resource = fopen($file_name, 'w');
fwrite($resource, $code);
fclose($resource);
The resource handle $resource is essentially a pointer to the open file/stream resource. You interact with the created resource handle, not the string representation of the file name.
This concept exists with cURL as well. It is a common practice in PHP, especially since PHP didn't have support for OOP when these functions came to be.
Take a look at the examples on php.net.
My telecom vendor is sending me a report each time a message goes out. I have written a very simple PHP script that receives values via HTTP GET. Using fwrite, I write the query parameters to a CSV file. The filename is report.csv, prefixed with the current date.
Here is the code:
<?php
error_reporting(E_ALL ^ E_NOTICE);
date_default_timezone_set('America/New_York');

// setting the CSV file
$fileDate = date("m-d-Y");
$filename = $fileDate . "_Report.csv";
$directory = "./csv_archive/";

// Creating handle
$handle = fopen($filename, "a");

// These are the main data fields
$item1 = $_GET['item1'];
$item2 = $_GET['item2'];
$item3 = $_GET['item3'];
$mydate = date("Y-m-d H:i:s");
$pass = $_GET['pass'];

// testing the pass
if (isset($_GET['pass']) AND $_GET['pass'] == "password") {
    echo 'Login successful';

    // just making sure the function could write to it
    if (!$handle = fopen($directory . $filename, 'a')) {
        echo "Cannot open file ($filename)";
        exit;
    }

    // writing the data I receive through the query string
    if (fwrite($handle, "$item1,$item2,$item3,$mydate \n") === FALSE) {
        echo "Cannot write to file ($filename)";
        exit;
    }

    fclose($handle);
} else {
    echo 'Login Failure please add the right pass to URL';
}
?>
The script does what I want, but the only problem is inconsistency: a good portion of the records are missing (about half the report). When I log in to my account, I can get the complete report.
I have no clue what I need to do to fix this; please advise.
I have a couple of suggestions for this script.
To address Andrew Rhyne's suggestion, change your code that reads from each $_GET variable to:
$item1 = (isset($_GET['item1']) && $_GET['item1']) ? $_GET['item1'] : 'empty';
This will tell you if all your fields are being populated.
I suspect your problem is something else. It sounds like you are getting a separate request for each record that you want to save. Perhaps some of these requests are happening too close together and are interfering with each other's ability to open and write to the file. To check whether this is happening, you might try using the following code to check if you opened the file correctly. (Note that your first use of fopen() in your script does nothing, because you are overwriting $handle with your second use of fopen(); the first one is also opening the wrong file...)
if (!$handle = fopen($directory.$filename, 'a')){
    $handle = fopen($directory.date("Y-m-d H:i:s:u").'_Record_Error.txt', 'a');
    exit;
}
This will make sure that you don't ever lose data because of concurrent write attempts. If you find that this is indeed your issue, you can delay subsequent write attempts until the file is not busy.
$tries = 0;
while ($tries < 50 && !$handle = fopen($directory.$filename, 'a')){
    usleep(500000); // wait half a second (sleep() only accepts whole seconds)
    $tries++;
}
if($handle){
    flock($handle, LOCK_EX); // lock the file to prevent other requests from writing to it until you are done
} else {
    $handle = fopen($directory.date("Y-m-d H:i:s:u").'_Record_Error.txt', 'a'); // the 'u' is for microseconds
    exit;
}
This will spend up to 25 seconds trying to open the file, once every half second, and will still write your record to a unique file every time you are still unable to open the shared file. You can then safely fwrite() and fclose() $handle as you were.
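A simpler variant of the same idea is to lean on file_put_contents() with the FILE_APPEND and LOCK_EX flags, which bundles the open/lock/write/close sequence into a single call. This is only a sketch, reusing $item1, $item2, $item3, $mydate, $directory and $filename from the question's script:
$csvRow = "$item1,$item2,$item3,$mydate\n";

// LOCK_EX makes concurrent PHP writers wait for each other instead of clobbering the file.
if (file_put_contents($directory.$filename, $csvRow, FILE_APPEND | LOCK_EX) === false) {
    // Fall back to a per-request error file, as above.
    file_put_contents($directory.date("Y-m-d H:i:s:u").'_Record_Error.txt', $csvRow, FILE_APPEND);
}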
Is there any alternative to file_get_contents that would create the file if it does not exist? I am basically looking for a one-line command. I am using it to count download stats for a program. I use this PHP code in the pre-download page:
Download #: <?php $hits = file_get_contents("downloads.txt"); echo $hits; ?>
and then in the download page, I have this.
<?php
function countdownload($filename) {
    if (file_exists($filename)) {
        $count = file_get_contents($filename);
        $handle = fopen($filename, "w") or die("can't open file");
        $count = $count + 1;
    } else {
        $handle = fopen($filename, "w") or die("can't open file");
        $count = 0;
    }
    fwrite($handle, $count);
    fclose($handle);
}

$DownloadName = 'SRO.exe';
$Version = '1';
$NameVersion = $DownloadName . $Version;
$Cookie = isset($_COOKIE[str_replace('.', '_', $NameVersion)]);
if (!$Cookie) {
    countdownload("unqiue_downloads.txt");
    countdownload("unique_total_downloads.txt");
} else {
    countdownload("downloads.txt");
    countdownload("total_download.txt");
}
echo '<META HTTP-EQUIV=Refresh CONTENT="0; URL='.$DownloadName.'" />';
?>
Naturally though, the user accesses the pre-download page first, so the file is not created yet. I do not want to add any functions to the pre-download page; I want it to stay plain and simple without a lot of adding/changing.
Edit:
Something like this should work, but it's not working for me:
$count = (file_exists($filename))? file_get_contents($filename) : 0; echo $count;
Download #: <?php
$hits = '';
$filename = "downloads.txt";
if (file_exists($filename)) {
    $hits = file_get_contents($filename);
} else {
    file_put_contents($filename, '');
}
echo $hits;
?>
you can also use fopen() with 'c+' mode, which creates the file if it doesn't exist without truncating it ('w+' would wipe the counter on every hit):
Download #: <?php
$hits = 0;
$filename = "downloads.txt";
// 'c+' opens for reading and writing, creating the file if it is missing,
// and does not truncate it the way 'w+' would.
$h = fopen($filename, 'c+');
$contents = stream_get_contents($h);
if ($contents !== false && $contents !== '') {
    $hits = intval($contents);
}
fclose($h);
echo $hits;
?>
Type juggling like this can lead to crazy, unforeseen problems later. To turn a string into an integer, you can just add the integer 0 to any string.
For example:
$f = file_get_contents('file.php');
$f = $f + 0;
echo is_int($f); //will return 1 for true
However, I second the use of a database instead of a text file for this. There are a few ways to go about it. One way is to insert a unique string into a table called 'download_count' every time someone downloads the file; the query is as easy as "insert into download_count $randomValue" - just make sure the index is unique. Then count the number of rows in this table when you need the count: the number of rows is the download count, and you have a real integer instead of a string pretending to be an integer. Alternatively, add a download-count integer field to your 'download file' table; each file should be in a database with an id anyway. When someone downloads the file, pull that number from the database in your download function, put it into a variable, increment it, update the table and show it on the client however you want. Use PHP with jQuery Ajax to update it asynchronously to make it cool.
I would still use PHP and jQuery's .load('file.php') if you insist on using a text file. That way, you can use your text file for storing any kind of data and just load the specific part of it using context selectors. file.php accepts the $_GET request, loads the right portion of the file and reads the number stored there. It then increments the number, updates the file and sends the data back to the client to be displayed any way you want. For example, you can have a div in your text file with an id of 'download_count' and a div with an id for any other data you want to store in this file. When you load file.php, you just send div#download_count along with the filename and it will only load the value stored in that div. This is a killer way to use PHP and jQuery for cool and easy Ajax/data-driven apps. Not to turn this into a jQuery thread, but this is as simple as it gets.
You can use a more concise equivalent of your countdownload function:
function countdownload($filename) {
    if (file_exists($filename)) {
        // The file exists: read the current count and increment it.
        file_put_contents($filename, file_get_contents($filename) + 1);
    } else {
        // First download: create the file with an initial count of 0.
        file_put_contents($filename, 0);
    }
}