I'm trying to split big CSV files. Right now I can only split CSV files with 50k rows. Whenever I try splitting one with 100k rows, it doesn't work.
I can't figure out what's wrong.
Here is the code for the splitter I use for the 100k file:
$inputFile = 'uploads/uploaded.csv';
$outputFile = 'uploads/output';
$rows = array_map('str_getcsv', file($inputFile));
$header = array_shift($rows);
$splitSize = 50000;
$in = fopen($inputFile, 'r');
fgetcsv($in, 1000, ",");
//array for file name
$stored_names = array();
$rowCount = 0;
$fileCount = 1;
//split csv
while (!feof($in)) {
if (($rowCount % $splitSize) == 0) {
if ($rowCount > 0) {
fclose($out);
}
$super_file_name = $outputFile . $fileCount++;
array_push($stored_names,$super_file_name.'.csv');
$out = fopen($super_file_name. '.csv', 'w');
//insert header
fputcsv($out,$header);
// array_push($stored_names,$out);
}
$data = fgetcsv($in);
if ($data)
fputcsv($out,$data);
$rowCount++;
}
fclose($out);
I'm guessing that your issue is a memory limit based on this code:
$rows = array_map('str_getcsv', file($inputFile));
$header = array_shift($rows);
This reads the entire file into memory, splits it into an array of arrays, then pops off the first row, and throws away the rest. Since you only need the first row, you don't need to read the whole file. Instead just do something like:
$fp = fopen($inputFile, 'r');
$headers = fgetcsv($fp);
Then you have $fp already open and pointing to the first data line for your splitting process.
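Putting that together, a streaming splitter could look roughly like the sketch below. It is only a sketch: the function name splitCsv and its parameters are made up for illustration, and it assumes the first line of the input is the header row. It never holds more than one row in memory.

```php
<?php
// Hypothetical helper (not from the original post): split a CSV into
// chunks of $splitSize data rows, repeating the header in each chunk.
// Returns the list of chunk file names that were written.
function splitCsv($inputFile, $outputPrefix, $splitSize = 50000)
{
    $names = array();
    $in = fopen($inputFile, 'r');
    if ($in === false) {
        return $names;
    }
    $header = fgetcsv($in);
    if ($header === false) { // empty file, nothing to split
        fclose($in);
        return $names;
    }
    $out = null;
    $rowCount = 0;
    while (($data = fgetcsv($in)) !== false) {
        if ($data === array(null)) {
            continue; // fgetcsv returns array(null) for a blank line
        }
        if ($rowCount % $splitSize === 0) {
            // current chunk is full (or this is the first row): start a new file
            if ($out !== null) {
                fclose($out);
            }
            $name = $outputPrefix . (count($names) + 1) . '.csv';
            $names[] = $name;
            $out = fopen($name, 'w');
            fputcsv($out, $header);
        }
        fputcsv($out, $data);
        $rowCount++;
    }
    if ($out !== null) {
        fclose($out);
    }
    fclose($in);
    return $names;
}
```

Calling splitCsv('uploads/uploaded.csv', 'uploads/output', 50000) would then write uploads/output1.csv, uploads/output2.csv and so on, each starting with the header.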
First of all I load PHPExcel.php
Secondly, I am using this code:
$location = '/path/file.csv';
$inputFileType = 'CSV';
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$objPHPExcel = $objReader->load($location);
$worksheet = $objPHPExcel->getActiveSheet();
$list = array();
foreach ($worksheet->getRowIterator() as $row)
{
$rowIndex = $row->getRowIndex();
$cellValue = $worksheet->getCell('A'.$rowIndex)->getValue();
array_push($list, $cellValue);
}
$count = count($list);
for ($rowIndex = $count; $rowIndex != 1; $rowIndex--)
{
$cellValue = $worksheet->getCell('A'.$rowIndex)->getValue();
for ($i = $rowIndex - 2; $i != 0; $i--)
{
if ($list[$i] == $cellValue)
{
$worksheet->removeRow($rowIndex);
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'CSV');
$objWriter->save($location);
break;
}
}
}
So, I am trying to remove the rows that have duplicate values in the first column. The code does not work. When I first run it in PuTTY, I have to wait for ages. If I interrupt the process and run it again, it does finish, but my CSV file has wrong results (duplicates are 300 but I am getting -600 rows).
You don't have to use PHPExcel to read a CSV file. Instead, you can use native PHP code like this:
<?php
// Array which will hold all analyzed lines
$uniqueEntries = array();
$dublicatedEntries = array();
$delimiter = ',';
$file = 'test.csv';
//Open the file
if (($handle = fopen($file, "r")) !== false) {
// read each line into an array
while (($data = fgetcsv($handle, 8192, $delimiter)) !== false) {
// build a "line" from the parsed data
$line = join($delimiter, $data);
//If the line content has been seen before, save it to the duplicates and skip the rest
if (isset($uniqueEntries[$line])){
$dublicatedEntries[] = $line;
continue;
}
// save the line
$uniqueEntries[$line] = true;
}
fclose($handle);
}
// build the new content-data
$contents = '';
foreach ($uniqueEntries as $line => $bool) $contents .= $line . "\r\n";
// save it to a new file
file_put_contents("test_unique.csv", $contents);
?>
This code is untested but should work.
This will give you a .csv file with all unique entries.
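Note that the code above treats a row as a duplicate only when the whole line matches. If you want to drop rows whose first column has already been seen, as the question asks, a variation along these lines might do it. This is a sketch under assumptions: the function name dedupeByFirstColumn and both of its parameters are invented for illustration, and it writes with fputcsv so that quoted fields are escaped again on output.

```php
<?php
// Hypothetical helper: copy $source to $target, keeping only the first
// row seen for each distinct value in column 0.
// Returns the number of rows that were dropped as duplicates.
function dedupeByFirstColumn($source, $target)
{
    $in = fopen($source, 'r');
    $out = fopen($target, 'w');
    $seen = array();
    $removed = 0;
    while (($row = fgetcsv($in)) !== false) {
        if ($row === array(null)) {
            continue; // skip blank lines
        }
        $key = $row[0];
        if (isset($seen[$key])) {
            $removed++; // first column already seen: drop this row
            continue;
        }
        $seen[$key] = true;
        fputcsv($out, $row);
    }
    fclose($in);
    fclose($out);
    return $removed;
}
```

Because it reads and writes one row at a time and only remembers the keys, it makes a single pass over the file instead of rescanning and rewriting it for every duplicate.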
To parse CSV files in PHP I'm using this function:
private function _csvToArray($url, $delimiter=',')
{
$csvData = file_get_contents($url);
$lines = explode(PHP_EOL, $csvData);
$array = array();
foreach ($lines as $line) {
$array[] = str_getcsv($line, $delimiter);
}
return $array;
}
The problem is that I'm using PHP_EOL to determine where a line ends, so if any field in the CSV file contains end-of-line characters I get errors.
Example:
Product_Name, "Description"
Product_Name, "Description"
Product_Name, "Description"
Product_Name, "Description"
This works ok, but if I have something like this:
Product_Name, "Description_line_1
Description_line_2"
Product_Name, "Description_line_1
Description_line_2"
Product_Name, "Description_line_1
Description_line_2"
The script will fail. Is there any way I can improve the script to handle this, or is it better to fix the CSV with a regular expression before calling the script?
If you want to save writing to a temporary file yourself you can use the memory stream.
private function _csvToArray($url, $delimiter=',')
{
$fp = fopen('php://memory', 'r+');
fwrite($fp, file_get_contents($url));
fseek($fp, 0);
$array = array();
while ($row = fgetcsv($fp, 0, $delimiter)) {
$array[] = $row;
}
fclose($fp);
return $array;
}
fgetcsv can handle EOL in fields if the field data is between enclosure characters.
private function _csvToArray($url, $delimiter=',', $enclosure='"')
{
$handle = fopen($url, 'r');
$array = array();
while($row = fgetcsv($handle, 0, $delimiter, $enclosure))
{
$array[] = $row;
}
fclose($handle);
return $array;
}
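To see that behaviour in isolation, here is a small self-contained check. The sample data is made up for illustration, and it uses the php://memory stream so no file is needed.

```php
<?php
// Illustrative data only: the second column of the first record
// spans two physical lines inside double quotes.
$csvText = "Product_A,\"Line 1\nLine 2\"\nProduct_B,\"Single line\"\n";

$fp = fopen('php://memory', 'r+');
fwrite($fp, $csvText);
rewind($fp);

$rows = array();
while (($row = fgetcsv($fp)) !== false) {
    $rows[] = $row;
}
fclose($fp);

// $rows holds two records, not three lines;
// $rows[0][1] still contains the embedded newline.
```

The three physical lines come back as two records, which is what the explode(PHP_EOL, ...) approach cannot do.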
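Something like this should work (haven't properly tested the code). Note that file() still splits on every line break, so quoted fields that span several lines are not handled:
$csv = array_map(function ($line) { return str_getcsv($line, ',', '"'); }, file($url));
I had some old code lying around which fixed this for me once. But remember, it's from way, way back: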
$url = 'file.csv';
$csv = array();
$csvContents = file_get_contents($url);
$lines = explode('"'."\n", trim($csvContents));
foreach($lines as $lineNumber => $line) {
$csv[$lineNumber] = array();
$fields = explode(',', $line);
foreach($fields as $field) {
$csv[$lineNumber][] = ltrim(rtrim($field, '"'), '"');
}
}
I'm using a simple function to write arrays to a CSV file, which looks like this:
function writeToCSV($array) {
$fp = fopen('programmes.csv', 'a');
fputcsv($fp, $array);
fclose($fp);
}
Simple as pie. However, is there any way to know what line number the pointer is at? I want to start writing to a new file after every 1000 lines. Why? Because I need to import the files into a database later under some memory constraints, and parsing a CSV file with 15000 lines is a no-no.
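function writeToCSV($array) {
    $i = 0; // rows written so far
    $j = 1; // number of the current output file
    $fp = fopen('programmes' . $j . '.csv', 'a');
    foreach ($array as $fields) {
        // after every 1000 rows, close the current file and move to the next one
        if ($i > 0 && $i % 1000 == 0) {
            fclose($fp);
            $j = $j + 1;
            $fp = fopen('programmes' . $j . '.csv', 'a');
        }
        fputcsv($fp, $fields);
        $i = $i + 1;
    }
    fclose($fp);
}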
Try this:
count(file('programmes.csv'));
This will give you the number of lines in a file.
I haven't tested whether this works, but I would do something like this:
<?php
function writeToCSV($array) {
// count lines in the current file
$linecount = 0;
$fh = fopen('programmes.csv','rb') or die("ERROR OPENING DATA");
while (fgets($fh) !== false) $linecount++;
fclose($fh);
$aSize = sizeof($array);
if (($linecount + $aSize) > 1000) {
// split array
$limit = 1000 - $linecount;
$a = array_slice($array, 0, $limit);
$b = array_slice($array, $limit);
// write into first file
$fp = fopen('programmes.csv', 'a');
foreach($a as $field) fputcsv($fp, $field);
fclose($fp);
// write into second file
$fp = fopen('programmes2.csv', 'a');
foreach($b as $field) fputcsv($fp, $field);
fclose($fp);
} else {
// everything still fits into the current file
$fp = fopen('programmes.csv', 'a');
foreach($array as $field) fputcsv($fp, $field);
fclose($fp);
}
}
?>
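An alternative to recounting the lines on every call is to keep the count in the writer itself. The class below is a rough sketch of that idea, not a drop-in replacement: the name RotatingCsvWriter, its methods, and the file naming scheme are all assumptions made for illustration, and it starts each run with fresh files rather than appending to existing ones.

```php
<?php
// Hypothetical class: writes rows and starts a new numbered file
// once the current one already holds $maxRows rows.
class RotatingCsvWriter
{
    private $prefix;
    private $maxRows;
    private $fileIndex = 1;
    private $rowsInFile = 0;
    private $fp = null;

    public function __construct($prefix, $maxRows = 1000)
    {
        $this->prefix = $prefix;
        $this->maxRows = $maxRows;
    }

    public function write(array $fields)
    {
        // current file is full: close it and move on to the next number
        if ($this->fp !== null && $this->rowsInFile >= $this->maxRows) {
            fclose($this->fp);
            $this->fp = null;
            $this->fileIndex++;
            $this->rowsInFile = 0;
        }
        // open lazily so no empty trailing file is created
        if ($this->fp === null) {
            $this->fp = fopen($this->prefix . $this->fileIndex . '.csv', 'w');
        }
        fputcsv($this->fp, $fields);
        $this->rowsInFile++;
    }

    public function close()
    {
        if ($this->fp !== null) {
            fclose($this->fp);
            $this->fp = null;
        }
    }
}
```

Usage would be something like creating new RotatingCsvWriter('programmes', 1000), calling write($row) for each row, and calling close() at the end.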
How do I remove every line except the first 20 from a text file using PHP?
If loading the entire file in memory is feasible you can do:
// read the file in an array.
$file = file($filename);
// slice first 20 elements.
$file = array_slice($file,0,20);
// write back to file after joining.
file_put_contents($filename,implode("",$file));
A better solution would be to use the function ftruncate which takes the file handle and the new size of the file in bytes as follows:
// open the file in read-write mode.
$handle = fopen($filename, 'r+');
if(!$handle) {
// die here.
}
// new length of the file.
$length = 0;
// line count.
$count = 0;
// read line by line.
while (($buffer = fgets($handle)) !== false) {
// increment line count.
++$count;
// if count exceeds limit..break.
if($count > 20) {
break;
}
// add the current line length to final length.
$length += strlen($buffer);
}
// truncate the file to new file length.
ftruncate($handle, $length);
// close the file.
fclose($handle);
For a memory efficient solution you can use
$file = new SplFileObject('/path/to/file.txt', 'a+');
$file->seek(19); // zero-based, hence 19 is line 20
$file->ftruncate($file->ftell());
Apologies, mis-read the question...
$filename = "blah.txt";
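// FILE_IGNORE_NEW_LINES strips the line endings, so adding PHP_EOL does not double them
$lines = file($filename, FILE_IGNORE_NEW_LINES);
$data = "";
for ($i = 0; $i < 20 && $i < count($lines); $i++) {
$data .= $lines[$i] . PHP_EOL;
}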
file_put_contents($filename, $data);
Something like:
$lines_array = file("yourFile.txt");
$new_output = "";
for ($i=0; $i<20; $i++){
$new_output .= $lines_array[$i];
}
file_put_contents("yourFile.txt", $new_output);
This should work as well without huge memory usage
$result = '';
$file = fopen('/path/to/file.txt', 'r');
for ($i = 0; $i < 20; $i++)
{
$result .= fgets($file);
}
fclose($file);
file_put_contents('/path/to/file.txt', $result);
So I have a CSV file that looks like this:
12345, Here is some text
20394, Here is some more text
How can I insert this into an array that looks like so
$text = array("12345" => "Here is some text",
"20394" => "Here is some more text");
This is what I currently had to get a single numerical based value on a one tier CSV
if ($handle = fopen("$qid", "r")) {
$csvData = file_get_contents($qid);
$csvDelim = "\r";
$qid = array();
$qid = str_getcsv($csvData, $csvDelim);
} else {
die("Could not open CSV file.");
}
Thanks for the replies, but I still see a potential issue. With these solutions, wouldn't the values store in this way:
$array[0] = 12345
$array[1] = Here is some text 20394
$array[2] = Here is some more text
If I tried this on the example csv above, how would the array be structured?
You can use fgetcsv() to read a line from a file into an array. So something like this:
$a = array();
$f = fopen(....);
while ($line = fgetcsv($f))
{
$key = array_shift($line);
$a[$key] = $line;
}
fclose($f);
var_dump($a);
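If you want a flat key => value map like the one in the question, rather than an array of arrays, a variation along these lines could work. It is a sketch that assumes exactly the two-column layout from the question. The name csvToMap is made up, and trim() is there only because the sample data has a space after the comma. Keep in mind that PHP stores numeric string keys such as "12345" as integers.

```php
<?php
// Hypothetical helper: map column 0 to column 1 for a two-column CSV.
function csvToMap($path)
{
    $map = array();
    $f = fopen($path, 'r');
    while (($line = fgetcsv($f)) !== false) {
        if (count($line) < 2) {
            continue; // blank or malformed line
        }
        // trim() removes the space that follows the comma in the sample data
        $map[trim($line[0])] = trim($line[1]);
    }
    fclose($f);
    return $map;
}
```

Since fgetcsv works out the line endings itself, each row of the file becomes exactly one entry in the map.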
Assuming that the first row in the CSV file contains the column headers, this will create an associative array using those headers for each row's data:
$filepath = "./test.csv";
$file = fopen($filepath, "r") or die("Error opening file");
$i = 0;
while(($line = fgetcsv($file)) !== FALSE) {
if($i == 0) {
$c = 0;
foreach($line as $col) {
$cols[$c] = $col;
$c++;
}
} else if($i > 0) {
$c = 0;
foreach($line as $col) {
$data[$i][$cols[$c]] = $col;
$c++;
}
}
$i++;
}
print_r($data);
If you are reading a file I can recommend using something like fgetcsv()
This will read each line in the CSV into an array containing all the columns as values.
http://at2.php.net/fgetcsv
// double quotes so that "\n" is a real newline; limit 2 splits at the first comma only
$csv_lines = explode("\n",$csv_text);
foreach($csv_lines as $line) {
$csv_array[] = explode(',',$line,2);
}
edit - based on code posted after original question:
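if ($handle = fopen("$qid", "r")) {
fclose($handle); // the handle is only used to check that the file can be opened
$csvData = file_get_contents($qid);
$csvDelim = "\r"; // assume this is the line delim?
$csv_lines = explode($csvDelim,$csvData);
$qid = array(); // reuse $qid for the result, as in the question
foreach($csv_lines as $line) {
$qid[] = explode(',',$line,2);
}
} else {
die("Could not open CSV file.");
}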
With your new file with two columns, $qid should become an array with two values for each line.
$csvDelim = ",";
$qid = str_getcsv($csvData, $csvDelim);
$text[$qid[0]] = $qid[1];