Remove duplicates from CSV file using PHP - php

First of all I load PHPExcel.php
Secondly, I am using this code:
$location = '/path/file.csv';
$inputFileType = 'CSV';
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$objPHPExcel = $objReader->load($location);
$worksheet = $objPHPExcel->getActiveSheet();
$list = array();
foreach ($worksheet->getRowIterator() as $row)
{
$rowIndex = $row->getRowIndex();
$cellValue = $worksheet->getCell('A'.$rowIndex)->getValue();
array_push($list, $cellValue);
}
$count = count($list);
for ($rowIndex = $count; $rowIndex != 1; $rowIndex--)
{
$cellValue = $worksheet->getCell('A'.$rowIndex)->getValue();
for ($i = $rowIndex - 2; $i != 0; $i--)
{
if ($list[$i] == $cellValue)
{
$worksheet->removeRow($rowIndex);
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'CSV');
$objWriter->save($location);
break;
}
}
}
I am trying to remove rows that have a duplicate value in the first column, but the code does not work. The first time I run it from PuTTY it takes ages, so I interrupt the process and run it again. It then finishes, but the CSV file contains wrong results (there are 300 duplicates, but I end up with 600 fewer rows).

To read a CSV file you don't have to use PHPExcel. Instead you can use native PHP code like this:
<?php
// Arrays which will hold all analyzed lines
$uniqueEntries = array();
$duplicatedEntries = array();
$delimiter = ',';
$file = 'test.csv';
// Open the file
if (($handle = fopen($file, "r")) !== false) {
// read each line into an array
while (($data = fgetcsv($handle, 8192, $delimiter)) !== false) {
// build a "line" from the parsed data
$line = join($delimiter, $data);
// If the line content has been seen before, save it to the duplicates and skip the rest
if (isset($uniqueEntries[$line])){
$duplicatedEntries[] = $line;
continue;
}
// save the line
$uniqueEntries[$line] = true;
}
fclose($handle);
}
// build the new content-data
$contents = '';
foreach ($uniqueEntries as $line => $bool) $contents .= $line . "\r\n";
// save it to a new file
file_put_contents("test_unique.csv", $contents);
?>
This code is untested but should work.
This will give you a .csv file with all unique entries.
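Note that the snippet above treats two rows as duplicates only when the whole line matches, while the question asks about duplicates in the first column. A minimal sketch of that variant follows, assuming the first occurrence of each value should be kept. The function name dedupeByFirstColumn and its parameters are made up for illustration; only fgetcsv() and fputcsv() are real PHP functions.

```php
<?php
// Hypothetical helper: copy $inPath to $outPath, keeping only the first row
// seen for each distinct value in column 0. Returns the number of rows dropped.
function dedupeByFirstColumn(string $inPath, string $outPath, string $delimiter = ','): int
{
    $in = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Could not open input or output file');
    }

    $seen = [];    // first-column values already written
    $dropped = 0;
    while (($row = fgetcsv($in, 0, $delimiter, '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $key = (string) $row[0];
        if (isset($seen[$key])) {
            $dropped++;   // duplicate first column: skip the row
            continue;
        }
        $seen[$key] = true;
        fputcsv($out, $row, $delimiter, '"', '\\'); // re-quotes fields as needed
    }

    fclose($in);
    fclose($out);
    return $dropped;
}
```

Because each row is read once and written at most once, this avoids the nested loop and the repeated save inside the loop from the question.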

Related

How to write and save CSV file to server in PHP?

I am trying to create a new CSV file using PHP and upload or move it to a new part of the server, but the spreadsheet it returns has only the first cell in the first row filled, with a value of either 404 or 1. What am I doing wrong?
My code is attached below.
// generate new general spreadsheet
$filepath = substr($file_path, 1);
$data = load_csv_file($filepath);
header('Content-type: text/csv');
header('Content-Disposition: attachment; filename="file-saved.csv"');
$fp = fopen('php://output', 'wb');
foreach ($data as $row) {
$output = fputcsv($fp, $row);
}
$filename = "file-saved.csv";
file_put_contents( $filename, $output);
fclose($fp);
The $data variable is an array of values from another CSV file.
$output = [];
foreach($data as $row) {
$output[] = ..
}
...
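One thing worth knowing about the code in the question: fputcsv() returns the number of bytes it wrote (or false on failure), not the CSV text, and php://output sends the rows to the browser rather than to disk. A minimal sketch of writing the rows straight to a file on the server is below. The helper name saveCsv and the sample rows are hypothetical.

```php
<?php
// Hypothetical helper: write an array of rows to a CSV file on disk.
// Returns the number of rows written.
function saveCsv(string $path, array $rows): int
{
    $fp = fopen($path, 'w');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }

    $written = 0;
    foreach ($rows as $row) {
        // fputcsv() returns the length of the written line, or false on failure
        if (fputcsv($fp, $row, ',', '"', '\\') !== false) {
            $written++;
        }
    }

    fclose($fp);
    return $written;
}
```

Usage would look like saveCsv('file-saved.csv', $data); with $data being the array of rows loaded from the other CSV file.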
error_reporting(0);
$file_n = public_path('/csv_file/product_details.csv');
$infoPath = pathinfo($file_n);
if($infoPath['extension'] == 'csv'){
$file = fopen($file_n, "r");
$i = 0;
$all_data = array();
while ( ($filedata = fgetcsv($file, null, "|")) !==FALSE) {
$num = count($filedata );
for ($c=0; $c < $num; $c++) {
$all_data[$i][] = $filedata [$c];
}
$i++;
}
fclose($file);
foreach($all_data as $importData){
$insertData = array(
"article_number"=>$importData[0],
"article_name"=>$importData[1],
"article_description"=>$importData[2],
"article_price"=>$importData[3],
"article_manufacturer"=>$importData[6],
"article_productgroupkey"=>$importData[7],
"article_productgroup"=>$importData[8],
"article_ean"=>$importData[9],
"article_hbnr"=>$importData[10],
"article_shippingcosttext"=>$importData[11],
"article_amount"=>$importData[12],
"article_paymentinadvance"=>$importData[13],
"article_maxdeliveryamount"=>$importData[14],
"article_energyefficiencyclass"=>$importData[15]
);
insertData($insertData);
}
}else{
echo "Invalid file extension.";
}
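function insertData($data){
// NOTE: $article_number is not defined in this snippet; it presumably stands for the result of looking up $data["article_number"] in your table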
if($article_number->count() == 0){
//write your insert query here for $data
}elseif($article_number->count() > 0){
//article_number already present then update the table.UPDATE QUERY
}
}

Modify PHP script to read from CSV file

I have a script that read two variables from a CSV file, at column 0 and column 5, and it worked fine:
$file = fopen($qtyfile,"r");
output("reading file $qtyfile");
$i=0;
$imported = 0;
$failed = 0;
while(! feof($file))
{
$i++;
$line = (fgetcsv($file));
if($i==1) continue;
$cols = explode(';',$line[0]);
$pcode = $cols[0];
$stock = $cols[5];
The CSV file has since been organized differently, and the variables are now at columns 16 and 19. I tried to modify the code as follows, but it is not working:
$file = fopen($qtyfile,"r");
output("reading file $qtyfile");
$i=0;
$imported = 0;
$failed = 0;
while(! feof($file))
{
$i++;
$line = (fgetcsv($file));
if($i==1) continue;
$cols = explode(';',$line[0]);
$pcode = $cols[16];
$stock = $cols[19];
Can you please help me make the script read from the new columns?
Any help will be appreciated.
Link to full script and the csv:
Here's an awesome little function which will parse any csv file into an associative array with column titles from row 1 as array keys.
function csvArray($file) {
$csv = array_map('str_getcsv', file($file));
array_walk($csv, function(&$a) use ($csv) {
$a = array_combine($csv[0], $a);
});
array_shift($csv);
return $csv;
}
Usage:
$output = csvArray($file);
foreach ($output as $o) {
echo $o['pcode'];
// whatever
}
I have to deal with quite a lot of CSV files and I use this all the time. Once the data is in an array it's much easier to deal with, and it doesn't matter if columns get moved around: as long as the names stay the same, it won't break your code.
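The question's code splits each line on ';', so the file is probably semicolon-delimited, while str_getcsv() defaults to a comma. Here is a rough sketch of the same header-as-keys idea with an explicit delimiter, reading line by line with fgetcsv(). The function name readCsvAssoc and the column names pcode and stock are assumptions, not taken from the actual file.

```php
<?php
// Hypothetical helper: read a delimited file into a list of associative
// arrays keyed by the header row. Blank or malformed lines are skipped.
function readCsvAssoc(string $path, string $delimiter = ';'): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $rows = [];
    $header = fgetcsv($fh, 0, $delimiter, '"', '\\');
    if ($header !== false) {
        while (($fields = fgetcsv($fh, 0, $delimiter, '"', '\\')) !== false) {
            // array_combine() needs the same number of keys and values
            if (count($fields) !== count($header)) {
                continue;
            }
            $rows[] = array_combine($header, $fields);
        }
    }

    fclose($fh);
    return $rows;
}
```

With that, the loop body can use $row['pcode'] and $row['stock'] (or whatever the header really calls them) instead of positions 16 and 19.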

skip columns while converting tab delimited text file to csv php

I am trying to convert a tab-delimited file to CSV. The problem is that it's a huge file, with more than 100,000 records, and I want only specific columns from it. The file is generated by Amazon, not by me, so I can't control the format.
The code I wrote works fine, but I need to drop some columns, or rather keep only a few of them. How do I do that without affecting the performance of the conversion from txt to csv?
$file = fopen($file_name.'.txt','w+');
fwrite($file,$report);
fclose($file);
$handle = fopen($file_name.".txt", "r");
$lines = [];
$row_count=0;
$array_count = 0;
$uid = array($user_id);
if (($handle = fopen($file_name.".txt", "r")) !== FALSE)
{
while (($data = fgetcsv($handle, 100000, "\t")) !== FALSE)
{
if($row_count>0)
{
$lines[] = str_replace(",","<c>",$data);
array_push($lines[$array_count],$user_id);
$array_count++;
}
$row_count++;
}
fclose($handle);
}
$fp = fopen($file_name.'.csv', 'w');
foreach ($lines as $line)
{
fputcsv($fp, $line);
}
fclose($fp);
I am using unset to remove a column, but is there a better way to remove multiple columns?
I would do that by checking keys. For example:
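// column keys you want to keep
$keys = array(0, 1, 3, 4, 7, 9);
// FILE_IGNORE_NEW_LINES strips the trailing newline, so the last column stays clean
$lines = file($file_name, FILE_IGNORE_NEW_LINES);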
$result_lines = array();
foreach ($lines as $line) {
$tmp = array();
$tabs = explode("\t", $line);
foreach($tabs as $key => $value){
if(in_array($key, $keys)){
$tmp[] = $value;
}
}
$result_lines[] = implode(",", $tmp);
}
$finalString = implode("\n", $result_lines);
// Then write string to file
Hope it helps.
Cheers,
Siniša
In its simplest form, i.e. without worrying about removing columns from the output, this does a simple read-line and write-line, so there is no need to maintain any memory-hungry arrays.
$file_name = 'tst';
if ( ($f_in = fopen($file_name.".txt", "r")) === FALSE) {
echo 'Cannot open input file';
exit;
}
if ( ($f_out = fopen($file_name.'.csv', 'w')) === FALSE ) {
echo 'Cannot open output file';
exit;
}
while ($data = fgetcsv($f_in, 8000, "\t")) {
fputcsv($f_out, $data, ',', '"');
}
fclose($f_in);
fclose($f_out);
This is one way of removing the unwanted columns:
$file_name = 'tst';
if ( ($f_in = fopen($file_name.".txt", "r")) === FALSE) {
echo 'Cannot open input file';
exit;
}
if ( ($f_out = fopen($file_name.'.csv', 'w')) === FALSE ) {
echo 'Cannot open output file';
exit;
}
$unwanted = [26,27]; //index of unwanted columns
while ($data = fgetcsv($f_in, 8000, "\t")) {
// remove unwanted columns
foreach($unwanted as $i) {
unset($data[$i]);
}
fputcsv($f_out, $data, ',', '"');
}
fclose($f_in);
fclose($f_out);
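If it is easier to list the columns you want than the ones you don't, the same streaming loop can pick them by index instead of calling unset(). This is only a sketch under that assumption; convertTsvToCsv and $wanted are made-up names. Since fputcsv() quotes any field that contains a comma, replacing commas with a placeholder such as <c> should not be needed.

```php
<?php
// Hypothetical helper: stream a tab-delimited file to CSV, keeping only the
// columns whose zero-based indexes are listed in $wanted (in that order).
// Returns the number of rows written.
function convertTsvToCsv(string $inPath, string $outPath, array $wanted): int
{
    $in = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }

    $rows = 0;
    while (($data = fgetcsv($in, 0, "\t", '"', '\\')) !== false) {
        if ($data === [null]) {
            continue; // blank line
        }
        $kept = [];
        foreach ($wanted as $i) {
            $kept[] = $data[$i] ?? ''; // missing column becomes an empty field
        }
        fputcsv($out, $kept, ',', '"', '\\');
        $rows++;
    }

    fclose($in);
    fclose($out);
    return $rows;
}
```

Only one row is held in memory at a time, so the size of the report should not matter much.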

Splitting big csv files

I'm trying to split big CSV files. Right now I can only split CSV files with 50k rows; whenever I try splitting a 100k-row file, it doesn't work.
I can't figure out whats wrong.
Here is my code for the splitter I use for 100k:
$inputFile = 'uploads/uploaded.csv';
$outputFile = 'uploads/output';
$rows = array_map('str_getcsv', file($inputFile));
$header = array_shift($rows);
$splitSize = 50000;
$in = fopen($inputFile, 'r');
fgetcsv($in, 1000, ",");
//array for file name
$stored_names = array();
$rowCount = 0;
$fileCount = 1;
//split csv
while (!feof($in)) {
if (($rowCount % $splitSize) == 0) {
if ($rowCount > 0) {
fclose($out);
}
$super_file_name = $outputFile . $fileCount++;
array_push($stored_names,$super_file_name.'.csv');
$out = fopen($super_file_name. '.csv', 'w');
//insert header
fputcsv($out,$header);
// array_push($stored_names,$out);
}
$data = fgetcsv($in);
if ($data)
fputcsv($out,$data);
$rowCount++;
}
fclose($out);
I'm guessing that your issue is a memory limit based on this code:
$rows = array_map('str_getcsv', file($inputFile));
$header = array_shift($rows);
This reads the entire file into memory, splits it into an array of arrays, then pops off the first row, and throws away the rest. Since you only need the first row, you don't need to read the whole file. Instead just do something like:
$fp = fopen($inputFile, 'r');
$headers = fgetcsv($fp);
Then you have $fp already open and pointing to the first data line for your splitting process.
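Putting that together, a streaming version of the whole splitter might look roughly like the sketch below. It is not a drop-in replacement: the function name splitCsv and its parameters are invented here, and it assumes the first line of the input is the header that every output file should repeat.

```php
<?php
// Hypothetical helper: split $inputFile into files named
// $outputPrefix . N . '.csv' with at most $splitSize data rows each.
// Each output file starts with the header row. Returns the file names.
function splitCsv(string $inputFile, string $outputPrefix, int $splitSize): array
{
    $in = fopen($inputFile, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $inputFile");
    }

    $header = fgetcsv($in, 0, ',', '"', '\\'); // only the first line is read up front
    $names = [];
    $out = null;
    $rowCount = 0;

    while (($data = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($data === [null]) {
            continue; // blank line
        }
        if ($rowCount % $splitSize === 0) {
            // start a new chunk file
            if ($out !== null) {
                fclose($out);
            }
            $name = $outputPrefix . (count($names) + 1) . '.csv';
            $out = fopen($name, 'w');
            if ($out === false) {
                throw new RuntimeException("Cannot open $name");
            }
            fputcsv($out, $header, ',', '"', '\\');
            $names[] = $name;
        }
        fputcsv($out, $data, ',', '"', '\\');
        $rowCount++;
    }

    if ($out !== null) {
        fclose($out);
    }
    fclose($in);
    return $names;
}
```

Memory use stays flat regardless of file size, because only the current row is ever held in memory.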

How can I parse a CSV into array with first value as key?

So I have a CSV file that looks like this:
12345, Here is some text
20394, Here is some more text
How can I insert this into an array that looks like so
$text = "12345" => "Here is some text",
"20394" => "Here is some more text";
This is what I currently had to get a single numerical based value on a one tier CSV
if ($handle = fopen("$qid", "r")) {
$csvData = file_get_contents($qid);
$csvDelim = "\r";
$qid = array();
$qid = str_getcsv($csvData, $csvDelim);
} else {
die("Could not open CSV file.");
}
Thanks for the replies, but I still see a potential issue. With these solutions, wouldn't the values store in this way:
$array[0] = 12345
$array[1] = Here is some text 20394
$array[2] = Here is some more text
If I tried this on the example csv above, how would the array be structured?
You can use fgetcsv() to read a line from a file into an array. So something like this:
$a = array();
$f = fopen(....);
while ($line = fgetcsv($f))
{
$key = array_shift($line);
$a[$key] = $line;
}
fclose($f);
var_dump($a);
Assuming that the first row in the CSV file contains the column headers, this will create an associative array using those headers for each row's data:
$filepath = "./test.csv";
$file = fopen($filepath, "r") or die("Error opening file");
$i = 0;
while(($line = fgetcsv($file)) !== FALSE) {
if($i == 0) {
$c = 0;
foreach($line as $col) {
$cols[$c] = $col;
$c++;
}
} else if($i > 0) {
$c = 0;
foreach($line as $col) {
$data[$i][$cols[$c]] = $col;
$c++;
}
}
$i++;
}
print_r($data);
If you are reading a file, I can recommend using something like fgetcsv().
This will read each line in the CSV into an array containing all the columns as values.
http://at2.php.net/fgetcsv
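// double quotes are needed here: '\n' in single quotes is a literal backslash followed by n
$csv_lines = explode("\n", $csv_text);
foreach($csv_lines as $line) {
// limit 2 splits each line into key and text; a limit of 1 would return the whole line unsplit
$csv_array[] = explode(',', $line, 2);
}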
edit - based on code posted after original question:
$csvData = file_get_contents($qid);
if ($csvData === false) {
die("Could not open CSV file.");
}
$csvDelim = "\r"; // assume this is the line delim?
$qid = array(); // reuse $qid for the parsed rows, as in the question
foreach (explode($csvDelim, $csvData) as $line) {
$qid[] = explode(',', $line, 2); // limit 2: split into key and text
}
With your new file with two columns, $qid should become an array with two values for each line.
$csvDelim = ",";
$qid = str_getcsv($csvData, $csvDelim);
$text[$qid[0]] = $qid[1];
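For the two-column file in the question, reading it line by line with fgetcsv() avoids the problem of text from one line running into the key of the next. A small sketch, assuming every line has the form key, text and that surrounding spaces should be trimmed; csvToMap is a made-up name.

```php
<?php
// Hypothetical helper: build a key => text map from a two-column CSV file.
function csvToMap(string $path): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Could not open CSV file $path");
    }

    $map = [];
    while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if (count($row) < 2) {
            continue; // blank line or line without a comma
        }
        $key = trim(array_shift($row));
        // re-join the rest in case the text itself contained commas
        $map[$key] = trim(implode(',', $row));
    }

    fclose($fh);
    return $map;
}
```

Note that PHP stores numeric string keys such as "12345" as integers, so $map['12345'] and $map[12345] refer to the same entry.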
