How Can I Remove Duplicate Rows from CSV file with PHP - php

I have CSV file that looks like this:
account, name, email,
123, John, dsfs#email.com
123, John, dsfs#email.com
1234, Alex, ala#email.com
I need to remove duplicate rows. I tried to do it like this:
$inputHandle = fopen($inputfile, "r");
$csv = fgetcsv($inputHandle, 1000, ",");
$accounts_unique = array();
$accounts_unique = array_unique($csv);
print("<pre>".print_r($accounts_unique, true)."</pre>");
But print_r shows only the first (header) row.
What needs to be done so that:
1. I clean the CSV file of duplicate rows
2. I can make a list of those duplicates (maybe store them in another CSV?)

Simple solution, but it requires a lot of memory if the file is really big.
$lines = file('csv.csv', FILE_IGNORE_NEW_LINES); // one element per line, newlines stripped
$lines = array_unique($lines);
file_put_contents('csv.csv', implode(PHP_EOL, $lines) . PHP_EOL);

I would go this route, which should be faster than array_unique():
$csv = array_map('trim', file($inputfile, FILE_IGNORE_NEW_LINES)); // one trimmed line per element
$data = array_flip(array_flip($csv)); // removes lines that are exactly the same (the last occurrence is kept)
$dropped = array_diff_key($csv, $data); // the removed duplicate lines
Note: array_unique() and array_flip(array_flip()) only match duplicate lines that are exactly the same.
Updated to include information from my comments.

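If you are going to loop over the data from the CSV anyway, I think it would be best to do something like this.
$dataset = array();
foreach ($lines as $line) { // $lines: the lines read from the CSV, e.g. via file()
    $dataset[sha1($line)] = $line; // identical lines hash to the same key
}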
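The answers above deduplicate the file but do not cover the second request, keeping a list of the duplicates. Here is a minimal sketch of one way to do both in a single pass. The function name splitDuplicateRows() and the three path parameters are hypothetical, and it assumes that rows count as duplicates when all their trimmed fields match.

```php
<?php
// Hypothetical helper: writes the first occurrence of each row to $uniquePath
// and every repeated row to $duplicatePath. Returns the number of repeats.
function splitDuplicateRows(string $inputPath, string $uniquePath, string $duplicatePath): int
{
    $in = fopen($inputPath, 'r');
    $unique = fopen($uniquePath, 'w');
    $dupes = fopen($duplicatePath, 'w');
    if ($in === false || $unique === false || $dupes === false) {
        throw new RuntimeException('Could not open one of the CSV files.');
    }

    $seen = array(); // keys identify rows that were already written
    $dropped = 0;

    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($row === array(null)) {
            continue; // fgetcsv() returns array(null) for a blank line
        }
        $row = array_map('trim', $row);   // "123, John" and "123,John" compare equal
        $key = implode("\x1F", $row);     // unit separator is unlikely to occur in the data
        if (isset($seen[$key])) {
            fputcsv($dupes, $row, ',', '"', '\\');
            $dropped++;
        } else {
            $seen[$key] = true;
            fputcsv($unique, $row, ',', '"', '\\');
        }
    }

    fclose($in);
    fclose($unique);
    fclose($dupes);

    return $dropped;
}
```

Only the row keys are held in memory, not the whole file, although memory use still grows with the number of distinct rows. The output paths should differ from the input path.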

Related

How to change all values in a column of a csv file to a specific value php

I have a csv file that looks something like this (there are many more rows):
Jim,jim#email.com,8882,456
Bob,bob#email.com,8882,343
What I want to do is to change all the values in the fourth column,456,343 to 500.
I'm new to php and am not sure how to do this.
I have tried
<?php
$file = fopen('myfile.csv', 'r+');
$toBoot = array();
while ($data = fgetcsv($file)) {
echo $data[3];
$data[3] = str_replace($data[3],'500');
array_push($toBoot, $data);
}
//print_r($toBoot);
echo $toBoot[0][3];
fputcsv($file, $toBoot);
fclose($file)
?>
But it prints
Jim,jim#email.com,8882,456
Bob,bob#email.com,8882,343
Array,Array
not
Jim,jim#email.com,8882,500
Bob,bob#email.com,8882,500
I've looked at this post, PHP replace data only in one column of csv but it doesn't seem to work.
Any help appreciated. Thanks
You can use preg_replace and replace all values at once and not loop each line of the CSV file.
Two lines of code is all that is needed.
$csv = file_get_contents($path);
file_put_contents($path, preg_replace("/(.*),\d+/", "$1,500", $csv));
Where $path is the path to the CSV file.
You can see it in action here: https://3v4l.org/Mc3Pm
A quick and dirty way to solve your problem would be:
foreach (file("old_file.csv") as $line)
{
$new_line = preg_replace('/^(.*),[\d]+/', "$1,500", $line);
file_put_contents("new_file.csv", $new_line, FILE_APPEND);
}
To change one field of the CSV, just assign to that array element, you don't need to use any kind of replace function.
$data[3] = "500";
fputcsv() is used to write one line to a CSV file, not the entire file at once. You need to call it in a loop. You also need to go back to the beginning of the file and remove the old contents.
fseek($file, 0);
ftruncate($file, 0);
foreach ($toBoot as $row) {
fputcsv($file, $row);
}
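Putting the pieces of this answer together, a complete version might look like the sketch below. The function name setColumnValue() is hypothetical, and it assumes the file is small enough to hold all rows in memory while it is rewritten.

```php
<?php
// Hypothetical helper: sets the 0-based column $column to $value in every row of $path.
function setColumnValue(string $path, int $column, string $value): void
{
    $file = fopen($path, 'r+');
    if ($file === false) {
        throw new RuntimeException("Could not open $path");
    }

    $rows = array();
    while (($data = fgetcsv($file, 0, ',', '"', '\\')) !== false) {
        if ($data === array(null)) {
            continue; // skip blank lines
        }
        if (array_key_exists($column, $data)) {
            $data[$column] = $value; // plain assignment, no replace function needed
        }
        $rows[] = $data;
    }

    rewind($file);        // go back to the beginning of the file
    ftruncate($file, 0);  // remove the old contents
    foreach ($rows as $row) {
        fputcsv($file, $row, ',', '"', '\\'); // one call per row
    }
    fclose($file);
}
```

For the file in the question, the call would be setColumnValue('myfile.csv', 3, '500').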

How do I get only unique values from CSV file array

I am building a small application that does some simple reporting based on CSV files, the CSV files are in the following format:
DATE+TIME,CLIENTNAME1,HAS REQUEST BLABLA1,UNIQUE ID
DATE+TIME,CLIENTNAME2,HAS REQUEST BLABLA2,UNIQUE ID
DATE+TIME,CLIENTNAME1,HAS REQUEST BLABLA1,UNIQUE ID
DATE+TIME,CLIENTNAME2,HAS REQUEST BLABLA2,UNIQUE ID
Now I am processing this using the following function:
function GetClientNames(){
$file = "backend/AllAlarms.csv";
$lines = file($file);
arsort($lines);
foreach ($lines as $line_num => $line) {
$line_as_array = explode(",", $line);
echo '<li><i class="icon-pencil"></i>' . $line_as_array[1] . '</li>';
}
}
I am trying to retrieve only the Clientname values, but I only want the unique values.
I have tried to create several different manners of approaching this, I understand I need to use the unique_array function, but I have no clue on exactly how to use this function.
I've tried this:
function GetClientNames(){
$file = "backend/AllAlarms.csv";
$lines = file($file);
arsort($lines);
foreach ($lines as $line_num => $line) {
$line_as_array = explode(",", $line);
$line_as_array[1] = unique_array($line_as_array[1]);
echo '<li><i class="icon-pencil"></i>' . $line_as_array[1] . '</li>';
}
}
But this gives me a very very dirty result with 100's of spaces instead of the correct data.
I would recommend using the fgetcsv() function when reading CSV files. In the wild, CSV files can be too complicated for a naive explode() approach to handle:
// this array will hold the results
$unique_ids = array();
// open the csv file for reading
$fd = fopen('t.csv', 'r');
// read the rows of the csv file, every row returned as an array
while ($row = fgetcsv($fd)) {
// change the 3 to the column you want
// using the keys of arrays to make final values unique since php
// arrays cant contain duplicate keys
$unique_ids[$row[3]] = true;
}
var_dump(array_keys($unique_ids));
You can also collect values and use array_unique() on them later. You probably want to split the "reading in" and the "writing out" part of your code too.
Try using array_unique()
Docs:
http://php.net/manual/en/function.array-unique.php
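A minimal sketch of the "collect values, then call array_unique()" variant, applied to the client name column. The function name getUniqueClientNames() is hypothetical, and it assumes the client name sits in column index 1, as in the sample rows of the question.

```php
<?php
// Hypothetical helper: returns the distinct values of one CSV column, sorted.
function getUniqueClientNames(string $path, int $column = 1): array
{
    $fd = fopen($path, 'r');
    if ($fd === false) {
        throw new RuntimeException("Could not open $path");
    }

    $names = array();
    while (($row = fgetcsv($fd, 0, ',', '"', '\\')) !== false) {
        // Skip blank lines and rows that are too short to have the column.
        if (isset($row[$column]) && trim($row[$column]) !== '') {
            $names[] = trim($row[$column]);
        }
    }
    fclose($fd);

    // array_unique() works on the whole list, not on a single value.
    $names = array_values(array_unique($names));
    sort($names);

    return $names;
}
```

The function only reads and returns data; the list items can then be echoed in a separate loop, ideally passing each name through htmlspecialchars() first.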

PHP to remove last line of CSV file

I'm still struggling with PHP and CSV file manipulation. I will try to ask this properly so I can get some help.
I have a CSV file with about 4000 lines/rows, and I wrote this code to create an array of the entire CSV and pull the LAST line of the CSV file out to use in my script. The code does all this successfully.
// CREATE ASSOCIATIVE ARRAY FROM LAST ROW OF CSV FILE
$csv = array();
if (FALSE !== $handle = fopen("Alabama-TEST.csv", "r"))
{
while (FALSE !== $row = fgetcsv($handle))
{
$csv[] = $row;
}
}
$new_csv = array();
foreach ($csv as $row)
{
$new_row = array();
for ($i = 0, $n = count($csv[0]); $i < $n; ++$i)
{
$new_row[$csv[0][$i]] = $row[$i];
}
$new_csv[] = $new_row;
}
The variable $new_row is the last row in the CSV and I am able to use the data fine. But when the script is finished running I want to delete this last row called $new_row.
Here is an example of my CSV file at the top view;
CITY ADDRESS STATE NAME
Birmingham 123 Park St. Alabama Franky
So I just want to remove the last row and keep the header at the top, for the next time the script runs. I've been trying for 3 days solid trying to figure this out, so I'm hoping some kind person who KNOWS WHAT THEY ARE DOING can help.
Here is the code I have tried to remove the last row;
$inp = file('Alabama-TEST.csv');
$out = fopen('Alabama-TEST.csv','w');
for ($i=0;$i<count($inp)-1;$i++)
fwrite($out,$inp[$i]);
fclose($out);
Since I'm using a large file at around 4000 rows, is there a way to do this without using too much memory?
You need to use array_pop on your array ($new_csv) and fputcsv to save the file:
array_pop($new_csv); // Removes the last element in the array
$out = fopen('Alabama-TEST-new.csv','w');
foreach ($new_csv as $row) { // write each remaining row back out
fputcsv($out, $row);
}
fclose($out);
I don't have an environment to test this, sorry.
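On the memory question: a minimal sketch of one way to drop the last row without loading the file into an array is to find the byte offset where the last line starts and truncate the file there. The function name removeLastLine() is hypothetical, and the sketch assumes no field contains an embedded line break, since it works on physical lines rather than parsed CSV rows.

```php
<?php
// Hypothetical helper: removes the last non-empty line of $path in place,
// but never the first line (the header). Returns true if a line was removed.
function removeLastLine(string $path): bool
{
    $handle = fopen($path, 'r+');
    if ($handle === false) {
        throw new RuntimeException("Could not open $path");
    }

    $lastStart = 0; // byte offset where the last non-empty line begins
    $lineCount = 0; // number of non-empty lines seen

    while (true) {
        $offset = ftell($handle);
        $line = fgets($handle);
        if ($line === false) {
            break; // end of file
        }
        if (trim($line) !== '') {
            $lastStart = $offset;
            $lineCount++;
        }
    }

    $removed = false;
    if ($lineCount > 1) {
        // Cut the file off right before the last line.
        $removed = ftruncate($handle, $lastStart);
    }
    fclose($handle);

    return $removed;
}
```

Only one line is held in memory at a time, so the size of the file does not matter much.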

selecting and manipulating individual CSV columns in php

I am trying to use a function much like this.....
$file = fopen("/tmp/$importedFile.csv","r");
while ($line = fgetcsv($file))
{
$csv_data[] = $line;
}
fclose($file);
...to load CSV values. This is gravy but now I wish to select individual columns by their array number. I believe I want to select it with something like this, but cannot find any clarity.
$csv_data[2] = $line;
This however just shows second (third) row of data rather than column.
Regards
Do you need the whole file in memory or will you be processing the lines individually?
Processing individually:
$line is already an array. If you want the 3rd column, use $line[2]
Processing after reading the whole file:
$csv_data[$lineNo][$columnNo]
$inputfiledelimiter = ",";
if (($handle = fopen($PathOfCsvFile, "r")) !== FALSE)
{
while (($data = fgetcsv($handle, 0, $inputfiledelimiter)) !== FALSE)
{
//get data from $data
}
fclose($handle);
}
Well, your CSV file is now split up in lines, that is all.
No concept of columns yet in that structure.
So you need to split the lines into columns.
Or, much better, let PHP do that for you: Have a look at fgetcsv() and the associated functions:
http://nl.php.net/manual/en/function.fgetcsv.php
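If the whole file is already loaded as rows, the built-in array_column() (PHP 5.5 and later) can pull out a single column. A minimal sketch follows; the helper names readCsvRows() and getCsvColumn() are hypothetical.

```php
<?php
// Hypothetical helper: reads every row of a CSV file into an array of arrays.
function readCsvRows(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Could not open $path");
    }

    $rows = array();
    while (($line = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($line !== array(null)) { // ignore blank lines
            $rows[] = $line;
        }
    }
    fclose($handle);

    return $rows;
}

// Hypothetical helper: returns column $columnNo (0-based) from all rows.
// Rows that do not have that column are skipped by array_column().
function getCsvColumn(array $rows, int $columnNo): array
{
    return array_column($rows, $columnNo);
}
```

With $rows = readCsvRows($path), $rows[2] is still the third row, while getCsvColumn($rows, 2) is the third column.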

Replace duplicates in a text file with PHP?

I have text files that have list of thousands of names like this
DallasWebJobs
DallasWebJobs
DallasWebJobs
php_gigs
brotherjudkins
goldbergwb
SanDiegoWebJobs
brinteractive
muracms
browan85
php_gigs
php_gigs
php_gigs
php_gigs
One name per line. One file may have up to 30,000 names in it, though, and I need to replace all duplicate names because probably as many as half are duplicates.
I would like to do this in PHP. One thought was importing each line into a MySQL database and then doing it there, but that seems like overkill; I'm sure there is an easier way.
Please help if you can
Update I found this for emails, it should work too
$list = file('./Emailist.txt');
$list_unique = array_unique($list);
foreach ($list_unique as $mail) {
echo $mail;
}
From php.net: serg dot podtynnyi at gmail dot com 06-Feb-2009 11:21
//Remove duplicates from a text files and dump result in one file for example: emails list, links list etc
<?php
$data1 = file("data1.txt");
file_put_contents('unique.txt', implode('', array_unique($data1))); // file() keeps each line's newline
?>
This will remove all duplicates and save it as a file of unique.txt
or
<?php
$data1 = file("data1.txt");
$uniqueArray = array_unique($data1);
?>
Will store it in $uniqueArray
$lines = file("test-file");
foreach($lines as $line)
{
$new[str_replace(array("\n","\r"),"",$line)] = 1;
}
print_r(array_keys($new));
$file = file_get_contents($filename);
$arr = array();
$arr = explode("\n", $file); // split() was removed in PHP 7; note the double quotes around \n
$arr = array_unique($arr);
Then write the contents of $arr to the text file again.
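For the final step of writing the result back, here is a minimal sketch that reads the names line by line and writes each distinct name once. The function name writeUniqueLines() is hypothetical, and it assumes the output path is different from the input path.

```php
<?php
// Hypothetical helper: copies each distinct non-empty line of $inputPath to
// $outputPath, keeping the first occurrence. Returns the number of lines written.
function writeUniqueLines(string $inputPath, string $outputPath): int
{
    $in = fopen($inputPath, 'r');
    $out = fopen($outputPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Could not open the input or output file.');
    }

    $seen = array(); // names already written, stored as array keys
    $written = 0;

    while (($line = fgets($in)) !== false) {
        $name = rtrim($line, "\r\n"); // so "abc\n" and "abc\r\n" count as the same name
        if ($name === '' || isset($seen[$name])) {
            continue;
        }
        $seen[$name] = true;
        fwrite($out, $name . PHP_EOL);
        $written++;
    }

    fclose($in);
    fclose($out);

    return $written;
}
```

Stripping the line ending before comparing matters because the last line of a file often has no trailing newline and would otherwise not match an earlier copy of the same name.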
