Replace duplicates in a text file with PHP?

I have text files that each contain a list of thousands of names, like this:
DallasWebJobs
DallasWebJobs
DallasWebJobs
php_gigs
brotherjudkins
goldbergwb
SanDiegoWebJobs
brinteractive
muracms
browan85
php_gigs
php_gigs
php_gigs
php_gigs
There is one name per line, but a single file may hold up to 30,000 names, and I need to remove all the duplicate names because probably as many as half are duplicates.
I would like to do this in PHP. One thought was importing each line into a MySQL database and deduplicating there, but that seems like overkill; I'm sure there is an easier way.
Please help if you can.
Update: I found this for emails; it should work here too.
$list = file('./Emailist.txt');
$list_unique = array_unique($list);
foreach ($list_unique as $mail) {
echo $mail;
}

From php.net: serg dot podtynnyi at gmail dot com 06-Feb-2009 11:21
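//Remove duplicates from a text file and dump the result in one file, for example: email lists, link lists etc.
<?php
// FILE_IGNORE_NEW_LINES strips the line endings, so implode() has to add them back
// (note the double quotes: '\n' in single quotes is a literal backslash and n)
$data1 = file("data1.txt", FILE_IGNORE_NEW_LINES);
file_put_contents('unique.txt', implode("\n", array_unique($data1)));
?>
This removes all duplicates and saves the result as unique.txt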
or
<?php
$data1 = file("data1.txt");
$uniqueArray = array_unique($data1);
?>
This stores the unique lines in $uniqueArray

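$lines = file("test-file");
$new = array(); // initialize so an empty file does not leave $new undefined
foreach($lines as $line)
{
// array keys are unique, so each distinct name is stored only once
$new[str_replace(array("\n","\r"),"",$line)] = 1;
}
print_r(array_keys($new));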

$file = file_get_contents($filename);
// split() was removed in PHP 7; explode() with a double-quoted "\n" splits on real newlines
$arr = explode("\n", $file);
$arr = array_unique($arr);
Then write the contents of $arr back to the text file.
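For what it's worth, the answers above can be folded into one small helper. This is only a sketch: the function name dedupe_lines and the file names are made up for illustration, and it assumes one name per line and that names differing only in surrounding whitespace (for example a stray "\r" from a Windows file) should count as the same name.

```php
<?php
// Hypothetical helper (not from the answers above): writes the unique,
// non-empty lines of $inputPath to $outputPath and returns how many were kept.
function dedupe_lines(string $inputPath, string $outputPath): int
{
    // Read the file into an array, one element per line, without line endings.
    $lines = file($inputPath, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $inputPath");
    }

    // Trim stray whitespace, drop empty lines, keep the first occurrence of each name.
    $lines = array_filter(array_map('trim', $lines), 'strlen');
    $unique = array_unique($lines);

    // One name per line again, with a trailing newline if there is any content.
    file_put_contents($outputPath, $unique ? implode("\n", $unique) . "\n" : '');

    return count($unique);
}
```

Calling dedupe_lines('names.txt', 'unique.txt') would leave the input untouched and write the cleaned list to a second file. The whole list is held in memory, which should be fine for 30,000 short lines.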

Related

How to change all values in a column of a csv file to a specific value php

I have a csv file that looks something like this (there are many more rows):
Jim,jim#email.com,8882,456
Bob,bob#email.com,8882,343
What I want to do is to change all the values in the fourth column,456,343 to 500.
I'm new to php and am not sure how to do this.
I have tried
<?php
$file = fopen('myfile.csv', 'r+');
$toBoot = array();
while ($data = fgetcsv($file)) {
echo $data[3];
$data[3] = str_replace($data[3],'500');
array_push($toBoot, $data);
}
//print_r($toBoot);
echo $toBoot[0][3];
fputcsv($file, $toBoot);
fclose($file)
?>
But it prints
Jim,jim#email.com,8882,456
Bob,bob#email.com,8882,343
Array,Array
not
Jim,jim#email.com,8882,500
Bob,bob#email.com,8882,500
I've looked at this post, PHP replace data only in one column of csv but it doesn't seem to work.
Any help appreciated. Thanks
You can use preg_replace and replace all values at once and not loop each line of the CSV file.
Two lines of code is all that is needed.
$csv = file_get_contents($path);
file_put_contents($path, preg_replace("/(.*),\d+/", "$1,500", $csv));
Where $path is the path to the CSV file.
You can see it in action here: https://3v4l.org/Mc3Pm
A quick and dirty way to solve your problem would be:
foreach (file("old_file.csv") as $line)
{
$new_line = preg_replace('/^(.*),[\d]+/', "$1,500", $line);
file_put_contents("new_file.csv", $new_line, FILE_APPEND);
}
To change one field of the CSV, just assign to that array element; you don't need any kind of replace function.
$data[3] = "500";
fputcsv() is used to write one line to a CSV file, not the entire file at once. You need to call it in a loop. You also need to go back to the beginning of the file and remove the old contents.
fseek($file, 0);
ftruncate($file, 0);
foreach ($toBoot as $row) {
fputcsv($file, $row);
}
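Putting those pieces together, a complete version might look like the sketch below. The helper name set_csv_column is invented for this example, and it assumes the file is small enough to hold in memory and uses the default comma/double-quote CSV dialect.

```php
<?php
// Hypothetical helper: sets one column to a fixed value in every row of a CSV file.
function set_csv_column(string $path, int $column, string $value): void
{
    $file = fopen($path, 'r+');
    if ($file === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Read every row first; separator, enclosure and escape are passed explicitly.
    $rows = [];
    while (($data = fgetcsv($file, 0, ',', '"', '\\')) !== false) {
        if ($data === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        if (array_key_exists($column, $data)) {
            $data[$column] = $value; // plain assignment, no replace function needed
        }
        $rows[] = $data;
    }

    // Go back to the start, discard the old contents, then write row by row.
    rewind($file);
    ftruncate($file, 0);
    foreach ($rows as $row) {
        fputcsv($file, $row, ',', '"', '\\');
    }
    fclose($file);
}
```

For the file in the question, set_csv_column('myfile.csv', 3, '500') would change the fourth column (index 3) of every row.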

How to filter on a word in a specific column of a csv file with PHP

I'm trying to display only the rows that contain a specific word in a specific column. Basically I would like to show only the rows that have "yes" in the Display column.
First_Name, Last_Name, Display
Kevin, Smith, yes
Jack, White, yes
Joe, Schmo, no
I've been trying various things with fgetcsv & str_getcsv from other answers and from php.net but nothing is working so far.
It doesn't do anything but this is my current code:
$csv = fopen('file.csv', 'r');
$array = fgetcsv($csv);
foreach ($array as $result) {
if ($array[2] == "yes") {
print ($result);
}
}
Let's have a look at the documentation for fgetcsv():
Gets line from file pointer and parse for CSV fields
fgetcsv reads a single line, not the whole file. You can keep reading lines until you reach the end of the file by putting it in a while loop, e.g.
<?php
$csv = fopen('file.csv', 'r');
// Keep looping as long as we get a new $row
while ($row = fgetcsv($csv)) {
if ($row[2] == "yes") {
// We can't just echo $row because it's an array
//
// Instead, let's join the fields with a comma
echo implode(',', $row);
echo "\n";
}
}
// Don't forget to close the file!
fclose($csv);
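One caveat, assuming the file really looks like the sample in the question: there is a space after each comma, and fgetcsv() keeps leading spaces in unquoted fields, so the third field would be read as " yes" rather than "yes". A sketch that trims each field before comparing (the function name rows_matching is made up for this example):

```php
<?php
// Hypothetical helper: returns the rows whose given column equals $wanted,
// with surrounding whitespace removed from every field.
function rows_matching(string $path, int $column, string $wanted): array
{
    $csv = fopen($path, 'r');
    if ($csv === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $matches = [];
    while (($row = fgetcsv($csv, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // blank line
        }
        // "Kevin, Smith, yes" parses as ["Kevin", " Smith", " yes"], so trim first.
        $row = array_map('trim', $row);
        if (isset($row[$column]) && $row[$column] === $wanted) {
            $matches[] = $row;
        }
    }
    fclose($csv);

    return $matches;
}
```

With the sample file, rows_matching('file.csv', 2, 'yes') would return the Kevin and Jack rows; the header row is skipped naturally because "Display" is not "yes".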
You should use DataTables.
https://datatables.net/examples/basic_init/zero_configuration.html
That's how I deal with my text files. But be careful: with a large amount of data (> 10000 rows) you should have a look at the deferRender option.
https://datatables.net/reference/option/deferRender (JSON data required)

PHP Array sorting within WHILE loop

I have a huge issue: I can't find any way to sort array entries. My code:
<?php
error_reporting(0);
$lines=array();
$fp=fopen('file.txt', 'r');
$i=0;
while (!feof($fp))
{
$line=fgets($fp);
$line=trim($line);
$lines[]=$line;
$oneline = explode("|", $line);
if($i>30){
$fz=fopen('users.txt', 'r');
while (!feof($fz))
{
$linez=fgets($fz);
$linez=trim($linez);
$lineza[]=$linez;
$onematch = explode(",", $linez);
if (strpos($oneline[1], $onematch[1])){
echo $onematch[0],$oneline[4],'<br>';
}
else{
}
rewind($onematch);
}
}
$i++;
}
fclose($fp);
?>
The thing is, I want to sort the items that are being echoed by $oneline[4]. I tried the approaches from several other posts on Stack Overflow, but was not able to find a solution.
The answer to your question is that in order to sort $oneline[4], which seems to contain a string value, you need to apply the following steps:
1. split the string into an array ($oneline[4] = explode(',', $oneline[4]))
2. sort the resulting array (sort($oneline[4]))
3. combine the array back into a string ($oneline[4] = implode(',', $oneline[4]))
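In isolation, the three steps could be wrapped up like this. It is a sketch only: the function name sort_delimited is made up, and the 'b,d,c,a' style of input is an assumption about what $oneline[4] holds.

```php
<?php
// Hypothetical helper illustrating split, sort, join on a delimited string.
function sort_delimited(string $value, string $delimiter = ','): string
{
    $parts = explode($delimiter, $value);  // 1. split the string into an array
    sort($parts, SORT_STRING);             // 2. sort the array (plain string comparison)
    return implode($delimiter, $parts);    // 3. join the array back into a string
}
```

For example, sort_delimited('b,d,c,a') returns 'a,b,c,d'. Using a separate variable inside the function also avoids overwriting $oneline[4] with values of different types.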
Since I got the impression that variable naming is low on the list of priorities, I'm re-using the $oneline[4] variable, mostly to make clear which part of the code I am referring to.
That being said, there are other improvements you should be making if you want to be on speaking terms with your future self (in case you need to work on this code in a couple of months):
Choose a single coding style and stick to it, the original code looked like it was copy/pasted from at least 4 different sources (mostly inconsistent quote-marks and curly braces)
Try to limit repeating costly operations, such as opening files whenever you can (to be fair, the agents.data could contain 31 lines and the users.txt would be opened only once resulting in me looking like a fool)
I have updated your code sample to try to show what I mean by the points above.
<?php
error_reporting(0);
$lines = array();
$users = false;
$fp = fopen('http://20.19.202.221/exports/agents.data', 'r');
while ($fp && !feof($fp)) {
$line = trim(fgets($fp));
$lines[] = $line;
$oneline = explode('|', $line);
// if we have $users (starts as false, is turned into an array
// inside this if-block) or if we have collected 30 or more
// lines (this condition is only checked while $users = false)
if ($users || count($lines) > 30) {
// your code sample implies the users.txt to be small enough
// to process several times consider using some form of
// caching like this
if (!$users) {
// always initialize what you intend to use
$users = [];
$fz = fopen('users.txt', 'r');
while ($fz && !feof($fz)) {
$users[] = explode(',', trim(fgets($fz)));
}
// always close whatever you open.
fclose($fz);
}
// walk through $users, which contains the exploded contents
// of each line in users.txt
foreach ($users as $onematch) {
// strpos() returns 0 for a match at the very start, so compare against false
if (strpos($oneline[1], $onematch[1]) !== false) {
// now, the actual question: how to sort $oneline[4]
// as the requested example was not available at the
// time of writing, I assume
// it to be a string like: 'b,d,c,a'
// first, explode it into an array
$oneline[4] = explode(',', $oneline[4]);
// now sort it using the sort function of your liking
sort($oneline[4]);
// and implode the sorted array back into a string
$oneline[4] = implode(',', $oneline[4]);
echo $onematch[0], $oneline[4], '<br>';
}
}
}
}
fclose($fp);
I hope this doesn't offend you too much, just trying to help and not just providing the solution to the question at hand.

How do I get only unique values from CSV file array

I am building a small application that does some simple reporting based on CSV files, the CSV files are in the following format:
DATE+TIME,CLIENTNAME1,HAS REQUEST BLABLA1,UNIQUE ID
DATE+TIME,CLIENTNAME2,HAS REQUEST BLABLA2,UNIQUE ID
DATE+TIME,CLIENTNAME1,HAS REQUEST BLABLA1,UNIQUE ID
DATE+TIME,CLIENTNAME2,HAS REQUEST BLABLA2,UNIQUE ID
Now I am processing this using the following function:
function GetClientNames(){
$file = "backend/AllAlarms.csv";
$lines = file($file);
arsort($lines);
foreach ($lines as $line_num => $line) {
$line_as_array = explode(",", $line);
echo '<li><i class="icon-pencil"></i>' . $line_as_array[1] . '</li>';
}
}
I am trying to retrieve only the Clientname values, but I only want the unique values.
I have tried several different approaches. I understand I need to use the unique_array function, but I have no clue how exactly to use it.
I've tried this:
function GetClientNames(){
$file = "backend/AllAlarms.csv";
$lines = file($file);
arsort($lines);
foreach ($lines as $line_num => $line) {
$line_as_array = explode(",", $line);
$line_as_array[1] = unique_array($line_as_array[1]);
echo '<li><i class="icon-pencil"></i>' . $line_as_array[1] . '</li>';
}
}
But this gives me a very very dirty result with 100's of spaces instead of the correct data.
I would recommend using the fgetcsv() function when reading CSV files. In the wild, CSV files can be too complicated for a naive explode() approach to handle:
// this array will hold the results
$unique_ids = array();
// open the csv file for reading
$fd = fopen('t.csv', 'r');
// read the rows of the csv file, every row returned as an array
while ($row = fgetcsv($fd)) {
// change the 3 to the column you want
// using the keys of the array to make the final values unique, since PHP
// arrays can't contain duplicate keys
$unique_ids[$row[3]] = true;
}
var_dump(array_keys($unique_ids));
You can also collect values and use array_unique() on them later. You probably want to split the "reading in" and the "writing out" part of your code too.
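The "collect first, array_unique() later" variant might look roughly like this. The helper name unique_column is made up for the example, and it assumes the client name sits in the second column (index 1), as in the question's sample.

```php
<?php
// Hypothetical helper: returns the distinct values of one CSV column, in first-seen order.
function unique_column(string $path, int $column): array
{
    $fd = fopen($path, 'r');
    if ($fd === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Reading: collect the raw column values.
    $values = [];
    while (($row = fgetcsv($fd, 0, ',', '"', '\\')) !== false) {
        if (isset($row[$column])) {
            $values[] = trim($row[$column]);
        }
    }
    fclose($fd);

    // array_unique() keeps the first occurrence but preserves the old keys,
    // so array_values() is used to reindex the result from 0.
    return array_values(array_unique($values));
}
```

The output part then stays separate from the reading part: loop over unique_column('backend/AllAlarms.csv', 1) and echo one list item per name.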
Try using array_unique()
Docs:
http://php.net/manual/en/function.array-unique.php

How Can I Remove Duplicate Rows from CSV file with PHP

I have CSV file that looks like this:
account, name, email,
123, John, dsfs#email.com
123, John, dsfs#email.com
1234, Alex, ala#email.com
I need to remove duplicate rows. I tried to do it like this:
$inputHandle = fopen($inputfile, "r");
$csv = fgetcsv($inputHandle, 1000, ",");
$accounts_unique = array();
$accounts_unique = array_unique($csv);
print("<pre>".print_r($accounts_unique, true)."</pre>");
But print_r shows only the first (header) row.
What needs to be done to make sure that:
1. I clean the CSV file of duplicate rows
2. I can make a list of those duplicates (maybe store them in another CSV)?
Simple solution, but it requires a lot of memory if the file is really big.
$lines = file('csv.csv');
$lines = array_unique($lines);
// file() keeps each line's newline, so join with '' and pass the target file name
file_put_contents('csv.csv', implode('', $lines));
I would go this route, which will be faster than array_unique:
// fgetcsv() reads only a single row, so load every line with file() instead
$csv = array_map('trim', file($inputfile));
$data = array_flip(array_flip($csv)); //removes duplicates that are the same
$dropped = array_diff_key($csv, $data); //Get removed items.
Note -- array_unique() and array_flip(array_flip()) will only match for duplicate lines that are exactly the same.
Updated to include information from my comments.
If you are going to loop over the data from the CSV anyway, I think it would be best to do something like this.
$dataset = array();
// $lines is assumed to hold the rows read from the file, e.g. via file()
foreach($lines as $data){
$dataset[sha1($data)] = $data;
}
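To cover both points of the question (a cleaned file plus a list of the duplicates), a streaming variant of the same hashing idea could look like the sketch below. The function name split_duplicate_rows and the output file names are invented for illustration. It assumes rows count as duplicates when all their fields match after trimming, and only the hashes are kept in memory rather than the rows themselves.

```php
<?php
// Hypothetical helper: copies first occurrences to $uniquePath and repeats to $duplicatePath.
function split_duplicate_rows(string $inputPath, string $uniquePath, string $duplicatePath): array
{
    $in = fopen($inputPath, 'r');
    $uniqueOut = fopen($uniquePath, 'w');
    $dupOut = fopen($duplicatePath, 'w');
    if (!$in || !$uniqueOut || !$dupOut) {
        throw new RuntimeException('Cannot open one of the files');
    }

    $seen = [];
    $counts = ['unique' => 0, 'duplicates' => 0];
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // blank line
        }
        // Normalize the fields, then hash the whole row to get a compact array key.
        $row = array_map('trim', $row);
        $key = sha1(implode("\0", $row));

        if (isset($seen[$key])) {
            fputcsv($dupOut, $row, ',', '"', '\\');
            $counts['duplicates']++;
        } else {
            $seen[$key] = true;
            fputcsv($uniqueOut, $row, ',', '"', '\\');
            $counts['unique']++;
        }
    }

    fclose($in);
    fclose($uniqueOut);
    fclose($dupOut);

    return $counts;
}
```

A call such as split_duplicate_rows($inputfile, 'clean.csv', 'duplicates.csv') returns the two counts. Like the approaches above, it only catches rows that are exactly the same once trimmed.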
