I have a list of 50,000 IDs in a flat file and need to remove any duplicate IDs. Is there an efficient/recommended algorithm for my problem?
Thanks.
You can use the command line sort program to order and filter the list of ids. This is a very efficient program and scales well too.
sort -u ids.txt > filteredIds.txt
Read into a dictionary line by line, discarding duplicates. When all read, write out to a new file.
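A minimal sketch of that approach in PHP, using array keys as the "dictionary" (file names are placeholders, and sample data is written first so the snippet runs on its own):

```php
<?php
// Stream the input line by line; array keys act as a dictionary,
// so each ID is kept only once.
file_put_contents('ids.txt', "101\n205\n101\n307\n205\n");

$seen = array();
$in = fopen('ids.txt', 'r');
while (($line = fgets($in)) !== false) {
    $id = trim($line);
    if ($id !== '') {
        $seen[$id] = true; // a duplicate key simply overwrites itself
    }
}
fclose($in);

// Write the de-duplicated IDs out to a new file, one per line.
file_put_contents('filteredIds.txt', implode("\n", array_keys($seen)) . "\n");
```

Because only one line is held in memory at a time, this also scales to files much larger than 50,000 lines.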
I did some experiments once, and the fastest solution I could get in PHP was sorting the items and manually removing the duplicates.
If performance isn't that much of an issue for you (which I suspect, since 50,000 items is not that much), then you can use array_unique(): http://php.net/array_unique
I guess if you have a large enough memory allowance, you can put all these IDs in an array:
$array[$id] = $id;
This automatically weeds out the duplicates.
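The keyed-array trick as a runnable sketch (the input values here are invented for illustration):

```php
<?php
// Keying the array by the ID itself makes duplicates overwrite each other.
$ids = array(42, 7, 42, 13, 7);

$unique = array();
foreach ($ids as $id) {
    $unique[$id] = $id;
}

print_r(array_values($unique)); // 42, 7, 13 -- duplicates gone
```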
You can do:
file_put_contents($file, implode("\n", array_unique(file($file))));
How it works:
1. Read the file using file, which returns an array.
2. Get rid of the duplicate lines using array_unique.
3. implode those unique lines with "\n" to get a string.
4. Write the string back to the file using file_put_contents.
This solution assumes that you've got one ID per line in the flat file.
You can do it via array / array_unique. In this example I assume your IDs are separated by line breaks; if that's not the case, just change the delimiter.
$file = file_get_contents('/path/to/file.txt');
$array = explode("\n",$file);
$array = array_unique($array);
$file = implode("\n",$array);
file_put_contents('/path/to/file.txt',$file);
If you can just explode the contents of the file on a comma (or any delimiter), then array_unique will produce the least (and cleanest) code. Otherwise, if you are parsing the file, going with $array[$id] = $id is the fastest and cleanest solution.
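For the comma-separated case, that might look like this (file name and sample data are made up, and written inline so the sketch is self-contained):

```php
<?php
// Sketch for a comma-delimited file of IDs.
file_put_contents('ids.csv', "10,20,10,30,20");

$ids = explode(',', file_get_contents('ids.csv'));
$ids = array_unique($ids); // keeps the first occurrence of each ID

echo implode(',', $ids); // 10,20,30
```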
If you can use a terminal (or native unix execution), the easiest way: (assuming that there is nothing else in the file):
sort < ids.txt | uniq > filteredIds.txt
Related
I need to read a CSV file and put all of the data in a particular column into a set (so as to eliminate duplicates). I have searched through the questions here and I see some similar answers for arrays, but I am not well enough versed in PHP to extrapolate from those examples to my use case.
Specifically, I need to grab a CSV file of all the cars towed by the city of Chicago. Then I need to loop through the 'Make' column and grab all of the makes. I do not want to include duplicates, so I'm thinking a set (which I understand PHP 7 supports) would be an appropriate data structure.
I'm not concerned at all with efficiency at this point, more with simplicity as this is part of a homework assignment. I had the option of just manually going through and picking out the makes but thought that would be an easy out.
I guess array is ok since someone was kind enough to inform me about these two functions.
$towFileString = file_get_contents("put link here");
file_put_contents('tow-data.csv', $towFileString);
$file = fopen('tow-data.csv', 'rb');
$csvArray = array();
while (!feof($file)) {
    $csvArray = fgetcsv($file);
}
$allMakes = array_column($csvArray, 'Make');
$uniqueMakes = array_unique($allMakes);
When I run the program, however, PHP tells me this:
Warning: array_column() expects parameter 1 to be array, boolean given
So it is telling me that the array I just created is really a boolean.
I was hoping you could give me some insight into why this is the case
NOTE: I am not asking you to do my work for me. I'd just like a few pointers to get me started in the right direction.
You can put the values of the 'Make' column into a separate array using the array_column function (note that the key is case-sensitive), like this:
$allMakes = array_column($towedCars, 'Make');
and then run array_unique on it, which will eliminate all the duplicates:
$uniqueMakes = array_unique($allMakes);
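As for the warning: fgetcsv() returns one row per call, and false at end-of-file, so assigning instead of appending leaves a boolean in $csvArray after the loop. A corrected sketch (the tiny CSV here is made up so the example runs; the real file and column names come from the city data):

```php
<?php
// Append each row instead of overwriting, and key rows by the header
// so array_column() can look up 'Make' by name.
file_put_contents('tow-data.csv', "Make,Color\nFORD,Blue\nTOYOTA,Red\nFORD,Black\n");

$file = fopen('tow-data.csv', 'rb');
$header = fgetcsv($file);            // first row holds the column names
$csvArray = array();
while (($row = fgetcsv($file)) !== false) {
    $csvArray[] = array_combine($header, $row);
}
fclose($file);

$allMakes    = array_column($csvArray, 'Make');
$uniqueMakes = array_unique($allMakes);
print_r(array_values($uniqueMakes)); // FORD, TOYOTA
```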
I have a long string of values separated by tabs and I wish to cut the data as if using unix cut -f. If I use cut -f5 it cuts all my data into a single column of the value which is in the 5th position. Is there a PHP function that can do the same?
Below is an example of the raw file with each word in the row separated by a tab
The result would be as follows if I ran cut -f2:
I guess the answer really is "no", as far as I know. But you can combine a few PHP functions to achieve the same result.
You can use file to read the lines from the file into an array
$rows = file($path_to_your_file);
Then convert that array of strings to a multidimensional array
$rows = array_map(function($row){
    return str_getcsv($row, "\t");
}, $rows);
Then get the column you want from that array (array_column indexes are zero-based, so the 5th column is index 4).
$column_5 = array_column($rows, 4);
Not as concise as cut -f5, but PHP rarely is for things like this.
Incidentally, if you don't care that your PHP program will only work on systems that have unix cut, you can actually just use the cut -f in shell_exec.
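Something along these lines, assuming cut is available on the system (the file and its contents are placeholders; escapeshellarg guards the filename):

```php
<?php
// Sketch: delegate column extraction to the system's cut(1).
// Only works on systems that actually have cut installed.
file_put_contents('data.tsv', "a\tb\tc\nd\te\tf\n");

$column = shell_exec('cut -f2 ' . escapeshellarg('data.tsv'));
echo $column; // b, then e, one per line
```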
It seems like a pretty simple issue and one that I thought for sure would have been asked before but I've been searching for some time now and have not found a solution.
When using PHP's file() command, it reads the file and puts each line in an array as a String. Is there any way to read each line as an integer?
(I know that I could just loop through the array and convert each value to an int, but I figure that will slow it down somewhat)
Nope. The only way is to do it manually. Fortunately, that's easy:
$lines = array_map('intval', file($path));
Let's say I have text file Data.txt with:
26||jim||1990
31||Tanya||1942
19||Bruce||1612
8||Jim||1994
12||Brian||1988
56||Susan||2201
and it keeps going.
It has many different names in column 2.
Please tell me, how do I get the count of unique names, and how many times each name appears in the file using PHP?
I have tried:
$counts = array_count_values($item[1]);
echo $counts;
after exploding ||, but it does not work.
The result should be like:
jim-2,
tanya-1,
and so on.
Thanks for any help...
Read in each line, explode using the delimiter (in this case ||), and add it to an array if it does not already exist. If it does, increment the count.
I won't write the code for you, but here a few pointers:
fgets reads in a line
explode will split the line based on a delimiter
use in_array to check if the name has been found before, and to determine whether you need to add the name to the array or just increment the count.
Edit:
Following Jon's advice, you can make it even easier for you.
Read in line-by-line, explode by delimiter and dump all the names into an array (don't worry about checking if it already exists). After you're done, use array_count_values to get every unique name and its frequency.
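That simpler flow might look like this (sample data is written inline so the sketch runs on its own):

```php
<?php
// Collect every name, then let array_count_values() do the counting.
file_put_contents('Data.txt', "26||jim||1990\n31||Tanya||1942\n8||Jim||1994\n");

$names = array();
foreach (file('Data.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
    $parts = explode('||', $line);
    $names[] = strtolower($parts[1]); // normalise case so jim == Jim
}

print_r(array_count_values($names)); // jim => 2, tanya => 1
```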
Here's my take on this:
Use file to read the data file, producing an array where each element corresponds to a line in the input.
Use array_filter with trim as the filter function to remove blank lines from this array. This takes advantage that trim returns a string having removed whitespace from both ends of its argument, leaving the empty string if the argument was all whitespace to begin with. The empty string converts to boolean false -- thus making array_filter disregard lines that are all whitespace.
Use array_map with a callback that involves calling explode to split each array element (line of text) into three parts and returning the second of these. This will produce an array where each element is just a name.
Use array_map again with strtoupper as the callback to convert all names to uppercase so that "jim" and "JIM" will count as the same in the next step.
Finally, use array_count_values to get the count of occurrences for each name.
Code, taking things slowly:
function extract_name($line) {
// The -1 parameter (available as of PHP 5.1.0) makes explode return all elements
// but the last one. We want to do this so that the element we are interested in
// (the second) is actually the last in the returned array, enabling us to pull it
// out with end(). This might seem strange here, but see below.
$parts = explode('||', $line, -1);
return end($parts);
}
$lines = file('data.txt'); // #1
$lines = array_filter($lines, 'trim'); // #2
$names = array_map('extract_name', $lines); // #3
$names = array_map('strtoupper', $names); // #4
$counts = array_count_values($names); // #5
print_r($counts); // to see the results
There is a reason I chose to do this in steps, where each step involves a function call on the result of the previous step: it's actually possible to do it in just one line:
$counts = array_count_values(
array_map(function($line){return strtoupper(end(explode('||', $line, -1)));},
array_filter(file('data.txt'), 'trim')));
print_r($counts);
I should mention that this might not be the "best" way to solve the problem in the sense that if your input file is huge (in the ballpark of a few million lines) this approach will consume a lot of memory because it's reading all the input in memory at once. However, it's certainly convenient and unless you know that the input is going to be that large there's no point in making life harder.
Note: Senior-level PHP developers might have noticed that I'm violating strict standards here by feeding the result of explode to a function that accepts its argument by reference. That's valid criticism, but in my defense I am trying to keep the code as short as possible. In production it would indeed be better to use $a = explode(...); return $a[1];, although there will be no difference in the result.
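For input too large to hold in memory all at once, the same counting can be done line by line (a sketch under the same assumptions about the file format; sample data is written first so it runs):

```php
<?php
// Streaming variant: one line in memory at a time, counts accumulated
// in an array keyed by upper-cased name.
file_put_contents('data.txt', "26||jim||1990\n8||Jim||1994\n31||Tanya||1942\n");

$counts = array();
$fh = fopen('data.txt', 'r');
while (($line = fgets($fh)) !== false) {
    if (trim($line) === '') {
        continue; // skip blank lines
    }
    $parts = explode('||', $line);
    $name  = strtoupper($parts[1]);
    $counts[$name] = isset($counts[$name]) ? $counts[$name] + 1 : 1;
}
fclose($fh);

print_r($counts); // JIM => 2, TANYA => 1
```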
While I do feel that this website's purpose is to answer questions and not do homework assignments, I won't assume this is homework, since that hasn't been stated. I personally learned how to program by example. We all learn in our own ways, so here is what I would do to answer your question as accurately as possible, based on the information you have provided.
<?php
$unique_name_count = 0;
$names = array();
$filename = 'Data.txt';

$pointer = fopen($filename, 'r');
$contents = fread($pointer, filesize($filename));
fclose($pointer);

$lines = explode("\n", $contents);
foreach ($lines as $line)
{
    // Splitting on a single '|' leaves an empty element between each
    // doubled delimiter, so the name lands at index 2.
    $split_str = explode('|', $line);
    if (isset($split_str[2]))
    {
        $name = strtolower($split_str[2]);
        if (!in_array($name, $names))
        {
            $names[] = $name;
            $unique_name_count++;
        }
    }
}

echo $unique_name_count.' unique name'.($unique_name_count == 1 ? '' : 's').' found in '.$filename."\n";
?>
I have a text file, and in the file I have phone numbers.
I want to filter out the duplicate numbers. How can I do that using PHP?
Each number is on a new line (/r/n).
You could:
Parse the string into an array, via explode
Filter out the dups, via array_unique
$numbers = file('mydata.txt');
$numbers = array_unique($numbers);
This should do:
$numbers = array_unique(file('phones.txt'));
print_r($numbers);
Used functions file() and array_unique().
Good luck!
Further explanation. The file() function:
Returns the file in an array. Each element of the array corresponds to a line in the file...
So you can use to your advantage the fact that the phone numbers sit one per line.
Note:
Just in case, I should clarify that this won't work if the .txt file literally contains /r/n as text, like:
123/r/n
456/r/n
123/r/n
789/r/n
More:
You may also find file_get_contents() useful, but note that it turns everything into a string, NOT an array.
file() reads the file into an array
array_unique() removes the duplicates
implode() recreates the one-per-line format
file_put_contents() writes the file back
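The four steps above in sequence (the file name and sample numbers are placeholders, written inline so the snippet is self-contained):

```php
<?php
// De-duplicate a file of phone numbers in place.
file_put_contents('phones.txt', "555-0101\n555-0102\n555-0101\n");

$numbers = file('phones.txt', FILE_IGNORE_NEW_LINES);  // 1. read into array
$numbers = array_unique($numbers);                     // 2. drop duplicates
$text    = implode("\n", $numbers);                    // 3. back to one-per-line
file_put_contents('phones.txt', $text . "\n");         // 4. write it back
```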