Diff of two CSV files. Memory usage too high - PHP

You have helped me many times, but for this problem I haven't found a solution yet.
I have two CSV files that I have to compare to get the differences.
Both CSV files look like this:
https://stackoverflow.com
https://google.com
Both files are about 10 MB
Up to now I have been doing this:
$array1 = array_map('str_getcsv', file($file1));
$array2 = array_map('str_getcsv', file($file2));
$diff = array_diff(array_map('serialize', $array1), array_map('serialize', $array2));
It works very nicely as long as I have unlimited memory.
And that's the problem ;-) I don't have unlimited memory, because the server is not the same as before.
So now the question is:
How can I reduce its memory usage, or how else can I compare the two files?
Please don't suggest comparing file sizes or anything like that.
I need the real differences between the files.
For example, one file contains
https://stackoverflow.com
and in the other
https://google.com
so the difference is both lines :-)
Thanks for your help, guys.

Read file1 into the keys of an associative array. Then read file2 line by line, removing those entries from the array.
$file1 = array_flip(file("file1.csv", FILE_IGNORE_NEW_LINES));
$fd2 = fopen("file2.csv", "r");
$diff = array();
while ($line = fgets($fd2)) {
    $line = rtrim($line, "\r\n"); // remove the trailing newline
    if (!array_key_exists($line, $file1)) {
        // line is only in file2, add it to the result
        $diff[] = $line;
    } else {
        // line is in both files, remove it from $file1
        unset($file1[$line]);
    }
}
fclose($fd2);
// Remaining keys in $file1 are unique to that file
$diff = array_merge($diff, array_keys($file1));
If reading the first file into an array and then flipping it uses too much memory, you could build $file1 with an fgets() loop as well (although the garbage collector should clean up the temporary array created by file()).
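A minimal sketch of that fgets() variant, assuming the same file names as in the snippet above:
// Build the lookup array from file1 line by line instead of file() + array_flip()
$file1 = array();
$fd1 = fopen("file1.csv", "r");
while (($line = fgets($fd1)) !== false) {
    $file1[rtrim($line, "\r\n")] = true; // the line itself becomes the key
}
fclose($fd1);
// ...then run the file2 loop from above against $file1 exactly as before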

Related

PHP Script - Comment/Uncomment line

Could anyone give me some ideas or a solution for toggling comments in a file?
It should read a value and toggle the comment on the line where that value resides.
For example, I would like to include class Model and initialize it.
In some file there are prepared includes and initializations:
//include('foo/Model.php');
//$model = new Model();
A function is needed, for those who don't understand what the question is asking.
How to uncomment?
Thanks for adding more insights to your question! Actually it's a pretty interesting one.
As far as I understand you're looking for a dynamic way to comment/uncomment lines inside a file.
So let's define our parameters first:
We want to manipulate a specific file (we need the filename)
We want to toggle specific line numbers inside this file (list of line numbers)
function file_toggler(string $file, array $lineNumbers)
With this in mind we need to read the file and split it into individual lines. PHP provides a handy function for this called file().
file(): Returns the file in an array. Each element of the array corresponds to a line in the file, with the newline still attached.
With this we have everything we need to write the function:
<?php

function file_toggler(string $file, array $lineNumbers)
{
    // normalize because file() starts at 0 while
    // a regular user would use 1 as the first line number
    $lineNumbers = array_map(function ($number) {
        return $number - 1;
    }, $lineNumbers);

    // get all lines and trim them because
    // file() keeps the newline at the end of each line
    $lines = array_map('trim', file($file));

    // now we can take the line numbers and
    // check whether each line starts with a comment
    foreach ($lineNumbers as $lineNumber) {
        $line = trim($lines[$lineNumber]);
        if (substr($line, 0, 2) == '//') {
            $line = substr($line, 2);
        } else {
            $line = '//' . $line;
        }
        // replace the line with the toggled value
        $lines[$lineNumber] = $line;
    }

    // finally we write the lines back to the file;
    // implode() joins the entries with a newline between them
    file_put_contents($file, implode(PHP_EOL, $lines));
}

file_toggler(__DIR__ . '/file1.php', [3, 4]);
Hope it helps :-)
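For example, if file1.php contains the two prepared lines from the question on lines 3 and 4 (an assumed placement), the call above would rewrite them to:
include('foo/Model.php');
$model = new Model();
Calling it again would comment them back out.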

How can I remove duplicated lines in a file using PHP (including the "original" one)?

Well, my question is very simple, but I couldn't find the proper answer anywhere. What I need is a way to read a .txt file and, if there's a duplicated line, remove ALL of its copies, not preserving one. For example, a .txt contains the following:
1234
1233
1232
1234
The output should be:
1233
1232
The code has to delete all copies of the duplicated line. I searched all over the web, but it always points to answers that remove duplicated lines while preserving one of them, like this, this or that.
I'm afraid the only way to do this is to read line x, check it against the whole .txt, and if an equal line is found, delete both it and line x; if not, move on to the next line. But the .txt file I'm checking has 50 million lines (~900 MB), and I don't know how much memory I would need for this kind of task, so I'd appreciate some help here.
Read the file line by line, and use the line contents as the key of an associative array whose values are a count of the number of times the line appears. After you're done, write out all the lines whose value is only 1. This will require as much memory as all the unique lines.
$lines = array();
$fd = fopen("inputfile.txt", "r");
while ($line = fgets($fd)) {
    $line = rtrim($line, "\r\n"); // ignore the newline
    if (array_key_exists($line, $lines)) {
        $lines[$line]++;
    } else {
        $lines[$line] = 1;
    }
}
fclose($fd);

$fd = fopen("outputfile.txt", "w");
foreach ($lines as $line => $count) {
    if ($count == 1) {
        fputs($fd, $line . PHP_EOL); // add the newlines back
    }
}
fclose($fd);
I doubt there is one and only one function that does all of what you want to do. So, this breaks it down into steps...
First, can we load a file directly into an array? See the documentation for the file() function.
$lines = file('mytextfile.txt');
Now, I have all of the lines in an array. I want to count how many of each entry I have. See the documentation for the array_count_values() function.
$counts = array_count_values($lines);
Now, I can easily loop through the array and delete any entries where the count>1
foreach ($counts as $value => $cnt) {
    if ($cnt > 1) {
        unset($counts[$value]);
    }
}
Now, I can turn the array keys (which are the values) into an array.
$nondupes = array_keys($counts);
Finally, I can write the contents out to a file.
file_put_contents('myoutputfile.txt', $nondupes);
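Putting those steps together, a minimal sketch (the file names are just placeholders):
$lines = file('mytextfile.txt');                  // one entry per line, newlines kept
$counts = array_count_values($lines);             // line => number of occurrences
foreach ($counts as $value => $cnt) {
    if ($cnt > 1) {
        unset($counts[$value]);                   // drop every line that appears more than once
    }
}
$nondupes = array_keys($counts);
file_put_contents('myoutputfile.txt', $nondupes); // the newlines are still attached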
I think I have a far more elegant solution:
$array = array('1', '1', '2', '2', '3', '4'); // array with some unique values, some not unique
$array_count_result = array_count_values($array); // count value occurrences
// filter and keep only the unique values
$result = array_keys(array_filter($array_count_result, function ($value) {
    return ($value == 1);
}));
print_r($result);
gives:
Array
(
[0] => 3
[1] => 4
)

Output only csv lines if column increment is 2

I have a CSV file that I would like to filter. I only need to output a line if the increment from the previous line is not equal to 2. In the CSV file below, I would like to compare the first line with the second line; if the increment is 2, check line 3 vs line 2, and so on. If the increment is not equal to 2, output the line. I'm looking at the values in the 3rd column.
L1,is,2.0,mins,LATE,for,Arrive,at,shop,18:07:46
L1,is,4.0,mins,LATE,for,Arrive,at,shop,18:09:46
L1,is,6.0,mins,LATE,for,Arrive,at,shop,18:11:46
L1,is,8.0,mins,LATE,for,Arrive,at,shop,18:13:46
L1,is,10.0,mins,LATE,for,Arrive,at,shop,18:15:46
L1,is,2.0,mins,LATE,for,Arrive,at,shop,18:19:49
L1,is,4.0,mins,LATE,for,Arrive,at,shop,18:21:49
L1,is,6.0,mins,LATE,for,Arrive,at,shop,18:23:49
L1,is,8.0,mins,LATE,for,Arrive,at,shop,18:25:49
L1,is,10.0,mins,LATE,for,Arrive,at,shop,18:27:49
L1,is,16.2,mins,LATE,for,Arrive,at,shop,18:34:02
L1,is,18.2,mins,LATE,for,Arrive,at,shop,18:36:02
L1,is,20.2,mins,LATE,for,Arrive,at,shop,18:38:02
L1,is,2.0,mins,LATE,for,Arrive,at,bridge,21:45:26
L1,is,4.0,mins,LATE,for,Arrive,at,bridge,21:47:26
L1,is,6.0,mins,LATE,for,Arrive,at,bridge,21:49:26
So only lines 5, 10, 13 and 16 would be output to the page.
I'm stuck on this and would appreciate any help or direction on where to look.
Thanks
If your file is not too big, you can load it into memory directly, like this:
$data = array_map(function($row) {
    return explode(',', $row);
}, file('/path/to/file.csv', FILE_IGNORE_NEW_LINES));

$result = [];
$increment = 2;
$delta = 1E-13;
for ($i = 1; $i < count($data); $i++) {
    if (abs($data[$i][2] - $data[$i-1][2] - $increment) > $delta) {
        $result[$i] = $data[$i];
    }
}
Since your column holds floats, a safe equality comparison needs a precision delta.
Your data will be gathered in the $result array, so you can output it like this:
foreach ($result as $row) {
    echo(join(',', $row) . PHP_EOL);
}
Or else, if you won't need the rows later, do not store them in the $result array and output them directly in the first loop (sketched below, after the edit).
Edit:
The sample above will work in PHP >= 5.4. For PHP 5.3 you should change the array definition to
$result = array();
and if you have an even older PHP version, like 5.2, the callback inside array_map() should be rewritten using create_function().
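A minimal sketch of the streaming variant mentioned above (echoing each qualifying row instead of collecting it in $result), under the same assumptions and file path as the sample:
$data = array_map(function($row) {
    return explode(',', $row);
}, file('/path/to/file.csv', FILE_IGNORE_NEW_LINES));

$increment = 2;
$delta = 1E-13;
for ($i = 1; $i < count($data); $i++) {
    // output the row directly when the step from the previous row is not 2
    if (abs($data[$i][2] - $data[$i-1][2] - $increment) > $delta) {
        echo(join(',', $data[$i]) . PHP_EOL);
    }
}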

Re-arranging lines in a file

I'm trying to read data from a text file and assign it to arrays. How could I read exactly 3 lines at a time, assign the first line to array $a, the second line to array $b, the third line to array $c, and then read exactly 3 more lines, and so on?
$lines = file('some_file.txt');
$numLines = count($lines);
for ($i = 0; $i < $numLines; $i += 3) {
    $a[] = $lines[$i];
    $b[] = $lines[$i + 1];
    $c[] = $lines[$i + 2];
}
Note that you'll want to do some out-of-bounds index error checking, as well. I leave that as an exercise for the OP.
The example for fgets should give you some ideas:
http://php.net/manual/en/function.fgets.php#refsect1-function.fgets-examples
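For instance, a minimal fgets()-based sketch, assuming the same file name as the first answer and that the line count is a multiple of three:
$a = $b = $c = array();
$fd = fopen('some_file.txt', 'r');
while (($line = fgets($fd)) !== false) {
    $a[] = rtrim($line, "\r\n");      // first line of the group
    $b[] = rtrim(fgets($fd), "\r\n"); // second line of the group
    $c[] = rtrim(fgets($fd), "\r\n"); // third line of the group
}
fclose($fd);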
You can use something like this, for example:
$lines = file('filename');
$chunks_array = array_chunk($lines, 3); // this creates an array of arrays with 3 lines each
foreach ($chunks_array as $chunks) {
    $a[] = $chunks[0];
    $b[] = $chunks[1];
    $c[] = $chunks[2];
}
Once I had a similar problem. I solved it like this (in pseudocode):
counter = 1
while reading
    switch counter
        case 1: store in the first array, then break
        case 2: store in the second array, then break
        case 3: store in the third array, set counter = 0, then break
    counter++
end-while
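A rough PHP rendering of that pseudocode, assuming $fd is a handle opened with fopen():
$counter = 1;
$a = $b = $c = array();
while (($line = fgets($fd)) !== false) {
    switch ($counter) {
        case 1:
            $a[] = $line; // store in the first array
            break;
        case 2:
            $b[] = $line; // store in the second array
            break;
        case 3:
            $c[] = $line; // store in the third array
            $counter = 0;
            break;
    }
    $counter++;
}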
You could use fseek(), or file_get_contents() with the maxlen parameter. But to read exactly 3 lines I don't actually know how, unless you know how long the lines are.
The file() function reads all lines into an array.
Edit two:
You could read the file byte by byte (although a bad idea from my point of view), stop after each \n or PHP_EOL you encounter, and use a counter or whatever to manage how it is used.
Edit one:
I just got this idea: you could create a custom stream wrapper and handle reading the lines 3 by 3 with it. It is a great tool for files, check http://www.php.net/manual/en/class.streamwrapper.php, and control it through a context or variables, or whatever.
I guess you will still have to find an algorithm for this. I haven't tried this yet, but let us know if you manage it.

PHP Array Generator

I have some values in an Excel file and I want all of them to be array elements; note that the file also has other data in it.
I know one way is to copy them one by one and put them into an array initialization statement.
A sample list, which is just a part of the whole list:
Bay Area
Greater Boston
Greater Chicago
Greater Dallas-Ft. Worth
Greater D.C.
Las Vegas
Greater Houston
Greater LA
Greater New York
Greater San Diego
Seattle
South Florida
It is easy to initialize an array with values when there are not many items, like
$array= array('Bay Area '=>'Bay Area ','Greater Boston'=>'Greater Boston',....)
// and so on
But I have 70-80 items, and it is a very tedious task to initialize the array like above.
So, guys, is there any alternative or shortcut to populate an array with the list of values?
Is there any auto array generator tool?
If you copied them to a file with each one on its own line, you could read the file in PHP like this:
$myArray = file('myTextFile');
//sets the keys to the array equal to the values
$myArray = array_combine($myArray, $myArray);
You could export the data from Excel to a CSV, then read the file into PHP like so:
$myArray = array();
if (($file = fopen("myTextFile", "r")) !== FALSE) {
    while (($data = fgetcsv($file)) !== FALSE) {
        foreach ($data as $value) {
            $myArray[$value] = $value;
        }
    }
    fclose($file);
}
$array = explode("\n", file_get_contents('yourfile.txt'));
For more complex cases for loading CSV files in PHP, maybe use fgetcsv() or even PHPExcelReader for XLS files.
EDIT (after question edit)
(Removed my poor solution, as ohmusama's file() + array_combine() is clearly nicer)
This one:
$string_var = "
Bay Area
Greater Boston
Greater Chicago
Greater Dallas-Ft. Worth
";
$array_var = explode("\n", $string_var);
Get Notepad++, open the file there, and do a simple search and replace with regex: something like search for "(.*)\n" and replace with "'\1'," (the " quotes not included). This would give you a long list of:
'Bay Area','Greater Boston','Greater Chicago'
This would be the fastest way of creating the array in terms of php execution time.
I think this looks better:
$a[] = "Bay Area";
$a[] = "Greater Boston";
$a[] = "Greater Chicago";
For creating such a text file, use Excel (I don't have Excel at hand, but the formula looks something like this):
=CONCATENATE("$a[] = ",CHAR(34),A1,CHAR(34),";")
Then export only that column.
