I am trying to delete the items in list A from list B.
Both lists are stored in text files.
Example: a.txt
1
3
6
b.txt
2
3
6
I previously tried more than one method, but with large files none of them performs as it should:
$a = file('a.txt', FILE_IGNORE_NEW_LINES);
$b = file('b.txt', FILE_IGNORE_NEW_LINES);
$n = 'new.txt';
for ($i = 0; $i < count($b); $i++)
{
    if (!in_array($b[$i], $a))
    {
        $c = file_get_contents($n);
        $c .= $b[$i] . "\n";
        file_put_contents($n, $c);
    }
}
Is there a better way to handle large files, like 80k lines?
This code mainly changes the way the files are read and written: the second file is read one line at a time rather than all into memory at once, and the output uses FILE_APPEND in file_put_contents() so the output file never has to be read back in.
The first part creates an array from the a.txt file with each value as an index, which lets you use isset() rather than in_array() and makes the lookup a lot quicker.
Then read the second file one line at a time, check whether the line is present, and append it to the output if needed...
$fileA = fopen('a.txt', 'r');
$a = [];
while ($entry = fgets($fileA))
{
    $a[trim($entry)] = true; // value as key => O(1) isset() lookups
}
fclose($fileA);

$fileB = fopen('b.txt', 'r');
$n = 'new.txt';
// Clear the output file
file_put_contents($n, '');
while ($b = fgets($fileB))
{
    if (!isset($a[trim($b)]))
    {
        file_put_contents($n, $b, FILE_APPEND);
    }
}
fclose($fileB);
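As a point of contrast, when both files comfortably fit in memory, the same result is a one-liner with array_diff(); this is a sketch for small inputs, not a replacement for the streaming version above:
// array_diff($b, $a) keeps the entries of $b that do not appear in $a
$a = file('a.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$b = file('b.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
file_put_contents('new.txt', implode("\n", array_diff($b, $a)) . "\n");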
I have a CSV file with a very large number of items (5000 lines) in this format:
storeId,bookId,nb
124,48361,0
124,48363,6
125,48362,8
125,48363,2
126,28933,4
142,55433,6
142,55434,10
171,55871,7
171,55872,6
I need to count the number of distinct stores in the file; for example, with the lines above the result should be 5. But I need to do it with 5000 lines, so I can't just loop.
How can I achieve that?
I also need to return the max quantity, so 10.
I began by converting the file into an array:
if (file_exists($file)) {
    $csv = array_map('str_getcsv', file($file));
    # Stores
    $storeIds = array_column($csv, 0);
    $eachStoreNb = array_count_values($storeIds);
    $storeCount = count($eachStoreNb);
}
print_r($storeCount);
Is there a better way to do it? Faster? Maybe without using the array?
Faster here would fall under micro-optimization; however, you can see a real improvement in memory usage.
You could just read the file line by line instead of collecting all store IDs in an array and then doing an array_count_values(), saving yourself an extra loop and the unnecessary linear storage of every duplicate value.
Store IDs simply become keys of an associative array.
For the max nb, keep a $max_nb variable that tracks the largest value seen, using the max() function. The rest is self-explanatory.
Snippet:
<?php
$file = 'test.csv';
if (file_exists($file)) {
    $fp = fopen($file, 'r');
    $max_nb = 0;
    $store_set = [];
    fgetcsv($fp); // skip the header line
    while (($row = fgetcsv($fp)) !== false) { // check for EOF, or $row[0] breaks on the last pass
        $store_set[$row[0]] = true;        // store ID as key => counted once
        $max_nb = max($max_nb, end($row)); // last column is nb
    }
    fclose($fp);
    echo "Num Stores : ", count($store_set), "<br/>";
    echo "Max NB : ", $max_nb;
} else {
    echo "No such CSV file found.";
}
Note: for profiling, I suggest you try both scripts under Xdebug.
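Or, for a rough wall-clock and memory comparison without a profiler, the built-ins are enough; a sketch to wrap around whichever script is under test:
$start = microtime(true);
// ... run the script under test here ...
echo 'elapsed: ', microtime(true) - $start, " s\n";
echo 'peak memory: ', memory_get_peak_usage(true), " bytes\n";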
What if you looped through the file line by line?
I mean ...
$datas = [];
$handle = fopen("filename.csv", "r");
$flagFirstLine = true;
while (($csvLine = fgetcsv($handle)) !== false) {
    // don't process the first (header) line
    if ($flagFirstLine) {
        $flagFirstLine = false;
        continue;
    }
    $storeID = $csvLine[0];
    $datas[] = $storeID;
}
echo "all rows: " . count($datas);
echo "\nnum stores: " . count(array_unique($datas));
What 'nice_dev' says, but a little more compact.
$fp = fopen('<your_file>', 'r');
fgets($fp); // skip the header line
$stores = [];
while ($row = fgetcsv($fp)) {
    $stores[$row[0]] = max([($stores[$row[0]] ?? 0), $row[2]]);
}
fclose($fp);
An answer with awk would be:
awk -F, 'BEGIN {getline}
{ a[$1]++; m=$3>m?$3:m }
END{ for (i in a){ print i, a[i] };
print "Number of stores",length(a), "max:",m}' testfile
getline in the BEGIN block skips the first (header) line
a[$1]++ increments the count for the first column $1 in array a, while m keeps the maximum of $3
the END block loops over array a and prints all the counts (optional)
then prints the total 'Number of stores' and the max value.
output:
124 2
125 2
126 1
142 2
171 2
Number of stores 5 max: 10
A solution in AWK, for comparison. This one includes the count of each store as well. AWK should be able to process millions of lines in under a second; I use the same approach to filter duplicates out of a file.
BEGIN{                # set some variables initially
    FS=","            # field separator for INPUT
    mymax=0           # init variable mymax
}
NR>1 {                # skip the header line; this matches line 2 onwards
    mycount[$1]++     # increment the count for that store ID
    if ($3>mymax){    # compare with the max so far
        mymax=$3
    }
}
END{                  # finally print results
    for (i in mycount){
        if (length(i)>0){
            print "value " i " has " mycount[i]
        }
    }
    print "Maximum value is " mymax
}
I have a file with ~10,000 lines in it. Every time a user accesses my website, I want it to pick 10 lines from the file at random.
The code I currently use:
$filelog = 'items.txt';
$random_lines = (file_exists($filelog)) ? file($filelog) : array();
$random_count = count($random_lines);
$random_file_html = '';
if ($random_count > 10)
{
    $random_file_html = '<div><ul>';
    for ($i = 0; $i < 10; $i++)
    {
        $random_number = rand(0, $random_count - 1); // duplicates are accepted
        $random_file_html .= '<li>' . $random_lines[$random_number] . "</li>\r\n";
    }
    $random_file_html .= '</ul>
</div>';
}
When I had fewer than 1,000 lines, everything was OK. But now, with ~10,000 lines, it slows my website down significantly.
So I'm thinking of other methods, like:
Divide the file into 50 files, pick one of them at random, then pick 10 random lines inside the selected file.
-- or --
I know the total number of lines (items). Generate 10 random numbers, then read the file using
$file = new SplFileObject('items.txt');
$file->seek($random_number);
echo $file->current();
(My server does not support any type of SQL)
Maybe you have other methods that would suit me better. What is the best method for my problem? Thank you very much!
The fastest way, apparently, would be not to pick 10 random lines out of a file with ~10,000 lines in it on every user request.
It's impossible to say more, as we know no details of this "XY problem".
If it is possible to adjust the contents of the file, then simply pad each of the lines so they share a common length. Then you can reach any line in the file by random access.
$lineLength = 50; // the assumed fixed length of each line, newline included
$total = filesize($filename);
$numLines = $total / $lineLength;
// pick ten random line numbers and seek straight to each line
$fp = fopen($filename, "r");
for ($x = 0; $x < 10; $x++) {
    fseek($fp, (rand(1, $numLines) - 1) * $lineLength, SEEK_SET);
    echo fgets($fp, $lineLength + 1); // +1 because fgets() reads at most length-1 bytes
}
fclose($fp);
try:
$lines = file('YOUR_TXT_FILE.txt');
$rand = array_rand($lines);
echo $lines[$rand];
for 10 of them just put it in a loop:
$lines = file('YOUR_TXT_FILE.txt');
for ($i = 0; $i < 10; $i++) {
$rand = array_rand($lines);
echo $lines[$rand];
}
NOTE: the above code does not guarantee that the same line won't be picked twice. To guarantee uniqueness you need an extra while loop and an array that holds all previously generated indexes; whenever a new index already exists in that array, generate another one until it does not.
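Alternatively, array_rand() can do the uniqueness for you: given a count as its second argument it returns that many distinct keys, so duplicates are impossible by construction. A sketch (it raises an error if you ask for more lines than the file has, and the keys come back in file order, so shuffle them if the display order should be random too):
$lines = file('YOUR_TXT_FILE.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// pick 10 distinct line indexes in a single call
$keys = array_rand($lines, 10);
shuffle($keys); // optional: randomize the display order as well
foreach ($keys as $key) {
    echo $lines[$key], "\n";
}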
Either way, this might not be the fastest solution, but it should fulfill your needs. Since your server does not support any type of SQL, maybe switch to a different server? I am wondering how you store user data. Is that stored in files also?
I'm trying to make a function that writes a list of scores for players.
For example:
player_1 100 12 12 10
player_2 39 13 48 29
And when players beat (or do worse than) their previous scores, their old score is overwritten with the new one.
I've written a function that sort of works, but it has multiple problems.
function write($player)
{
    global $logfile;
    $lines = file($logfile);
    foreach ($lines as $i => $line)
    {
        $pieces = explode(" ", $line);
        $pieces[0] = trim($pieces[0]);
        if ($pieces[0] == $player->name) // found name
        {
            trim($lines[$i]);
            unset($lines[$i]); // remove the old player data
            $lines[$i] = "{$player->name} {$player->lvl} {$player->exp} {$player->mana} \n"; // write the new score
            $fp = fopen($logfile, 'a');
            fwrite($fp, $lines[$i]);
            $found = TRUE;
            break;
        }
    }
    if (!$found) // record a new player whose score isn't in the file
    {
        $fp = fopen($logfile, 'a');
        $newp = "$player->name $player->lvl $player->exp $player->mana \n";
        fwrite($fp, $newp);
    }
    fclose($fp);
}
The file just appends the new score and doesn't overwrite the previous score. Could someone point out my errors?
Try changing:
$fp = fopen($logfile,'a');
into
$fp = fopen($logfile,'w');
in your
if( $pieces[0] == $player->name ) ...
branch.
See the PHP manual on fopen() for the file open modes ;)
EDIT
You can overwrite your player entry by putting the fwrite() after the foreach loop, rewriting the whole file with the joined lines (this may cause performance issues).
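A minimal sketch of that first option, assuming $lines was already updated inside the loop as in the question's code:
// after the foreach has replaced the player's line in $lines:
$fp = fopen($logfile, 'w');        // 'w' truncates, so the whole file is rewritten
fwrite($fp, implode('', $lines));  // each line already ends in "\n"
fclose($fp);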
Or
Try looping line by line using fgets(), and when you find the right match, fseek() back to the start of that line and overwrite it ;)
SECOND EDIT
<?php
$find = 'player_1';
$h = fopen('play.txt', 'r+');
$prev_pos = 0;
while (($line = fgets($h, 4096)) !== false) {
    $parts = explode(' ', $line);
    if ($parts[0] == $find) {
        fseek($h, $prev_pos); // jump back to the start of the matched line
        // NB: the replacement must be exactly as long as the original line,
        // or it will overwrite part of the line that follows
        fwrite($h, "player_222 12 22 411");
        break;
    }
    $prev_pos = ftell($h); // remember where the next line starts
}
fclose($h);
?>
A code sample, as requested ;) The idea is to remember where each line starts so that, when the match is found, we can fseek() back to the start of that line and overwrite it. I'm not sure the fwrite will behave on all environments without PHP_EOL at the end of the line, but on mine it's fine.
First, let us see why it duplicates the record. $lines is an array in which you update the record of the specific player. But after updating the record, you append it to the file (using "a" mode), thereby duplicating the entry for that player.
The idea should be to update that record in the file. With your logic, the simplest fix is to rewrite $lines to the file; since $lines always contains the updated entry, it makes sense.
Now to the logic for a new player's entry. There is nothing wrong with it, but it can be improved by appending the new entry to $lines instead of writing it to the file directly.
Here is the updated code. Please note that I've removed lines that weren't needed.
function write($player) {
    global $logfile;
    $found = FALSE;
    $lines = file($logfile);
    foreach ($lines as $i => $line) {
        $pieces = explode(" ", $line);
        $pieces[0] = trim($pieces[0]);
        if ($pieces[0] == $player->name) { // found name
            $lines[$i] = "{$player->name} {$player->lvl} {$player->exp} {$player->mana} \n"; // write the new score
            $found = TRUE;
            break;
        }
    }
    if (!$found) { // record a new player whose score isn't in the file
        $lines[] = "$player->name $player->lvl $player->exp $player->mana \n";
    }
    file_put_contents($logfile, $lines); // an array is joined and written as one string
}
Hope it helps!
Is this code running on a web server with many users accessing it at the same time?
If it is, imagine what happens when one user has just opened the file for writing (the file is emptied at that moment) and another opens it for reading before the first has finished writing the data.
A partial solution is to write to a temp file and rename the temp over the original when you are done. Rename is atomic, so users will see either the original file or the new one, never something in between.
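For illustration, a minimal sketch of that temp-file idea, assuming the updated lines are in $lines as in the answers above:
// write the new contents to a temp file in the same directory,
// then swap it into place; rename() is atomic on the same filesystem
$tmp = tempnam(dirname($logfile), 'log');
file_put_contents($tmp, implode('', $lines));
rename($tmp, $logfile);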
But you'll still miss some updates. You could lock the file, meaning that while one person is writing, nobody else can read. To do that you would use the flock() function: http://php.net/manual/en/function.flock.php
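A sketch of the locking variant around the same read-update-write cycle (note that flock() is advisory: it only protects against other code that also calls flock()):
$fp = fopen($logfile, 'c+');        // read/write without truncating on open
if (flock($fp, LOCK_EX)) {          // exclusive lock: other lockers wait
    $lines = [];
    while (($line = fgets($fp)) !== false) {
        $lines[] = $line;
    }
    // ... update $lines here ...
    ftruncate($fp, 0);              // only now is it safe to rewrite
    rewind($fp);
    fwrite($fp, implode('', $lines));
    fflush($fp);                    // flush before releasing the lock
    flock($fp, LOCK_UN);
}
fclose($fp);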
The proper solution is to use a real database. SQLite, for example, is nice and simple: no external server processes or passwords...
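To show how little is needed, a sketch using PDO's SQLite driver (the database file name and table are hypothetical); SQLite's INSERT OR REPLACE gives the overwrite-the-old-score behaviour for free:
$db = new PDO('sqlite:' . __DIR__ . '/scores.db'); // hypothetical database file
$db->exec('CREATE TABLE IF NOT EXISTS scores (
    name TEXT PRIMARY KEY, lvl INTEGER, exp INTEGER, mana INTEGER
)');
// insert the player, or overwrite the row if the name already exists
$stmt = $db->prepare('INSERT OR REPLACE INTO scores (name, lvl, exp, mana)
                      VALUES (?, ?, ?, ?)');
$stmt->execute([$player->name, $player->lvl, $player->exp, $player->mana]);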
Thank you for taking the time to read this; I will appreciate every single response, no matter the quality of content. :)
Using PHP, I'm trying to get the last 15 lines of a text document (.txt) and store that data in a PHP variable. I understand that this is possible, but when I grab the last 15 lines, is it possible to retain their original order? For example:
text document:
A
B
C
When I grab the last 15 lines of the text document, I don't want the echo to come out reversed, like:
C
B
A
All assistance is appreciated and I look forward to your replies; thank you. :) If I didn't explain anything clearly and/or you'd like me to explain in more detail, please reply. :)
Thank you.
Try using array_slice(), which returns a part of an array. In this case you want it to return the last 15 lines of the file array, so:
$filearray = file("filename");
$lastfifteenlines = array_slice($filearray,-15);
If you don't mind loading the entire file into memory:
$lines = array_slice(file('test.txt'), -15);
print_r($lines );
If the file is too large to fit into memory you can use a circular method:
// Read the last $num lines from stream $fp
function read_last_lines($fp, $num)
{
$idx = 0;
$lines = array();
while(($line = fgets($fp)))
{
$lines[$idx] = $line;
$idx = ($idx + 1) % $num;
}
$p1 = array_slice($lines, $idx);
$p2 = array_slice($lines, 0, $idx);
$ordered_lines = array_merge($p1, $p2);
return $ordered_lines;
}
// Open the file and read the last 15 lines
$fp = fopen('test.txt', 'r');
$lines = read_last_lines($fp, 15);
fclose($fp);
// Output array
print_r($lines);
This method also works if the file has fewer than 15 lines, returning an array with however many lines the file contains.
You can use fseek() with a negative position to seek backwards through the file, counting newlines as you go.
I'm too tired to write up copy/paste-able code, but there are some examples in the comments on the manual page for fseek() that are very close to what you want.
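Since no copy/paste-able code was given, here is a hedged sketch of that idea, reading backwards one byte at a time (untuned, in the spirit of those manual-page examples; the function name is mine):
// Read the last $num lines of $filename by seeking backwards from the
// end one byte at a time and counting newlines (slow but simple).
function tail_lines($filename, $num)
{
    $fp = fopen($filename, 'r');
    $pos = -1;       // offset from the end of the file
    $newlines = 0;
    $buffer = '';
    // keep stepping back until we have passed $num newlines,
    // or fseek() fails because we hit the start of the file
    while ($newlines <= $num && fseek($fp, $pos, SEEK_END) !== -1) {
        $char = fgetc($fp);
        if ($char === "\n" && $buffer !== '') { // ignore the trailing newline
            $newlines++;
        }
        $buffer = $char . $buffer;
        $pos--;
    }
    fclose($fp);
    // trim the leftover leading newline and keep at most $num lines
    return array_slice(explode("\n", trim($buffer)), -$num);
}

print_r(tail_lines('test.txt', 15));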
If the file isn't bigger than available memory you can do this:
$fArray = file("filename");
$len = sizeof($fArray);
// note: this assumes the file has at least 15 lines
for ($i = $len - 15; $i < $len; $i++)
{
    echo $fArray[$i];
}
If you have a file that is hundreds of megabytes:
$fp = fopen("file", "r");
for ($i = 0; $line = fgets($fp); $i++)
{
    if ($i % 15 == 0)
    {
        $last15 = array(); // restart the buffer every 15 lines
    }
    $last15[] = $line;
}
// fgets() keeps the newlines, so join with an empty string;
// note this yields between 1 and 15 trailing lines, not always exactly 15
echo join("", $last15);
the longer array solution:
array_slice(explode("\n",file_get_contents($file)),-15);
the shorter array solution:
array_slice(file($file),-15);
This code opens the file, shows the total number of lines, shows the file's header line, and shows the last lines of the file, as set by $limit.
<?php
// open the file in read mode
$file = new SplFileObject('file.csv', 'r');
// get the total lines
$file->seek(PHP_INT_MAX);
$last_line = $file->key();
echo $last_line;
echo "<br>";
// Rewind to first line to get header
$file->rewind();
// Output first line if you need use the header to make something
echo $file->current();
echo "<br>";
// define how many final lines to show
$limit = 6;
// select the last $limit lines (arguments are offset, then count)
$lines = new LimitIterator($file, $last_line - $limit, $limit);
//print all the last 6 lines array
//print_r(iterator_to_array($lines));
//echo "<br>";
// Loop over whole file to use a single line
foreach ($lines as $line) {
print_r($line);
echo "<br>";
}
I'm trying to process a for loop within a for loop, and I'm just a little unsure about the syntax... Will this work? Essentially, I want to run code for every 1,000 records while the count is less than or equal to $count... Will the syntax below work, or is there a better way?
for ($x = 0; $x <= 700000; $x++) {
    for ($i = 0; $i <= 1000; $i++) {
        // run the code
    }
}
The syntax you have will work, but I don't think it's going to do exactly what you want. Right now, it's going to do the outer loop 700,001 times, and for every single one of those 700,001 times, it's going to do the inner loop.
That means, in total, the inner loop is going to run 700,001 x 1001 = about 700.7 million times.
If this isn't what you want, can you give a bit more information? I can't really work out what "I want to run code for every 1,000 records while the count is equal to or less than the $count" means. I don't see any variable named $count at all.
Well, essentially, I'm reading in a text file and inserting each of its lines into a DB. I originally tried while(!feof($f)) [where $f = filename], but it kept complaining about a broken pipe. I thought this would be another way to go.
$f should be a file handle returned by fopen(), not a filename.
$file_handle = fopen($filename, 'r');
while (!feof($file_handle)) {
    $line = fgets($file_handle);
    $line = trim($line); // remove space chars at beginning and end
    if (!$line) continue; // we don't need empty lines
    mysql_query('INSERT INTO table (column) '
        . 'VALUES ("' . mysql_real_escape_string($line) . '")');
}
fclose($file_handle);
Read through the documentation at php.net for fopen() and fgets(). You might also need explode() if you have to split each line into fields.
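For instance, if each line held several fields in some comma-separated format (hypothetical here), explode() would split it before the insert:
// hypothetical line format: "name,score"
list($name, $score) = explode(',', $line);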
If your file isn't big, you might want to read it into an array at once like this:
$filelines = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($filelines as $line) {
    do_stuff_with($line);
}
To read a text file line by line I usually:
$file = file("path to file");
foreach ($file as $line) {
    // insert $line into db
}
Strictly answering the question, you'd want something more like this:
// $x would be 0, then 1000, then 2000, then 3000
for ($x = 0; $x < 700000; $x += 1000) {
    // $i would run from $x through $x + 999
    for ($i = $x; $i < $x + 1000; $i++) {
        // run the code
    }
}
However, you should really consider one of the other methods for importing files to a database.
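For example, if the database is MySQL, you can let the server bulk-load the file itself instead of inserting row by row. A sketch (connection details, table, and column names are placeholders, and local_infile must be enabled on both client and server):
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', [
    PDO::MYSQL_ATTR_LOCAL_INFILE => true, // allow LOAD DATA LOCAL
]);
$pdo->exec("LOAD DATA LOCAL INFILE '/path/to/file.txt'
            INTO TABLE table_name
            LINES TERMINATED BY '\\n'
            (column)");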