Splitting up a large text document into multiple smaller text files

I am developing a text collection engine that uses fwrite() to write text, and I want to cap each output file at 1.5 MB: once a file exceeds 1.5 MB, writing should continue in a new file from where it left off, and so on until the contents of the source file have been written out across multiple files. I have searched Google, but many of the tutorials and examples are too complex for me because I am a novice programmer. The code below is inside a for loop that fetches the text ($RemoveTwo). It does not work as I need. Any help would be appreciated.
switch ($FileSizeCounter) {
case ($FileSizeCounter> 1500000):
$myFile2 = 'C:\TextCollector/'.'FilenameA'.'.txt';
$fh2 = fopen($myFile2, 'a') or die("can't open file");
fwrite($fh2, $RemoveTwo);
fclose($fh2);
break;
case ($FileSizeCounter> 3000000):
$myFile3 = 'C:\TextCollector/'.'FilenameB'.'.txt';
$fh3 = fopen($myFile3, 'a') or die("can't open file");
fwrite($fh3, $RemoveTwo);
fclose($fh3);
break;
default:
echo "continue and continue until it stops by the user";
}

Try something like this. You need to read from the source and write piece by piece, checking for the end of the source file as you go. Compare the length of each buffer with the maximum size; when the buffer has reached the maximum, close the current file and open a new one with an auto-incremented number in its name:
/*
** @param $filename [string] This is the source
** @param $toFile [string] This is the base name for the destination file & path
** @param $chunk [num] This is the max file size in MB, so 1.5 is 1.5MB
*/
function breakDownFile($filename,$toFile,$chunk = 1)
{
// Convert the MB value into a whole number of bytes
// (1MB is taken as 1024*1024 bytes here)
$max = (int) ($chunk*1024*1024);
// Start value for naming the files incrementally
$i = 1;
// Open *for reading* the source file
$r = fopen($filename,'r');
// Create a new file for writing, use the increment value
$w = fopen($toFile.$i.'.txt','w');
// Loop through the file as long as the file is readable
while(!feof($r)) {
// Read file but only to the max file size value set
$buffer = fread($r, $max);
// Write to disk using buffer as a guide
fwrite($w, $buffer);
// Check the byte length of the buffer to see if it has
// reached the limit (fread() never returns more than $max)
if(strlen($buffer) >= $max) {
// Close the file
fclose($w);
// Add 1 to our $i file
$i++;
// Start a new file with the new name
$w = fopen($toFile.$i.'.txt','w');
}
}
// When done the loop, close the writeable file
fclose($w);
// When done loop close readable
fclose($r);
}
To use:
breakDownFile(__DIR__.'/test.txt',__DIR__.'/tofile',1.5);
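The same idea can also be sketched with small reads and a running byte count, which avoids holding a whole 1.5 MB chunk in memory and avoids leaving an empty last file when the source size is an exact multiple of the limit. This is only a sketch under assumptions: splitFileBySize(), its parameters and the 8192-byte block size are hypothetical names and values, not part of the answer above, and the split happens at byte boundaries, so a line or a multibyte character may be cut between two parts.

```php
<?php
// Hypothetical helper (not from the answer above): splits $source into
// numbered parts of at most $maxBytes each and returns the part paths.
function splitFileBySize(string $source, string $destBase, int $maxBytes, int $blockSize = 8192): array
{
    if ($maxBytes < 1 || $blockSize < 1) {
        throw new InvalidArgumentException('Sizes must be positive.');
    }
    $in = fopen($source, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $source");
    }

    $parts = [];
    $out = null;   // current destination handle, opened only when data arrives
    $written = 0;  // bytes written to the current part

    while (true) {
        // Never read more than the room left in the current part.
        $data = fread($in, min($blockSize, $maxBytes - $written));
        if ($data === false || $data === '') {
            break; // end of file or read error
        }
        if ($out === null) {
            $path = $destBase . (count($parts) + 1) . '.txt';
            $out = fopen($path, 'wb');
            if ($out === false) {
                fclose($in);
                throw new RuntimeException("Cannot open $path");
            }
            $parts[] = $path;
        }
        fwrite($out, $data);
        $written += strlen($data);

        if ($written >= $maxBytes) { // part is full: close it and start over
            fclose($out);
            $out = null;
            $written = 0;
        }
    }

    if ($out !== null) {
        fclose($out);
    }
    fclose($in);
    return $parts;
}
```

For a 1.5 MB cap this could be called as splitFileBySize(__DIR__.'/test.txt', __DIR__.'/tofile', (int) (1.5 * 1024 * 1024)).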

Related

How to read a range of rows from CSV file to JSON array using PHP to handle large CSV file?

The goal is to read a range of rows/lines from a large CSV file into a JSON array, so that large files can be handled and the data read page by page, with each page fetching a range of lines (e.g. page 1 fetches lines 1 to 10, page 2 fetches lines 11 to 20, and so on).
The PHP script below reads from the beginning of the CSV file up to the desired line ($desired_line). My question is how to start reading from a specific line ($starting_line) instead.
<?php
// php function to convert csv to json format
function csvToJson($fname, $starting_line, $desired_line) {
// open csv file
if (!($fp = fopen($fname, 'r'))) {
die("Can't open file...");
}
//read csv headers
$key = fgetcsv($fp,"1024","\t");
$line_counter = 0;
// parse csv rows into array
$json = array();
while (($row = fgetcsv($fp,"1024","\t")) && ($line_counter < $desired_line)) {
$json[] = array_combine($key, $row);
$line_counter++;
}
// release file handle
fclose($fp);
// encode array to json
return json_encode($json);
}
// Define the path to CSV file
$csv = 'file.csv';
print_r(csvToJson($csv, 20, 30));
?>

You should use functions like:
fgets() to read the file line by line
fseek() to move to the position of the last fgets() of the chunk
ftell() to read the position for fseek()
Something like this (it's only a schema):
<?php
...
$line_counter = 0;
$last_pos = ...
fseek($fp,$last_pos); // jump to where the previous chunk ended
while(($line = fgets($fp)) !== false){ // read a line of the file
$line_counter++;
(...) // parse line of csv here
if($line_counter == 100){
$last_pos = ftell($fp);
(...) // save $last_pos for the next reading cycle
break;
}
}
...
?>
You can also skip the fseek() and ftell() part and count the lines from the beginning every time, but then each request has to read through the file from the start up to the desired lines.
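The simpler count-from-the-beginning variant could look roughly like the sketch below. The function name csvRangeToJson() and the convention that lines are 1-based, inclusive and counted after the header row are assumptions made for illustration; the tab delimiter is taken from the question. Rows whose column count does not match the header (blank lines, for example) are skipped.

```php
<?php
// Hypothetical helper: returns data rows $startLine..$endLine (1-based,
// inclusive, counted after the header row) of a tab-separated file as JSON.
function csvRangeToJson(string $fname, int $startLine, int $endLine): string
{
    $fp = fopen($fname, 'r');
    if ($fp === false) {
        throw new RuntimeException("Can't open file: $fname");
    }

    $rows = [];
    $header = fgetcsv($fp, 0, "\t", '"', '\\'); // first line holds the keys
    if ($header !== false) {
        $lineNo = 0;
        // Stop as soon as the last wanted line has been read.
        while ($lineNo < $endLine && ($row = fgetcsv($fp, 0, "\t", '"', '\\')) !== false) {
            $lineNo++;
            if ($lineNo < $startLine) {
                continue; // still before the requested range
            }
            if (count($row) === count($header)) {
                $rows[] = array_combine($header, $row);
            }
        }
    }

    fclose($fp);
    return json_encode($rows);
}
```

With this convention, page 2 of a 10-rows-per-page listing would be csvRangeToJson('file.csv', 11, 20).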

PHP File Handling (Download Counter) Reading file data as a number, writing it as that plus 1

I'm trying to make a download counter in a website for a video game in PHP, but for some reason, instead of incrementing the contents of the downloadcount.txt file by 1, it takes the number, increments it, and appends it to the end of the file. How could I just make it replace the file contents instead of appending it?
Here's the source:
<?php
ob_start();
$newURL = 'versions/v1.0.0aplha/Dungeon1UP.zip';
//header('Location: '.$newURL);
//increment download counter
$file = fopen("downloadcount.txt", "w+") or die("Unable to open file!");
$content = fread($file,filesize("downloadcount.txt"));
echo $content;
$output = (int) $content + 1;
//$output = 'test';
fwrite($file, $output);
fclose($file);
ob_end_flush();
?>
The number in the file is supposed to increase by one every time, but instead, it gives me numbers like this: 101110121011101310111012101110149.2233720368548E+189.2233720368548E+189.2233720368548E+18

As correctly pointed out in one of the comments, for your specific case you can use fseek ( $file, 0 ) right before writing, such as:
fseek ( $file, 0 );
fwrite($file, $output);
Or, even simpler, you can call rewind($file) before writing; this ensures that the next write happens at byte 0, i.e. the start of the file.
Note that "w+" is not what you want here: it opens the file for reading and writing but truncates it to zero length, so there is nothing left to read. If you do not want to reset the contents, open the file in read/write mode with "r+" on your fopen, such as:
fopen("downloadcount.txt", "r+")
Just make sure the file exists before writing!
Please see fopen modes here:
https://www.php.net/manual/en/function.fopen.php
And working code here:
https://bpaste.net/show/iasj
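Put together, the "r+" plus rewind approach might look like the following sketch. The function name incrementCounterFile() is hypothetical, and the sketch assumes the file already exists and that only one request touches it at a time (there is no locking here).

```php
<?php
// Hypothetical helper: reads an integer from $path, adds 1 and writes it back.
function incrementCounterFile(string $path): int
{
    $fh = fopen($path, 'r+'); // 'r+' keeps the contents; the file must exist
    if ($fh === false) {
        throw new RuntimeException("Unable to open $path");
    }

    // Read everything; an empty or unreadable file counts as 0.
    $count = (int) trim((string) stream_get_contents($fh)) + 1;

    rewind($fh);        // move the pointer back to byte 0
    ftruncate($fh, 0);  // drop the old digits before writing the new ones
    fwrite($fh, (string) $count);
    fclose($fh);

    return $count;
}
```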

It will be much simpler to use file_get_contents/file_put_contents:
// update with more precise path to file:
$content = file_get_contents(__DIR__ . "/downloadcount.txt");
echo $content;
$output = (int) $content + 1;
// by default `file_put_contents` overwrites file content
file_put_contents(__DIR__ . "/downloadcount.txt", $output);

That appending should just be a typecasting problem, but I would not encourage you to handle counts the file way. In order to count the number of downloads for a file, it's better to make a database update of a row using transactions to handle concurrency properly, as doing it the file way could compromise accuracy.

You can get the content, check if the file has data. If not initialise to 0 and then just replace the content.
$fileContent = file_get_contents("downloadcount.txt");
$content = (!empty($fileContent) ? $fileContent : 0);
$content++;
file_put_contents('downloadcount.txt', $content);
Check $content or look directly at the content inside the file.

filesize() always reads 0 bytes even though file size isn't 0 bytes

I wrote some code below, at the moment I'm testing so there's no database queries in the code.
The code below where it says if(filesize($filename) != 0) always goes to else even though the file is not 0 bytes and has 16 bytes of data in there. I am getting nowhere, it just always seems to think file is 0 bytes.
I think it's easier to show my code (could be other errors in there but I'm checking each error as I go along, dealing with them one by one). I get no PHP errors or anything.
$filename = 'memberlist.txt';
$file_directory = dirname($filename);
$fopen = fopen($filename, 'w+');
// check is file exists and is writable
if(file_exists($filename) && is_writable($file_directory)){
// clear statcache else filesize could be incorrect
clearstatcache();
// for testing, shows 0 bytes even though file is 16 bytes
// file has inside without quotes: '1487071595 ; 582'
echo "The file size is actually ".filesize($filename)." bytes.\n";
// check if file contains any data, also tried !==
// always goes to else even though not 0 bytes in size
if(filesize($filename) != 0){
// read file into an array
$fread = file($filename);
// get current time
$current_time = time();
foreach($fread as $read){
$var = explode(';', $read);
$oldtime = $var[0];
$member_count = $var[1];
}
if($current_time - $oldtime >= 86400){
// 24 hours or more so we query db and write new member count to file
echo 'more than 24 hours has passed'; // for testing
} else {
// less than 24 hours so don't query db just read member count from file
echo 'less than 24 hours has passed'; // for testing
}
} else { // WE ALWAYS END UP HERE
// else file is empty so we add data
$current_time = time().' ; ';
$member_count = 582; // this value will come from a database
fwrite($fopen, $current_time.$member_count);
fclose($fopen);
//echo "The file is empty so write new data to file. File size is actually ".filesize($filename)." bytes.\n";
}
} else {
// file either does not exist or cant be written to
echo 'file does not exist or is not writeable'; // for testing
}
Basically the code will be on a memberlist page which currently retrieves all members and counts how many members are registered. The point in the script is if the time is less than 24 hours we read the member_count from file else if 24 hours or more has elapsed then we query database, get the member count and write new figure to file, it's to reduce queries on the memberlist page.
Update 1:
This code:
echo "The file size is actually ".filesize($filename)." bytes.\n";
always outputs the below even though it's not 0 bytes.
The file size is actually 0 bytes.
also tried
var_dump (filesize($filename));
Outputs:
int(0)

You are using:
fopen($filename, "w+")
According to the manual w+ means:
Open for reading and writing; place the file pointer at the beginning of the file and truncate the file to zero length. If the file does not exist, attempt to create it.
So the file size being 0 is correct.
You probably need r+

Sorry, I know this question is closed, but I am writing my own answer in case it is useful for someone else.
If you use c+ in the fopen function,
fopen($filePath , "c+");
then the filesize() function returns the size of the file,
and you can use clearstatcache(true, $filePath) to clear the cached information for this file.
Note: when we use c+ in fopen() the file content is preserved, and after fread() has read to the end of the file, a following fwrite() places our string at the end of the file content.
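The difference between the modes discussed in this thread can be checked with a small experiment. This is a sketch only: sizeAfterOpen() is a hypothetical helper, and the temporary file stands in for memberlist.txt with the same 16 bytes as in the question.

```php
<?php
// Hypothetical helper: opens $path with $mode and reports the size on disk.
function sizeAfterOpen(string $path, string $mode): int
{
    $fh = fopen($path, $mode);
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path with mode $mode");
    }
    clearstatcache(true, $path); // make sure filesize() is not served from cache
    $size = filesize($path);
    fclose($fh);
    if ($size === false) {
        throw new RuntimeException("Cannot stat $path");
    }
    return $size;
}

$path = tempnam(sys_get_temp_dir(), 'modes');
file_put_contents($path, '1487071595 ; 582'); // 16 bytes, as in the question

echo sizeAfterOpen($path, 'r+'), "\n"; // 16: contents kept
echo sizeAfterOpen($path, 'c+'), "\n"; // 16: contents kept
echo sizeAfterOpen($path, 'w+'), "\n"; // 0: truncated on open

unlink($path);
```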

How to tell if the whole file has been downloaded using stream_copy_to_stream?

http://php.net/stream_copy_to_stream $maxlength parameter allows to limit the number of bytes to copy.
How to tell if the limit was breached, ie. not the whole file was downloaded?

With regards to Yousuf Memon comment (http://mattgemmell.com/2008/12/08/what-have-you-tried/), my approach to the issue is simple logic:
public function download($size_limit)
{
if($this->temp_file)
{
throw new UploadRemoteImageException('Resource has been downloaded already.');
}
$this->temp_file = tempnam(sys_get_temp_dir(), $this->temp_file_prefix);
$src = fopen($this->url, 'r');
$dest = fopen($this->temp_file, 'w+');
stream_copy_to_stream($src, $dest, $size_limit+1000);
// filesize() expects a path, so use fstat() on the open stream instead
if(fstat($dest)['size'] > $size_limit)
{
// The file size limit has been breached.
}
// [..]
}
This works by simply requesting more bytes than the user-defined limit. Then, once the copy has finished, it checks whether the file is larger than the user-defined size limit (which is possible because we added the 1000 bytes on top).
However, I can't tell with confidence if this will always work, as I assume it also depends on the chunk size.
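One way to avoid relying on the file size at all is to use the return value of stream_copy_to_stream(), which is the number of bytes actually copied, and to ask for just one byte more than the limit. The sketch below assumes that reading limit + 1 bytes is acceptable; copyWithLimit() is a hypothetical name, and note that the destination may end up holding that one extra byte when the limit is breached.

```php
<?php
// Hypothetical helper: copies at most $limit + 1 bytes from $src to $dest and
// returns true when the whole source fitted within $limit bytes.
function copyWithLimit($src, $dest, int $limit): bool
{
    $copied = stream_copy_to_stream($src, $dest, $limit + 1);
    if ($copied === false) {
        throw new RuntimeException('Copy failed.');
    }
    // If one byte beyond the limit could be copied, the source is too large.
    return $copied <= $limit;
}

// Usage with in-memory streams standing in for the remote URL and temp file.
$src = fopen('php://memory', 'w+');
fwrite($src, '0123456789'); // 10 bytes
rewind($src);
$dest = fopen('php://memory', 'w+');
var_dump(copyWithLimit($src, $dest, 5)); // bool(false): limit breached
fclose($src);
fclose($dest);
```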

Trying to count, but changes to 1

I am trying to make a counter. What I mean by that is a button that sends an XMLHttpRequest and just runs this PHP.
My question is: why does my counting code change the value in the text document to the number 1? If I set the value to, for example, 24, then instead of adding 1 and changing the value to 25, it changes the value to the number 1.
<?php
$fp = false;
// Open file for reading, then writing
while ( ($fp=fopen('clicks.txt','r+'))===false ) {
usleep(250000); // Delay 1/4 second
}
// Obtain lock
while ( !flock($fp, LOCK_EX) ) {
usleep(250000); // Delay 1/4 second
}
// Read Clicks
$clicks = trim(fread($fp,1024));
// Add click
$clicks++;
// Empty file
ftruncate($fp,0);
// Write clicks
fwrite($fp, $clicks);
// Release Lock
flock($fp, LOCK_UN);
// Release handle
fclose($fp);
?>

It is because when you read in the information from the file it is a string and needs to be converted to an integer before you can add 1 to it.
change:
$clicks = trim(fread($fp,1024));
to
$clicks = intval(trim(fread($fp,1024)));

Replace $clicks++; with $clicks = $clicks + 1;.

I notice you ftruncate but never rewind the file. Remember that the pointer stays where the last read left it: truncating sets the file length to 0, but the pointer remains at its old offset.
Citing the PHP documentation:
<?php
$filename = 'lorem_ipsum.txt';
$handle = fopen($filename, 'r+');
ftruncate($handle, rand(1, filesize($filename)));
rewind($handle);
echo fread($handle, filesize($filename));
fclose($handle);
?>
Note that PHP's example rewinds the file as well.
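The effect of the missing rewind can be seen in a small experiment. This is only a sketch: truncateThenWrite() is a hypothetical helper that uses a temporary file instead of clicks.txt, and the exact padding written in the non-rewound case may depend on the platform.

```php
<?php
// Hypothetical demo helper: stores "24" in a temp file, reads it back,
// truncates, optionally rewinds, writes "25" and returns the raw bytes.
function truncateThenWrite(bool $rewindFirst): string
{
    $path = tempnam(sys_get_temp_dir(), 'clk');
    file_put_contents($path, '24');

    $fp = fopen($path, 'r+');
    fread($fp, 1024);   // after reading, the pointer sits at offset 2
    ftruncate($fp, 0);  // length becomes 0, the pointer does not move
    if ($rewindFirst) {
        rewind($fp);    // back to offset 0 before writing
    }
    fwrite($fp, '25');
    fclose($fp);

    $bytes = file_get_contents($path);
    unlink($path);
    return $bytes;
}

// With rewind the file holds exactly the new number.
var_dump(truncateThenWrite(true) === '25'); // bool(true)

// Without rewind the write lands at the old offset, so the file typically
// starts with padding bytes followed by "25".
var_dump(bin2hex(truncateThenWrite(false)));
```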

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.
