I have some very large text files, about 100 MB each, and each contains a single-line string (just 1 line). I want to extract the last xx bytes / characters from each of them. I know how to do this by reading them into a string and then searching with strpos() or substr(), but that would require a large chunk of RAM, which isn't desirable for such a small action.
Is there any other way I can just extract, say, the last 50 bytes / characters of the text file in PHP before executing the search?
Thank you!
You can use fseek:
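$fp = fopen('somefile.txt', 'rb');
fseek($fp, -50, SEEK_END); // The offset must be negative; the seek fails if the file is shorter than 50 bytes
$data = fread($fp, 50); // fgets($fp, 50) would return at most 49 bytes
fclose($fp);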
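A slightly more defensive variant of the same fseek idea, as a sketch only: the helper name `tail_bytes` is made up for illustration, and it assumes a regular, seekable local file. It clamps the start offset so that files shorter than the requested length are returned whole.

```php
<?php
// Hypothetical helper: return the last $n bytes of $path,
// or the whole file if it is shorter than $n bytes.
function tail_bytes(string $path, int $n): ?string
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return null;
    }

    // Find the size by seeking to the end, then clamp the start offset at 0.
    fseek($fp, 0, SEEK_END);
    $size = ftell($fp);
    $start = max(0, $size - $n);
    fseek($fp, $start, SEEK_SET);

    // fread() requires a positive length, so handle the empty case separately.
    $length = $size - $start;
    $data = $length > 0 ? fread($fp, $length) : '';
    fclose($fp);

    return $data === false ? null : $data;
}
```

Only the requested tail is ever held in memory, so the cost does not grow with the size of the file.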
You can do this with file_get_contents by playing with the fourth parameter offset.
PHP 7.1.0 onward:
In PHP 7.1.0 the fourth parameter offset can be negative.
// only negative seek if it "lands" inside the file or false will be returned
if (filesize($filename) > 50) {
$data = file_get_contents($filename, false, null, -50);
}
else {
$data = file_get_contents($filename);
}
Pre PHP 7.1.0:
$fsz = filesize($filename);
// only negative seek if it "lands" inside the file or false will be returned
if ($fsz > 50) {
$data = file_get_contents($filename, false, null, $fsz - 50);
}
else {
$data = file_get_contents($filename);
}
I have a file that's so large I'm unable to read it into a string in one go, but have to use buffering:
$fp = @fopen("bigfile", 'rb');
while (!feof($fp)) {
//process buffer
}
For simplicity, say the file contains a sequence of integer-string pairs, where the integer holds the length of the string. The code I want in "process buffer" should unpack an int, read that many characters from the buffer, then repeat.
I appreciate any suggestions in dealing with the scenario where the string spans one buffer to the next. I'm sure that this problem must have been solved and that there is a design pattern for it, I just don't know where to start looking.
Any help would be appreciated.
Not sure if you're looking for an extra-clever solution, but the straightforward approach would be:
while (!feof($fp)) {
    $len = fread($fp, 2); // 2-byte length prefix (adjust to your format)
    if ($len === false || strlen($len) !== 2) {
        break; // end of file, or a truncated length field
    }
    // unpack() returns an array; pick the correct format character from http://docs.php.net/function.pack
    $len = unpack('S', $len)[1];
    while (!feof($fp) && $len > 0) {
        $cbRead = min($len, MAX_CHUNK_LEN);
        $buf = fread($fp, $cbRead);
        if ($buf === false || $buf === '') {
            break; // read error or premature end of file
        }
        $len -= strlen($buf); // fread() may return fewer bytes than requested
        // ... process buf
    }
    if ($len != 0) {
        errorHandler();
    }
    else {
        processEndOfString();
    }
}
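The same loop can be packaged as a small reader, sketched below under a few assumptions that are not in the question: the length prefix is an unsigned 16-bit little-endian integer, each string fits comfortably in memory, and the names `read_exact` and `read_records` are invented for illustration. The point is that a string spanning two reads is handled by looping on fread() until the requested number of bytes has arrived.

```php
<?php
// Read exactly $n bytes, looping because fread() may return short reads.
// Returns null if the stream ends (or fails) before $n bytes arrive.
function read_exact($fp, int $n): ?string
{
    $data = '';
    while (strlen($data) < $n) {
        $chunk = fread($fp, $n - strlen($data));
        if ($chunk === false || $chunk === '') {
            return null;
        }
        $data .= $chunk;
    }
    return $data;
}

// Yield each string from a stream of [2-byte length][payload] records.
function read_records($fp): Generator
{
    while (($header = read_exact($fp, 2)) !== null) {
        $len = unpack('v', $header)[1]; // 'v' = unsigned 16-bit, little-endian
        $payload = $len > 0 ? read_exact($fp, $len) : '';
        if ($payload === null) {
            throw new RuntimeException('Truncated record');
        }
        yield $payload;
    }
}
```

Because `read_records` is a generator, a caller can `foreach` over the records while only one of them is in memory at a time.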
I have a very large file with only a single line. It contains about 2.6 million numbers, and the file is about 15 MB.
My goal is to find the nth number in this single-line string.
I tried to read the file into a string (remember, it is a single-line file) and then explode the string into an array, but I ran out of memory: Allowed memory size of 268435456 bytes exhausted (tried to allocate 71 bytes).
Am I doing it right? Or is there an easier way to find the nth value in a very large string?
$file = file_get_contents ('a.txt', true);
$array = explode(" ", $file, -1);
echo $array[$nth];
Create a counter variable; read the file using fopen and loop over it in a while with feof and fgets (with the desired buffer size); within the loop, check how many spaces are present in the piece you just read (I'm assuming your entries are separated by spaces, but it could be commas or whatever); finally, add that number to the counter and go on until you reach the part you want (after n spaces, you have the [n+1]th entry you are looking for).
I include some tested (with a 16 MB file) proof-of-concept code. I don't know if there are better ways to do it; this is the only one that came to my mind and it works. memory_get_usage reports a memory usage of ~8 KB.
<?php
$counter = 0; // number of spaces (completed entries) seen so far
$nth = 49959;
$handle = @fopen('numbers.txt', 'r'); // File containing numbers from 1 to 2130829, size ~16 MB.
if ($handle) {
while (($buffer = fgets($handle, 128)) !== false) {
$spaces = substr_count($buffer, ' ');
if ($counter + $spaces > $nth) {
$numbers = explode(' ', $buffer);
$key = $nth - $counter;
echo $numbers[$key]; // entry at zero-based index $nth; can be truncated when $key is 0 and the value straddles two reads
exit;
}
else {
$counter += $spaces;
}
}
if (!feof($handle)) {
echo "Error: unexpected fgets() fail\n";
}
fclose($handle);
}
?>
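One gap in the proof-of-concept above is that a number can be split across two reads. A possible way around that, sketched here with an invented helper name `nth_token` and assuming single spaces as separators, is to hold back the last (possibly incomplete) piece of each chunk and prepend it to the next one.

```php
<?php
// Hypothetical helper: return the token at zero-based index $nth from a
// space-separated single-line file, or null if the file has fewer tokens.
function nth_token(string $path, int $nth, int $chunkSize = 8192): ?string
{
    if ($nth < 0) {
        return null;
    }
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return null;
    }
    $index = 0;     // index of the first token not yet fully read
    $carry = '';    // partial token held back from the previous chunk
    $result = null;

    while ($result === null && !feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $parts = explode(' ', $carry . $chunk);
        // The last piece may be cut off mid-token, so hold it back.
        $carry = array_pop($parts);
        if ($nth < $index + count($parts)) {
            $result = $parts[$nth - $index];
        }
        $index += count($parts);
    }
    fclose($fp);

    // At end of file the held-back piece is the final token.
    if ($result === null && $nth === $index && $carry !== '') {
        $result = $carry;
    }
    return $result === null ? null : trim($result);
}
```

Memory use stays around one chunk plus one token, regardless of how large the file is.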
I am writing an application that can stream videos. It requires the filesize of the video, so I use this code:
$filesize = sprintf("%u", filesize($file));
However, when streaming a 6 GB movie, it fails.
Is it possible to get a bigger integer value in PHP? I don't care if I have to use third-party libraries or if it is slow; all I care about is that it can get the filesize properly.
FYI, $filesize is currently 3017575487 which is really really really really far from 6000000000, which is roughly correct.
I am running PHP on a 64 bit operating system.
Thanks for any suggestions!
The issue here is two-fold.
Problem 1
The filesize function returns a signed integer, with a maximum value of PHP_INT_MAX. On 32-bit PHP, this value is 2147483647, or about 2 GB. On 64-bit PHP you can go higher, up to 9223372036854775807. Based on the comments from the PHP filesize page, I created a function that uses an fseek loop to find the size of the file and returns it as a float, which can count higher than a 32-bit unsigned integer.
function filesize_float($filename)
{
$f = fopen($filename, 'r');
$p = 0;
$b = 1073741824;
fseek($f, 0, SEEK_SET);
while($b > 1)
{
fseek($f, $b, SEEK_CUR);
if(fgetc($f) === false)
{
fseek($f, -$b, SEEK_CUR);
$b = (int)($b / 2);
}
else
{
fseek($f, -1, SEEK_CUR);
$p += $b;
}
}
while(fgetc($f) !== false)
{
++$p;
}
fclose($f);
return $p;
}
To get the file size of the file as a float using the above function, you would call it like this.
$filesize = filesize_float($file);
Problem 2
Using %u in the sprintf function will cause it to interpret the argument as an unsigned integer, thus limiting the maximum possible value to 4294967295 on 32-bit PHP, before overflowing. Therefore, if we were to do the following, it would return the wrong number.
sprintf("%u", filesize_float($file));
You could interpret the value as a float using %F, using the following, but it will result in trailing decimals.
sprintf("%F", filesize_float($file));
For example, the above will return something like 6442450944.000000, rather than 6442450944.
A workaround would be to have sprintf interpret the float as a string, and let PHP cast the float to a string.
$filesize = sprintf("%s", filesize_float($file));
This will set $filesize to the value of something like 6442450944, without trailing decimals.
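Another way to drop the trailing decimals, shown here only as a sketch, is to keep %F but give it a precision of zero. The literal size below is an example value, not output from a real file.

```php
<?php
// A float holding a size larger than a 32-bit integer can represent.
$size = 6442450944.0;

// %F formats a float without locale-specific separators;
// a precision of 0 drops the fractional part entirely.
$formatted = sprintf('%.0F', $size);

echo $formatted, "\n"; // prints 6442450944
```

This avoids depending on how PHP converts a float to a string, which can switch to exponent notation for very large values.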
The Final Solution
If you add the filesize_float function above to your code, you can simply use the following line of code to read the actual file size into the sprintf statement.
$filesize = sprintf("%s", filesize_float($file));
As per the PHP documentation for 64-bit platforms, this seems quite reliable for getting the filesize of files > 4 GB:
<?php
$a = fopen($filename, 'r');
fseek($a, 0, SEEK_END);
$filesize = ftell($a);
fclose($a);
?>
My text file format is :
This is first line.
This is second line.
This is third line.
There could be more lines in the text file. How can I echo one random line from the text file on each refresh with PHP?
All comments are appreciated. Thanks
How big of a file are we talking? The easy approach is to load the entire file into memory as an array of strings, pick a random array index from 0 to N-1, and show that line.
If the file can get really big, then you'd have to implement some sort of streaming solution.
Streaming Solution Explained!
The following solution will yield a uniformly distributed random line from a relatively large file with an adjustable max line size per file.
<?php
function rand_line($fileName, $maxLineLength = 4096) {
$handle = @fopen($fileName, "r");
if ($handle) {
$random_line = null;
$line = null;
$count = 0;
while (($line = fgets($handle, $maxLineLength)) !== false) {
$count++;
// P(1/$count) probability of picking current line as random line
if(rand() % $count == 0) {
$random_line = $line;
}
}
if (!feof($handle)) {
echo "Error: unexpected fgets() fail\n";
fclose($handle);
return null;
} else {
fclose($handle);
}
return $random_line;
}
}
// usage
echo rand_line("myfile.txt");
?>
Say the file has N lines, and let P(k) be the probability that the first line is still the selected one after k lines have been read:
P(1) = 1
P(2) = 1/2 * P(1)
P(3) = 2/3 * P(2)
P(N) = (N-1)/N * P(N-1) = 1/N
The same reasoning applies to every other line, so this ultimately gives us a uniformly distributed random line from a file of any size without reading the entire file into memory.
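Spelled out for an arbitrary line, as a sketch of the argument: line i has to be picked at the moment it is read (probability 1/i) and then survive every later line k, each of which replaces it with probability 1/k. Here N is the number of lines in the file, and the product telescopes.

```latex
\begin{aligned}
\Pr[\text{line } i \text{ is returned}]
  &= \frac{1}{i} \prod_{k=i+1}^{N} \left(1 - \frac{1}{k}\right)
   = \frac{1}{i} \prod_{k=i+1}^{N} \frac{k-1}{k} \\
  &= \frac{1}{i} \cdot \frac{i}{i+1} \cdot \frac{i+1}{i+2} \cdots \frac{N-1}{N}
   = \frac{1}{N}.
\end{aligned}
```

The result does not depend on i, which is what uniform selection means here.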
I hope it will help.
A generally good approach to this kind of situation is to:
Read the lines into an array using file()
echo out a random array value using array_rand()
Your code could look something like this:
$lines = file('my_file.txt');
echo $lines[array_rand($lines)];
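A slightly more careful variant of the two lines above, as a sketch: the function name `random_line_simple` is invented for illustration, and it still loads the whole file, so it only suits files that fit in memory. The flags passed to file() strip the trailing newline from each element and skip blank lines.

```php
<?php
// Hypothetical helper: pick one random non-empty line from a small text file.
function random_line_simple(string $path): ?string
{
    // FILE_IGNORE_NEW_LINES strips the trailing newline from each element;
    // FILE_SKIP_EMPTY_LINES drops blank lines so they are never picked.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false || $lines === []) {
        return null;
    }
    return $lines[array_rand($lines)];
}
```

Returning null for a missing or empty file lets the caller decide what to display instead.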
I want to check the size of files on local drives on Windows. But the native PHP function filesize() only works when the file is smaller than 2 GB; for a file larger than 2 GB it returns the wrong number. So, is there another way to get the size of a file larger than 2 GB?
Thank you very much!!
You can always use the system's file size method.
For Windows:
Windows command for file size only?
@echo off
echo %~z1
For Linux:
stat -c %s filename
You would run these through PHP's exec() function.
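For the Linux command, calling it from PHP might look like the sketch below. It assumes GNU stat is on the PATH (the `-c %s` format option is GNU-specific), and the helper name `stat_file_size` is made up for illustration. The size is returned as a string so it is not squeezed into a 32-bit integer.

```php
<?php
// Hypothetical helper: ask GNU stat for a file's size in bytes.
// Returns the size as a numeric string, or null if the command fails.
function stat_file_size(string $path): ?string
{
    $output = [];
    $status = 1;
    // escapeshellarg() quotes the path so spaces and shell metacharacters are safe.
    exec('stat -c %s ' . escapeshellarg($path) . ' 2>/dev/null', $output, $status);

    if ($status !== 0 || !isset($output[0]) || preg_match('/^\d+\z/', $output[0]) !== 1) {
        return null;
    }
    return $output[0];
}
```

Checking both the exit status and the shape of the output keeps an error message from being mistaken for a size.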
PHP function to get the file size of a local file with insignificant memory usage:
function get_file_size ($file) {
$fp = @fopen($file, "r");
@fseek($fp,0,SEEK_END);
$filesize = @ftell($fp);
fclose($fp);
return $filesize;
}
In the first line of code, $file is opened in read-only mode and attached to the $fp handle.
In the second line, the pointer is moved with fseek() to the end of $file.
Lastly, ftell() returns the byte position of the pointer in $file, which is now the end of it.
The fopen() function is binary-safe and is appropriate for use even with very large files.
The above code is also very fast.
This function works for any size:
function fsize($file) {
// filesize will only return the lower 32 bits of
// the file's size! Make it unsigned.
$fmod = filesize($file);
if ($fmod < 0) $fmod += 2.0 * (PHP_INT_MAX + 1);
// find the upper 32 bits
$i = 0;
$myfile = fopen($file, "r");
// feof has undefined behaviour for big files.
// after we hit the eof with fseek,
// fread may not be able to detect the eof,
// but it also can't read bytes, so use it as an
// indicator.
while (strlen(fread($myfile, 1)) === 1) {
fseek($myfile, PHP_INT_MAX, SEEK_CUR);
$i++;
}
fclose($myfile);
// $i is a multiplier for PHP_INT_MAX byte blocks.
// return to the last multiple of 4, as filesize has modulo of 4 GB (lower 32 bits)
if ($i % 2 == 1) $i--;
// add the lower 32 bit to our PHP_INT_MAX multiplier
return ((float)($i) * (PHP_INT_MAX + 1)) + $fmod;
}
Note: this function may be a little slow for files > 2 GB
(taken from php comments)
If you're running a Linux server, use the system command.
$last_line = system('ls');
This is an example of how it is used. If you replace 'ls' with:
du <filename>
then system() will return the last line of the command's output as a string in the variable $last_line. For example:
472 myProgram.exe
means the file occupies 472 blocks on disk, which is 472 KB with GNU du's default 1024-byte blocks. You can use regular expressions to obtain just the number. I haven't used the du command that much, so you'd want to play around with it and have a look at what the output is for files > 2 GB.
http://php.net/manual/en/function.system.php
<?php
$files = `find / -type f -size +2097152`;
?>
This function returns the size for files > 2GB and is quite fast.
function file_get_size($file) {
//open file
$fh = fopen($file, "r");
//declare some variables
$size = "0";
$char = "";
//set file pointer to 0; I'm a little bit paranoid, you can remove this
fseek($fh, 0, SEEK_SET);
//set multiplicator to zero
$count = 0;
while (true) {
//jump 1 MB forward in file
fseek($fh, 1048576, SEEK_CUR);
//check if we actually left the file
if (($char = fgetc($fh)) !== false) {
//if not, go on
$count ++;
} else {
//else jump back where we were before leaving and exit loop
fseek($fh, -1048576, SEEK_CUR);
break;
}
}
//we could make $count jumps, so the file is at least $count * 1.000001 MB large
//1048577 because we jump 1 MB and fgetc goes 1 B forward too
$size = bcmul("1048577", $count);
//now count the last few bytes; they're always less than 1048576 so it's quite fast
$fine = 0;
while(false !== ($char = fgetc($fh))) {
$fine ++;
}
//and add them
$size = bcadd($size, $fine);
fclose($fh);
return $size;
}
To riff on joshhendo's answer, if you're on a Unix-like OS (Linux, OSX, macOS, etc) you can cheat a little using ls:
$fileSize = trim(shell_exec("ls -nl " . escapeshellarg($fullPathToFile) . " | awk '{print $5}'"));
trim() is there to remove the trailing newline. What's left is a string containing the full size of the file on disk, regardless of size or stat cache status, with no human formatting such as commas.
Just be careful where the data in $fullPathToFile comes from...when making system calls you don't want to trust user-supplied data. The escapeshellarg will probably protect you, but better safe than sorry.