What I need is an equivalent for PHP's fseek() function. The function works on files, but I have a variable that contains binary data and I want to work on it. I know I could use substr(), but that would be lame - it's used for strings, not for binary data. Also, creating a file and then using fseek() is not what I am looking for either.
Maybe something constructed with streams?
EDIT: Okay, I'm almost there:
$data = fopen('data://application/binary;binary,'.$bin,'rb');
Warning: failed to open stream: rfc2397: illegal parameter
Kai:
You have almost answered yourself here. Streams are the answer. The following manual entry will be enlightening: http://us.php.net/manual/en/wrappers.data.php
It essentially allows you to pass arbitrary data to PHP's file handling functions such as fopen (and thus fseek).
Then you could do something like:
<?php
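// fopen() needs a mode; base64 keeps arbitrary bytes intact (the non-base64 form is URL-decoded)
$data = fopen('data://application/octet-stream;base64,' . base64_encode($binaryData), 'rb');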
fseek($data, 128);
?>
fseek on data in a variable doesn't make sense. fseek just positions the file handle to the specified offset, so the next fread call starts reading from that offset. There is no equivalent of fread for strings.
What's wrong with substr()?
With a file you would do:
$f = fopen(...);
fseek($f, $offset);
$x = fread($f, $len);
with substr:
$x = substr($var, $offset, $len);
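If you want something that behaves like a file handle but stays a plain string, a small cursor over substr() is enough. This is only a sketch: the class name StringCursor is hypothetical (not a PHP built-in) and the typed properties assume PHP 7.4 or later. It works on binary data because PHP strings are plain byte sequences, so substr() is binary-safe.

```php
<?php
// Sketch: a read cursor over a binary string that mimics fseek()/fread()/ftell()
// using substr(). StringCursor is a hypothetical name, not part of PHP.
final class StringCursor
{
    private string $bytes;
    private int $pos = 0;

    public function __construct(string $bytes)
    {
        $this->bytes = $bytes;
    }

    // Like fseek($fp, $offset, SEEK_SET), but clamped to [0, length].
    public function seek(int $offset): void
    {
        $this->pos = max(0, min($offset, strlen($this->bytes)));
    }

    // Like fread(): returns up to $length bytes and advances the cursor.
    public function read(int $length): string
    {
        $chunk = substr($this->bytes, $this->pos, $length);
        $this->pos += strlen($chunk);
        return $chunk;
    }

    // Like ftell(): the current byte offset.
    public function tell(): int
    {
        return $this->pos;
    }
}
```

For example, after `$c = new StringCursor($bin); $c->seek(128);`, each `$c->read(4)` returns the next four bytes, just as fread() would on a file.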
I'm guessing, but maybe what is being asked for is a way to access the bytes in a variable through a pointer (treating it like an array of bytes, as you could in C, without the memory overhead of putting the data in PHP arrays) and to edit them in place without the overhead of copying the data.
Not being able to do this is a BIG problem, but if the operating system caches disk data well, using fseek on a temporary file could be a workaround.
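The temporary file does not have to be a named file on disk. A sketch using PHP's php://temp wrapper (stringToStream() is a hypothetical helper name): php://temp keeps data in memory and only switches to a temporary file once it grows past a limit (2 MB by default), while php://memory always stays in memory. Either gives a seekable, writable handle, so the usual fseek()/fread()/fwrite() calls apply, including overwriting bytes in place.

```php
<?php
// Sketch: wrap a binary string in an in-memory stream so that ordinary file
// functions work on it. stringToStream() is a hypothetical helper name.
function stringToStream(string $bytes)
{
    // php://temp spills to a temp file past ~2 MB; php://memory never touches disk
    $fp = fopen('php://temp', 'r+b');
    fwrite($fp, $bytes);
    rewind($fp);                   // start reading from byte 0
    return $fp;
}

$fp = stringToStream("HEADER\x00\xFFpayload");
fseek($fp, 6);                     // skip the 6-byte "HEADER"
$two = fread($fp, 2);              // the two bytes "\x00\xFF"
fseek($fp, 0, SEEK_SET);
fwrite($fp, 'header');             // overwrite the first 6 bytes in place
rewind($fp);
$all = stream_get_contents($fp);   // whole (edited) buffer as a string
fclose($fp);
```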
Related
I'm having a slight problem where I am using "openssl_encrypt" to encrypt a string of text that contains HTML, writing that string to a file, and then in a separate page, I am decrypting the entire file using "openssl_decrypt". I've made sure to use the same encryption key, same method, and same iv. I imagine this is something that, as a newbie to encryption, I just can't see. Thank you in advance for any help!
Here is some example code:
//An example of the string
$string = "<div class='mod'><div><span class='datetimestamp'>On 06/28/2016 at 04:32:09 PM, ** modified a record with id of \"5\" in the \"results\" table:</span><br><span class='record-label'>Prev Record:</span>jobnumber='none', dropdate='07/06/2016', eventdate='07/16/2016', dealership='ABC Nissan', pieces='3700', datatype='DB', letter='t'";
//The encryption
$encrypt = openssl_encrypt($string, 'AES-256-XTS', '93jkak3rzp72', 1, '45gh354687ls0349');
$file = fopen("logs/2016-06-28.log", 'a');
fwrite($file, $encrypt);
fclose($file);
//The decryption - DONE IN A SEPARATE PAGE
$decrypt = '';
$file = @fopen("logs/2016-06-28.log", "r");
if ($file) {
while (($data = fgets($file)) !== false) {
$decrypt .= openssl_decrypt($data, 'AES-256-XTS', '93jkak3rzp72', 1, '45gh354687ls0349');
}
}
Perhaps the issue is that you are trying to append additional encrypted data to the file. That will not generally work, for several reasons: a major one is that AES is block-based and there will most likely be padding, and many modes use some form of chaining, which will also fail when encrypted data is appended.
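If the log really does need to keep growing, one workaround is to never glue ciphertexts together at all: encrypt each record separately and store each one as its own line of text. A minimal sketch, assuming AES-256-CBC with a 32-byte key, a fresh random IV per record, and an "iv:ciphertext" line layout; the cipher choice, layout and function names are illustrative assumptions, not the question's code.

```php
<?php
// Sketch: an append-friendly encrypted log. Each record is encrypted on its
// own with its own random IV and stored as one line of base64 text.
// Cipher, line layout and function names are illustrative assumptions.
const LOG_CIPHER = 'aes-256-cbc';

function appendEncryptedLine(string $path, string $plaintext, string $key): void
{
    $iv = random_bytes(openssl_cipher_iv_length(LOG_CIPHER));
    // options = 0: openssl_encrypt() returns base64 text, which contains no
    // newline bytes, so each record safely fits on a single line.
    $ciphertext = openssl_encrypt($plaintext, LOG_CIPHER, $key, 0, $iv);
    $line = base64_encode($iv) . ':' . $ciphertext . "\n";
    file_put_contents($path, $line, FILE_APPEND | LOCK_EX);
}

function readEncryptedLines(string $path, string $key): array
{
    $records = [];
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return $records;
    }
    while (($line = fgets($fh)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue; // skip blank lines
        }
        // base64 never contains ':', so the first ':' separates IV and data
        [$ivB64, $ciphertext] = explode(':', $line, 2);
        $records[] = openssl_decrypt($ciphertext, LOG_CIPHER, $key, 0, base64_decode($ivB64));
    }
    fclose($fh);
    return $records;
}
```

Because every line decrypts independently, reading the file line by line with fgets() becomes safe again.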
You are opening the file you are writing to in append mode ('a'); that is not what you need, so use write ('w') mode instead. Append mode causes each encryption to be added after the previous data, which is why the first time works but subsequent times do not. If you examine the file length after each encryption, it will be apparent what is happening.
You need to use:
$file = fopen("logs/2016-06-28.log", 'w');
From the php fopen docs:
'w' Open for writing only; place the file pointer at the beginning of the file and truncate the file to zero length. If the file does not exist, attempt to create it.
'a' Open for writing only; place the file pointer at the end of the file. If the file does not exist, attempt to create it. In this mode, fseek() has no effect, writes are always appended.
I wouldn't use fgets(), as it only gets a single line from a file at a time, and you can't split up an encrypted string and decrypt the pieces separately. Also, because option 1 (OPENSSL_RAW_DATA) produces raw binary output, the ciphertext can itself contain newline bytes.
You could use fgets(), but you would need to read everything in, store it in a variable, and decrypt only after you have all of it.
Or you can simply use something like file_get_contents() to get the entire file's content and then decrypt.
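A minimal round-trip sketch of that approach: write the whole ciphertext in one go (truncating the file, as fopen() with 'w' would), then read the file back in one piece with file_get_contents() and decrypt it. AES-256-CBC, the random key and IV, and the function names are assumptions for illustration; the question's own cipher and key would slot in the same way.

```php
<?php
// Sketch: encrypt once, overwrite the file, read it back whole, decrypt.
// Cipher choice and function names are illustrative assumptions.
const ENC_CIPHER = 'aes-256-cbc';

function writeEncryptedFile(string $path, string $plaintext, string $key, string $iv): void
{
    $ciphertext = openssl_encrypt($plaintext, ENC_CIPHER, $key, OPENSSL_RAW_DATA, $iv);
    file_put_contents($path, $ciphertext); // truncates and overwrites; never appends
}

function readEncryptedFile(string $path, string $key, string $iv)
{
    $ciphertext = file_get_contents($path); // whole file, any newline bytes included
    return openssl_decrypt($ciphertext, ENC_CIPHER, $key, OPENSSL_RAW_DATA, $iv);
}
```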
From a comment to this answer I read that "stream_get_contents is low-level" compared to file_get_contents. However, according to the manual, stream_get_contents is
Identical to file_get_contents(), except that stream_get_contents() operates on an already open stream resource and returns the remaining contents in a string, up to maxlength bytes and starting at the specified offset.
Which statement is correct?
Is stream_get_contents really lower level and faster?
Specifically I am interested in reading local files from HD.
I'm late here, but it might help others.
file_get_contents() reads the entire file into memory as a string. The string stays in memory until the program uses it, for example by calling echo, which sends it to the output buffer.
A good usage example is:
echo file_get_contents('file.txt');
stream_get_contents() delivers the content on an already open stream. An example is this:
$handle = fopen('file.txt', 'r'); // 'r', not 'w+': 'w+' truncates the file to zero length first
echo stream_get_contents($handle);
fclose($handle);
You could see that stream_get_contents() used an existing stream created by fopen() to get the contents as a string.
file_get_contents() is generally the preferred way, as it doesn't depend on an already open stream, and the manual notes that it will use memory mapping techniques, if supported by your OS, to enhance performance. When reading from external sites, you can also set HTTP headers through a stream context. (Refer to https://www.php.net/manual/en/function.file-get-contents.php for more info.)
For larger files/resources, stream_get_contents() can be the better fit: it reads from the handle's current position, so you can call it repeatedly with a $length argument and work through an open stream piece by piece. Called without $length, though, it also returns the whole remaining content as one string in memory, just like file_get_contents().
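To illustrate the piece-by-piece reading, here is a minimal sketch that walks an open file with stream_get_contents() and a fixed $length. The function name processInChunks() and the chunk size are illustrative assumptions; an fread() loop would work just as well, and which one is faster on your machine is something to measure rather than assume.

```php
<?php
// Sketch: process a large file in fixed-size pieces instead of loading it
// whole. stream_get_contents($handle, $length) returns at most $length bytes
// from the current position. processInChunks() is a hypothetical name.
function processInChunks(string $path, int $chunkSize, callable $onChunk): int
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $total = 0;
    while (!feof($handle)) {
        $chunk = stream_get_contents($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // nothing left to read
        }
        $total += strlen($chunk);
        $onChunk($chunk); // e.g. hash, parse, or echo each piece
    }
    fclose($handle);
    return $total; // number of bytes processed
}
```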
Say I read a number of bytes like this:
$data = fread($fp, 4096);
Since fread will stop reading if it reaches the end of the file, how can I know exactly how much was read? Would strlen($data) work? Or could that be potentially wrong?
What I'm trying to accomplish is to read a number of bytes and then go back to where I was before the read. I'm trying to avoid using arithmetic (ftell, fread, ftell, subtract, fseek), since a file could potentially be larger than PHP_INT_MAX, which could mess that up. What I want is to just do fseek($fp, -$bytes_read, SEEK_CUR), but for that I need to know how many bytes I just read...
After fread use ftell($fp) to get the current file position.
Check this (untested):
mb_strlen($data, '8bit')
The second argument '8bit' forces the function to return the number of bytes.
Found in comments at php manual for mb_strlen.
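For the original goal, strlen($data) is enough: PHP strings are byte sequences and strlen() returns the number of bytes. The exception was the mbstring.func_overload setting (deprecated in PHP 7.2, removed in PHP 8.0), which could make strlen() count characters instead; that is what the mb_strlen($data, '8bit') trick guards against. A sketch of "read, then step back by exactly what was read" using a relative seek, with a hypothetical helper name peekBytes():

```php
<?php
// Sketch: read up to $length bytes, then move the pointer back by exactly the
// number of bytes actually read, using a relative seek (no ftell arithmetic).
// peekBytes() is a hypothetical helper name.
function peekBytes($fp, int $length): string
{
    if ($length <= 0) {
        return ''; // fread() rejects a non-positive length (ValueError in PHP 8)
    }
    $data = fread($fp, $length);
    if ($data === false) {
        return '';
    }
    $bytesRead = strlen($data); // byte count, since PHP strings are byte arrays
    if ($bytesRead > 0) {
        fseek($fp, -$bytesRead, SEEK_CUR);
    }
    return $data;
}
```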
How can I get a particular line from a 3 GB text file? The lines are delimited by \n, and I need to be able to get any line on demand.
How can this be done? Only one line needs to be returned, and I would rather not use any system calls.
Note: The same question has been asked elsewhere for bash. I would like to compare it with the PHP equivalent.
Update: Every line has the same length throughout the file.
Without keeping some sort of index to the file, you would need to read all of it until you've encountered x number of \n characters. I see that nickf has just posted some way of doing that, so I won't repeat it.
To do this repeatedly in an efficient manner, you will need to build an index. Store some known file positions for certain (or all) line numbers once, which you can then use to seek to the right location using fseek.
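A sketch of such an index, assuming 0-based line numbers and recording the byte offset of every Nth line (the $every parameter and both function names are hypothetical). A lookup then costs one fseek() plus at most N-1 fgets() calls. For a 3 GB file you would build the index once and save it (for example with serialize()), and offsets beyond 2 GB need a 64-bit PHP build.

```php
<?php
// Sketch: index the byte offset of every $every-th line (0-based line
// numbers), then answer lookups with one fseek() plus a short fgets() scan.
// Function names and the sampling interval are illustrative assumptions.
function buildLineIndex(string $path, int $every = 1000): array
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $index = [];
    $lineNo = 0;
    while (true) {
        $pos = ftell($fh);              // byte offset where line $lineNo starts
        if ($lineNo % $every === 0) {
            $index[$lineNo] = $pos;
        }
        if (fgets($fh) === false) {
            break;                      // end of file
        }
        $lineNo++;
    }
    fclose($fh);
    return $index;
}

function getLineIndexed(string $path, array $index, int $every, int $lineNo)
{
    // nearest indexed line at or before $lineNo
    $start = intdiv($lineNo, $every) * $every;
    if (!isset($index[$start])) {
        return false;
    }
    $fh = fopen($path, 'rb');
    fseek($fh, $index[$start]);
    for ($i = $start; $i < $lineNo; $i++) {
        if (fgets($fh) === false) {     // skip the lines in between
            fclose($fh);
            return false;
        }
    }
    $line = fgets($fh);                 // the requested line, or false
    fclose($fh);
    return $line;
}
```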
Edit: if each line is the same length, you do not need the index.
$myfile = fopen($fileName, "r");
fseek($myfile, $lineLength * $lineNumber);
$line = fgets($myfile);
fclose($myfile);
The line number is 0-based in this example, so you may need to subtract one first. The line length includes the \n character.
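The same idea wrapped in a function, as a sketch. getFixedLengthLine() is a hypothetical name; it uses 0-based line numbers, returns the line without its trailing newline, or false when the line does not exist. As above, the line length must include the \n, and offsets past 2 GB need a 64-bit PHP build.

```php
<?php
// Sketch: with fixed-length lines, line N starts at byte N * $lineLength.
// getFixedLengthLine() is a hypothetical name; line numbers are 0-based.
function getFixedLengthLine(string $fileName, int $lineLength, int $lineNumber)
{
    $fh = fopen($fileName, 'rb');
    if ($fh === false) {
        return false;
    }
    $line = false;
    // fseek() returns 0 on success; seeking past EOF still succeeds,
    // so fgets() returning false is what signals a missing line.
    if (fseek($fh, $lineLength * $lineNumber) === 0) {
        $line = fgets($fh);
    }
    fclose($fh);
    return $line === false ? false : rtrim($line, "\n");
}
```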
There is little discussion of the problem and no mention is made of how the 'one line' should be referenced (by number, some value within it, etc.) so below is just a guess as to what you're wanting.
If you're not averse to using an object (it might be 'too high level', perhaps) and wish to reference the line by offset, then SplFileObject (available as of PHP 5.1.0) could be used. See the following basic example:
$file = new SplFileObject('myreallyhugefile.dat');
$file->seek(12345689); // seek() is zero-based, so this is line 12345690
echo $file->current(); // or simply, echo $file
That particular method (seek) requires scanning through the file line-by-line. However, if as you say all the lines are the same length then you can instead use fseek to get where you want to go much, much faster.
$line_length = 1024; // each line is exactly 1 KB, including the "\n"
$file->fseek($line_length * 1234567); // seek lots of bytes
echo $file->current(); // echo line 1234568
You said each line has the same length, so you can use fopen() in combination with fseek() to get a line quickly.
http://ch2.php.net/manual/en/function.fseek.php
The only way I can think to do it would be like this:
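function getLine($fileName, $num) {
$fh = fopen($fileName, 'r');
$line = false;
// read $num lines; the last one read is line $num (1-based)
for ($i = 0; $i < $num && ($line = fgets($fh)) !== false; ++$i);
fclose($fh);
return $line; // false if the file has fewer than $num lines
}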
While this is not exactly a solution: why do you need to pull one line out of a 3 GB text file? Is performance an issue, or can this run at a leisurely pace?
If you need to pull lots of lines out of this file at different points in time, I would definitely suggest putting this data into a database of some kind. SQLite may be your friend here, as it's very simple, but it is not great with lots of scripts or people accessing it at the same time.
To make this more clear, I'm going to put code samples:
$file = fopen('filename.ext', 'rb');
// Assume $pos has been declared
// method 1
fseek($file, $pos);
$parsed = fread($file, 2);
// method 2
$data = '';
while (!feof($file)) {
$data .= fread($file, 1000000);
}
$data = bin2hex($data);
$parsed = substr($data, $pos * 2, 4); // 1 byte = 2 hex digits, so 2 bytes = 4
fclose($file);
There are about 40 fread() in method 1 (with maybe 15 fseek()) vs 1 fread() in method 2. The only thing I am wondering is if loading in 1000000 bytes is overkill when you're really only extracting maybe 100 total bytes (all relatively close together in the middle of the file).
So which code is going to perform better? Which code makes more sense to use? A quick explanation would be greatly appreciated.
If you already know the offset you are looking for, fseek is the best method here, as there is no reason to load the whole file into memory when you only need a few bytes of it. The first method is better because you skip straight to what you want in the file stream and read out a small portion. The second method reads the entire file into memory and then picks the bytes out of that copy, when you could have read them straight from the file. Hope this answers your question.
Files are read in units of clusters, and a cluster is usually something like 8 KB. Usually a few clusters are read ahead.
So, if the file is only a few KB, there is very little to gain by using fseek compared to reading the entire file; the file system will read the entire file anyway.
If the file is considerably larger, as in your case, only a few of the clusters have to be read, so the first method should perform better. Even in the worst case, where all the data is still read from the disk, your application will use less memory.
It seems that seeking to the position you want and then reading only the bytes you need is the best approach.
But the correct answer is (as always) to test it for real instead of guessing. Run your two examples in your server environment and make some time measurements. Also check memory usage. Then make your optimization once you have some hard data to back it up.
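As a starting point for such measurements, a rough sketch that runs both approaches over the same offsets. The function names are hypothetical, a single microtime() run is noisy, and since memory_get_peak_usage() only ever grows within one process, each method should be run in its own script when comparing memory.

```php
<?php
// Sketch: compare seeking into the file against loading it whole.
// Function names are hypothetical; results are rough single-run numbers.
function readBySeeking(string $path, array $offsets, int $length): array
{
    $fh = fopen($path, 'rb');
    $out = [];
    foreach ($offsets as $pos) {
        fseek($fh, $pos);                  // jump straight to the bytes we need
        $out[] = fread($fh, $length);
    }
    fclose($fh);
    return $out;
}

function readWholeFile(string $path, array $offsets, int $length): array
{
    $data = file_get_contents($path);      // entire file in memory
    $out = [];
    foreach ($offsets as $pos) {
        $out[] = substr($data, $pos, $length);
    }
    return $out;
}

function measure(callable $fn): array
{
    $start = microtime(true);
    $result = $fn();
    return [
        'result'  => $result,
        'seconds' => microtime(true) - $start,
        'peak'    => memory_get_peak_usage(), // process-wide peak, in bytes
    ];
}
```

For example, `print_r(measure(fn() => readBySeeking('filename.ext', [$pos], 2)));` (PHP 7.4+) prints the bytes read, the elapsed seconds and the peak memory; run the readWholeFile() variant in a separate script for a fair memory comparison.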