Which method is better? Hashing each line in a file with PHP

This question was asked on a message board, and I want to get a definitive answer and intelligent debate about which method is more semantically correct and less resource intensive.
Say I have a file with each line in that file containing a string. I want to generate an MD5 hash for each line and write it to the same file, overwriting the previous data. My first thought was to do this:
$file = 'strings.txt';
$lines = file($file);
$handle = fopen($file, 'w+');
foreach ($lines as $line)
{
fwrite($handle, md5(trim($line))."\n");
}
fclose($handle);
Another user pointed out that file_get_contents() and file_put_contents() were better than using fwrite() in a loop. Their solution:
$thefile = 'strings.txt';
$newfile = 'newstrings.txt';
$current = file_get_contents($thefile);
$explodedcurrent = explode("\n", $current); // split the contents, not the file name; "\n" needs double quotes
$temp = '';
foreach ($explodedcurrent as $string)
$temp .= md5(trim($string)) . "\n";
file_put_contents($newfile, $temp);
My argument is that since the main goal of this is to get the file into an array, and file_get_contents() is the preferred way to read the contents of a file into a string, file() is more appropriate and allows us to cut out another unnecessary function, explode().
Furthermore, by directly manipulating the file using fopen(), fwrite(), and fclose() (which is the exact same as one call to file_put_contents()) there is no need to have extraneous variables in which to store the converted strings; you're writing them directly to the file.
My method is the exact same as the alternative - the same number of opens/closes on the file - except mine is shorter and more semantically correct.
What do you have to say, and which one would you choose?
This should be more efficient and less resource-intensive than the previous two methods:
$file = 'passwords.txt';
$passwords = file($file);
$converted = fopen($file, 'w+');
foreach (array_keys($passwords) as $i)
{
fwrite($converted, md5(trim($passwords[$i]))."\n"); // newline keeps one hash per line
unset($passwords[$i]); // release each line once it has been written
}
fclose($converted);
echo 'Done.';

As one of the comments suggests, do what makes the most sense to you: you might come back to this code in a few months, and you will want to spend as little time as possible working out what it does.
However, if speed is your concern, I would create two test cases (you pretty much have them already) and time them: store a timestamp in a variable at the beginning of the script, then subtract it from the timestamp at the end to see how long the script took to run. time() only has one-second resolution, so microtime(true) is the better choice for short runs. Prepare a few input files, say three (two extremes and one normal file), and see which version runs faster.
http://php.net/manual/en/function.time.php
I would think that differences would be marginal, but it also depends on your file sizes.
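A minimal timing sketch of the comparison described above. The helper names (timeCall, hashWithFile, hashWithGetContents) and the 10,000-line input are assumptions made for illustration; both candidates return the hashed output instead of writing it back, so only the read-and-hash work is measured.

```php
<?php
// Hypothetical helper: run $fn once and return the elapsed wall-clock seconds.
function timeCall(callable $fn): float
{
    $start = microtime(true); // float seconds with sub-second resolution
    $fn();
    return microtime(true) - $start;
}

// Candidate 1: read the file into an array with file().
function hashWithFile(string $path): string
{
    $out = '';
    foreach (file($path) as $line) {
        $out .= md5(trim($line)) . "\n";
    }
    return $out;
}

// Candidate 2: read the file into a string, then explode() it.
function hashWithGetContents(string $path): string
{
    $out = '';
    $contents = rtrim(file_get_contents($path), "\n"); // avoid a trailing empty entry
    foreach (explode("\n", $contents) as $line) {
        $out .= md5(trim($line)) . "\n";
    }
    return $out;
}

// Build a throwaway input file (the size is an arbitrary assumption).
$path = tempnam(sys_get_temp_dir(), 'bench');
file_put_contents($path, implode("\n", range(1, 10000)) . "\n");

printf("file():              %.4f s\n", timeCall(function () use ($path) {
    hashWithFile($path);
}));
printf("file_get_contents(): %.4f s\n", timeCall(function () use ($path) {
    hashWithGetContents($path);
}));

unlink($path);
```

A single run is noisy, so repeating each measurement a few times and comparing the averages is probably more informative.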

I'd propose to write a new temporary file, while you process the input one. Once done, overwrite the input file with the temporary one.
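A sketch of that temporary-file approach, assuming a hypothetical helper named hashLinesInPlace. It streams the input line by line, so the whole file is never held in memory, and it only replaces the original once every line has been written.

```php
<?php
// Hypothetical helper: replace each line of $path with its MD5 hash,
// writing to a temporary file first and swapping it in at the end.
function hashLinesInPlace(string $path): void
{
    // Create the temp file next to the original so rename() stays on one filesystem.
    $tmpPath = tempnam(dirname($path), 'md5');
    $in  = fopen($path, 'r');
    $out = fopen($tmpPath, 'w');

    while (($line = fgets($in)) !== false) {
        fwrite($out, md5(trim($line)) . "\n");
    }

    fclose($in);
    fclose($out);

    // Overwrite the input only after every line has been processed.
    rename($tmpPath, $path);
}
```

If the script dies halfway through, the input file is still intact, which is not the case when the file is reopened with 'w+' before hashing.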

Related

Should I use "fgetcsv" instead of "array_map" and how to do it?

I've made this script to extract data from a CSV file.
$url = 'https://flux.netaffiliation.com/feed.php?maff=3E9867FCP3CB0566CA125F7935102835L51118FV4';
$data = array_map(function($line) { return str_getcsv($line, '|'); }, file($url));
It's working exactly as I want but I've just been told that it's not the proper way to do it and that I really should use fgetcsv instead.
Is that right? I've tried many ways to do it with fgetcsv but couldn't get anything close to working.
Here is an example of what I would like to get as output:
$data[4298][0] = 889698467841
$data[4298][1] = Figurine Funko Pop! - N° 790 - Disney : Mighty Ducks - Coach Bombay
$data[4298][2] = 108740
$data[4298][3] = 14.99
First of all, there is no ONE proper way to do things in programming. It is up to you and depends on your use case.
I just downloaded the CSV file and it is about 20 MB. Your solution downloads the whole file at once. If you have no memory restrictions and do not need to give fast feedback to the caller (that is, if the delay while the whole file downloads does not matter), your solution is the better one when you want to guarantee that the whole content gets processed. You read all the content at once, and further processing no longer depends on things like your Internet connection.
If you use fgetcsv, you read from the URL sequentially, line by line. The connection has to stay open until the last line has been processed. This avoids a large memory allocation, but processing the whole content takes longer.
Both methods have their pros and cons. You should know what your goal is. How often will you run this script? Consider your use case and decide which method is best for you.
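For comparison, a minimal sketch of the fgetcsv variant. The helper name readPipeCsv is an assumption; $source can be a local path or, with allow_url_fopen enabled, the feed URL from the question.

```php
<?php
// Hypothetical helper: parse a pipe-delimited file (or URL) row by row.
function readPipeCsv(string $source): array
{
    $handle = fopen($source, 'r');
    if ($handle === false) {
        return [];
    }

    $data = [];
    // Length 0 means "no line-length limit"; '|' is the delimiter from the question.
    while (($row = fgetcsv($handle, 0, '|', '"', '\\')) !== false) {
        $data[] = $row;
    }

    fclose($handle);
    return $data;
}
```

This still collects every row in $data, so the memory saving only appears if each row is processed inside the loop instead of being stored.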
Here is the same result without array_map():
$url = 'https://flux.netaffiliation.com/feed.php?maff=3E9867FCP3CB0566CA125F7935102835L51118FV4';
$lines = file($url);
$data = [];
foreach($lines as $line)
{
$data[] = str_getcsv(trim($line), '|');
//optionally:
//$data[] = explode('|',trim($line));
}
$lines = null;

Storing last line of file in an array continuously in PHP

I have a small issue with some PHP read functionality. What I am trying to do is read data into an array from a file that is being continuously updated by a Python script, which reads values from a microcontroller. The file looks something like this:
ID, Datetime, Count, Name
ID, Datetime, Count, Name
ID, Datetime, Count, Name
What I need is for it to read each new line as it is appended at the end of the file and store it in an array. What I have so far opens the file for reading:
<?php
$myfile = fopen("read.txt", "r");
For storing the lines in an array, I figured something like an array map would be efficient:
$result = array();
// some loop
$parts = array_map('trim', explode(':', $line_of_text, 2));
$result[$parts[0]] = $parts[1];
However, I am not too sure how to structure the loop so that it reads each new line added to the file without exiting the loop.
while (!feof($myfile)) {
}
fclose($myfile);
?>
Any help would be appreciated!!
Can you do this?
Read the lines of the file into an array using $lines = file("filename");.
Then use $lines[count($lines) - 1] to get the last line.
You can also strip out the empty lines before you do this.
Trim Empty Lines
Use this function:
$lines = array_values(array_filter(array_map('trim', $lines), 'strlen')); // file() keeps the trailing "\n", so trim first; array_values() reindexes
Since the file is continually being appended, you'd have to read until you hit the end of file, sleep for a while to let more data be appended, then read again.
e.g.
while(true) {
while(($line = fgets($file)) !== false) {
// ... process $line
}
sleep(15); // pause to let more data be appended
}
However, I'm not sure if PHP will cache the fact that it hit eof, and not try again once the sleep() finishes. It may be necessary to record your current position ftell(), close the file, reopen it, then fseek() to the stored location.
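The ftell()/reopen/fseek() fallback mentioned above could be sketched as follows. readNewLines is a hypothetical helper name; it reopens the file on every call, resumes at the stored byte offset, and leaves a half-written final line for the next call.

```php
<?php
// Hypothetical helper: return the complete lines appended to $path since byte
// offset $offset, and advance $offset past them.
function readNewLines(string $path, int &$offset): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return [];
    }
    fseek($handle, $offset); // resume where the previous call stopped

    $lines = [];
    while (($line = fgets($handle)) !== false) {
        if (substr($line, -1) !== "\n") {
            break; // partial line still being written; pick it up next time
        }
        $lines[] = rtrim($line, "\r\n");
        $offset = ftell($handle); // remember the position after each full line
    }

    fclose($handle);
    return $lines;
}
```

A polling loop would keep $offset between iterations, call readNewLines('read.txt', $offset), process whatever comes back, and then sleep() before the next call.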
I came up with this solution:
$filename = "file.txt";
$file = fopen($filename, "r");
$lines = explode("\n", rtrim(fread($file, filesize($filename)))); // "\n", not "/n"; rtrim() drops the trailing newline
$last = $lines[count($lines)-1];
fclose($file);
If the file is going to get big, it could take some time to parse, so it's also possible to adjust the fread() call so that it only reads the last 100 characters, for example.
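A sketch of that "only read the tail" idea, assuming a hypothetical helper lastLine and assuming the last line is shorter than $tailBytes (otherwise only its end is returned).

```php
<?php
// Hypothetical helper: return the last non-empty line of $path by reading
// at most $tailBytes bytes from the end of the file.
function lastLine(string $path, int $tailBytes = 4096): string
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return '';
    }

    $size = fstat($handle)['size']; // current size, read from the open handle
    $chunk = '';
    if ($size > 0) {
        $readBytes = min($size, $tailBytes);
        fseek($handle, -$readBytes, SEEK_END); // jump to the start of the tail
        $chunk = fread($handle, $readBytes);
    }
    fclose($handle);

    // Drop the trailing newline, then take whatever follows the last "\n".
    $lines = explode("\n", rtrim($chunk, "\r\n"));
    return $lines[count($lines) - 1];
}
```

The cost of a call depends on $tailBytes instead of on the size of the file, which matters once the log has grown large.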

How to pass a file as an argument to php exec?

I would like to know how I can pass the content of a file (csv in my case) as an argument for a command line executable (in C or Objective C) to be called by exec in php.
Here is what I have done: the user loads the content of their file from a URL like this:
http://www.myserver.com/model.php?fileName=test.csv
Then the following code allows php to parse and load the csv file:
<?php
$f = $_GET['fileName'];
$handle = fopen("$f", "r");
$data = array();
while (($line = fgetcsv($handle)) !== FALSE) {
$data[] = $line;
}
?>
Where I'm stuck is how to pass the content of this csv file as an argument to exec. Even if I can assume the csv has only two columns, the number of rows is user-specific, so I cannot pass all the values one by one as parameters, e.g.
exec("/path_to_executable/model -a {$data[0][0]} -b {$data[0][1]} .....");
The only alternative solution, I guess, would be to write something like this:
exec("/path_to_executable/model -fileName test.csv");
and have the command line executable do the csv parsing, but in that case I think I need to have the csv file physically written on the server side. I'm wondering what happens if several people access the web page at the same time, each with their own csv file: would they overwrite each other's files?
I guess there must be a more proper way to do this that I have not figured out. Any ideas? Thanks!
I would recommend having that data on disk, and loading it within the command line utility - it is much less messing about. But if you can't do that, just pass it in 1 (unparsed) line at a time:
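$command = "/path_to_executable/model";
$fileData = file($f, FILE_IGNORE_NEW_LINES); // assumption: the unparsed lines of the csv from the question
foreach ($fileData as $line) {
$command .= ' '.escapeshellarg($line); // escapeshellarg() already adds the quotes
}
exec($command);
Then you can just fetch the data in your utility by looping over argv, where argv[0] is the program name, argv[1] is the first line, argv[2] is the second line, and so on.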
You could use popen() to get a handle on the process to write to. If you need to go both ways (read/write) and might require some more power, have a look at proc_open().
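A minimal proc_open() sketch of the read/write case, assuming a hypothetical helper pipeToCommand and an executable that reads the csv from standard input. It writes all input before reading any output, so it is only suitable for small payloads; with large ones the two pipes can block each other.

```php
<?php
// Hypothetical helper: run $command, write $input to its standard input,
// and return whatever it prints on standard output.
function pipeToCommand(string $command, string $input): string
{
    $descriptors = [
        0 => ['pipe', 'r'], // the child reads its stdin from us
        1 => ['pipe', 'w'], // the child writes its stdout to us
    ];

    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        return '';
    }

    fwrite($pipes[0], $input);
    fclose($pipes[0]); // signal end of input to the child

    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($process);

    return $output;
}
```

Because nothing is written to disk, concurrent requests cannot overwrite each other's data.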
You could also just write your data to some random file (to avoid multiple users kicking each other's race-conditioned butts). Something along the lines of
<?php
$csv = file_get_contents('http://www.myserver.com/model.php?fileName=test.csv');
$filename = '/tmp/' . uniqid(sha1($csv)) . '.csv';
file_put_contents($filename, $csv);
exec('/your/thing < '. escapeshellarg($filename));
unlink($filename);
And since you're also in charge of the executable, you might figure out how to get the number of arguments passed (hint: argc) and read them in (hint: argv). Passing them through line-based like so:
<?php
$csvRow = fgetcsv($fh);
if ($csvRow) {
$escaped = array_map('escapeshellarg', $csvRow);
exec('/your/thing '. join(' ', $escaped));
}

How to save memory when reading a file in Php?

I have a 200 KB file that I use on multiple pages, but on each page I need only 1-2 lines of it. How can I read only the lines I need if I know the line number?
For example, if I need only the 10th line, I don't want to load all the lines into memory, just the 10th line.
Sorry for my bad english!
Try SplFileObject
echo memory_get_usage(), PHP_EOL; // 333200
$file = new SplFileObject('bible.txt'); // 996kb
$file->seek(5000); // jump to line 5000 (zero-based)
echo $file->current(), PHP_EOL; // output current line
echo memory_get_usage(), PHP_EOL; // 342984 vs 3319864 when using file()
For outputting the current line, you can either use current() or just echo $file. I find it clearer to use the method though. You can also use fgets(), but that would get the next line.
Of course, you only need the middle three lines. I've added the memory_get_usage calls just to show that this approach uses almost no memory.
Unless you know the offset of the line, you will need to read every line up to that point. You can just throw away the old lines (that you don't want) by looping through the file with something like fgets(). (EDIT: Rather than fgets(), I would suggest @Gordon's solution)
Possibly a better solution would be to use a database, as the database engine will do the grunt work of storing the strings and allow you to (very efficiently) get a certain "line" (It wouldn't be a line but a record with an numeric ID, however it amounts to the same thing) without having to read the records before it.
Do the contents of the file change? If it's static, or relatively static, you can build a list of offsets where you want to read your data. For instance, if the file changes once a year, but you read it hundreds of times a day, then you can pre-compute the offsets of the lines you want and jump to them directly like this:
$offsets = array();
$filehandle = fopen('file.txt', 'rb');
$lineNumber = 1;
$offsets[$lineNumber] = 0; // line 1 starts at byte 0
while (fgets($filehandle) !== false) {
$offsets[++$lineNumber] = ftell($filehandle); // byte offset where the next line starts (the last entry is end of file)
}
fclose($filehandle);
Afterwards, you can trivially jump to a line's location like this:
$fh = fopen('file.txt', 'rb');
fseek($fh, $offsets[20]); // jump to line 20
But this could be complete overkill. Try benchmarking the operations: compare how long it takes to do an old-fashioned "read 20 lines" versus precompute/jump.
<?php
$lines = array(1, 2, 10);
$handle = @fopen("/tmp/inputfile.txt", "r");
if ($handle) {
$i = 0;
while (!feof($handle)) {
$line = stream_get_line($handle, 1000000, "\n");
if (in_array($i, $lines)) {
echo $line;
$line = ''; // Don't forget to clean the buffer!
}
if ($i > end($lines)) {
break;
}
$i++;
}
fclose($handle);
}
?>
Just loop through them without storing, e.g.
$i = 1;
$file = fopen('file.txt', 'r');
while (!feof($file)) {
$line = fgets($file); // this gets whole line from the file;
if ($i == 10) {
break; // break on tenth line
}
$i ++;
}
The above example would keep memory for only the last line it got from the file, so this is the most memory efficient way to do it.
Use fgets() 10 times :-) That way you will not store all 10 lines in memory.
Why are you only trying to load the first ten lines? Do you know that loading all those lines is in fact a problem?
If you haven't measured, then you don't know that it's a problem. Don't waste your time optimizing for non-problems. Chances are that any performance change you'll have in not loading the entire 200K file will be imperceptible, unless you know for a fact that loading that file is indeed a bottleneck.

Getting one line in a huge file with PHP

How can I get a particular line in a 3 GB text file? The lines are delimited by \n, and I need to be able to get any line on demand.
How can this be done? Only one line needs to be returned, and I would rather not use any system calls.
Note: There is the same question elsewhere regarding how to do this in bash. I would like to compare it with the PHP equivalent.
Update: Each line is the same length the whole way through.
Without keeping some sort of index to the file, you would need to read all of it until you've encountered x number of \n characters. I see that nickf has just posted some way of doing that, so I won't repeat it.
To do this repeatedly in an efficient manner, you will need to build an index. Store some known file positions for certain (or all) line numbers once, which you can then use to seek to the right location using fseek.
Edit: if each line is the same length, you do not need the index.
$myfile = fopen($fileName, "r");
fseek($myfile, $lineLength * $lineNumber);
$line = fgets($myfile);
fclose($myfile);
Line number is 0 based in this example, so you may need to subtract one first. The line length includes the \n character.
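The same arithmetic wrapped in a function, as a sketch: getFixedLengthLine is a hypothetical name, and the code assumes every line, including the last, is exactly $lineLength bytes with its trailing \n. With 4-byte lines, line 2 starts at byte 4 * 2 = 8.

```php
<?php
// Hypothetical helper: fetch line $lineNumber (0-based) from a file whose lines
// are all exactly $lineLength bytes long, newline included.
// Returns the line without its newline, or false if the line does not exist.
function getFixedLengthLine(string $fileName, int $lineNumber, int $lineLength)
{
    $handle = fopen($fileName, 'r');
    if ($handle === false) {
        return false;
    }

    fseek($handle, $lineLength * $lineNumber); // byte offset of the requested line
    $line = fgets($handle);
    fclose($handle);

    return $line === false ? false : rtrim($line, "\n");
}
```

On a 32-bit PHP build, integers stop at about 2 GB, so offsets into a 3 GB file would need a 64-bit build.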
There is little discussion of the problem and no mention is made of how the 'one line' should be referenced (by number, some value within it, etc.) so below is just a guess as to what you're wanting.
If you're not averse to using an object (it might be 'too high level', perhaps) and wish to reference the line by offset, then SplFileObject (available as of PHP 5.1.0) could be used. See the following basic example:
$file = new SplFileObject('myreallyhugefile.dat');
$file->seek(12345689); // seek to line 12345690 (seek() is zero-based)
echo $file->current(); // or simply, echo $file
That particular method (seek) requires scanning through the file line-by-line. However, if as you say all the lines are the same length then you can instead use fseek to get where you want to go much, much faster.
$line_length = 1024; // each line is 1 KB long
$file->fseek($line_length * 1234567); // seek lots of bytes
echo $file->current(); // echo line 1234568
You said each line has the same length, so you can use fopen() in combination with fseek() to get a line quickly.
http://ch2.php.net/manual/en/function.fseek.php
The only way I can think to do it would be like this:
function getLine($fileName, $num) {
$fh = fopen($fileName, 'r');
$line = false; // returned when the file has fewer than $num lines
for ($i = 0; $i < $num && ($line = fgets($fh)) !== false; ++$i);
fclose($fh);
return $line;
}
While this is not exactly a solution, why do you need to pull one line out of a 3 GB text file? Is performance an issue, or can this run at a leisurely pace?
If you need to pull lots of lines out of this file at different points in time, I would definitely suggest putting this data into a DB of some kind. SQLite may be your friend here, as it is very simple, but it is not great with lots of scripts/people accessing it at one time.
