php efficient read file and insert sql - php

file: data.txt (11617 lines)
user datetime
23 2015-03-01 08:04:15
15 2015-05-01 08:05:20
105 2015-05-01 08:07:10
15 2015-06-01 08:08:29
105 2015-06-01 08:12:48
I only need the data from 2015-06. I'm using fgets and checking each line's datetime, but it is really slow: more than 50s.
$d='data.txt';
import($d);
function import($d){
$handle = fopen($d, "r") or die("Couldn't get handle");
if ($handle) {
while (!feof($handle)) {
$buffer = fgets($handle, 4096);
$line=explode("\t",$buffer);
if(date("Y-m",strtotime($line[1]))=="2015-06"){
mysql_query("INSERT INTO `table` ....");
}
else{
//break? when month>6
}
}
fclose($handle);
}
}
SOLUTION: less than 2s!!!! (thanks to Kevin P. and Dragon)
if(substr($line[1],0,7)=="2015-06"){
$sql.=(empty($sql)?"":",")."('".$line[1]."'.........)";
}
elseif(substr($line[1],0,7)>"2015-06"){
break;// when month>6
}
mysql_query("INSERT INTO `table` ....".$sql);
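The fragments above can be assembled into one runnable sketch. This is not the original poster's exact code: the table name `log`, the column names `user` and `dt`, and the function names readMonth and buildInsert are assumptions made for illustration, and it builds a statement with placeholders (for PDO or mysqli) instead of calling mysql_query(). It assumes, as in the sample data, that fields are tab-separated and that the file is sorted by datetime.

```php
<?php
// Collect [user, datetime] pairs for one month from a tab-separated file.
// Assumes the file is sorted by datetime, so reading can stop early.
function readMonth(string $path, string $month): array
{
    $rows = [];
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    while (($buffer = fgets($handle)) !== false) {
        $line = explode("\t", rtrim($buffer, "\r\n"));
        if (count($line) < 2) {
            continue; // malformed or empty line
        }
        $prefix = substr($line[1], 0, 7); // "YYYY-MM"
        if ($prefix === $month) {
            $rows[] = [$line[0], $line[1]];
        } elseif (preg_match('/^\d{4}-\d{2}$/', $prefix) && $prefix > $month) {
            break; // a later month: nothing more to find
        }
    }
    fclose($handle);
    return $rows;
}

// Build one multi-row INSERT with placeholders; returns [sql, params].
// Call it only with a non-empty $rows array.
function buildInsert(array $rows): array
{
    $placeholders = implode(',', array_fill(0, count($rows), '(?, ?)'));
    $sql = "INSERT INTO `log` (`user`, `dt`) VALUES $placeholders";
    $params = [];
    foreach ($rows as $row) {
        $params[] = $row[0];
        $params[] = $row[1];
    }
    return [$sql, $params];
}
```

With a PDO connection in a variable such as $pdo, the result could be run as $pdo->prepare($sql)->execute($params). For very large batches, splitting $rows with array_chunk() keeps each statement to a manageable size.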

Can't be helped, use something faster than PHP. For instance, you can use grep or awk to read the file and filter it quickly. For example:
$lines = explode("\n", `awk '$2 ~ /^2015-06/ { print }' data.txt`);
EDIT: Also, fgets with a length argument is not guaranteed to give you whole lines. fgets($handle, 4096) returns at most 4095 bytes per call, so a line longer than that is split across two calls; the fragment will then fail to match if you are lucky, or break your code due to missed assumptions (such as the length of the $line array when constructing the SQL) if not.*
*) Or vice versa - it would be better for it to break completely, that is at least an obvious error yelling to be fixed; as opposed to silent data droppage.

Maybe insert multiple entries in to the DB at once instead of calling it every time you find a desired time?
In which case it's similar to this

Maybe you should use grep to filter out the lines you do not need.

Related

How do you get last some lines of file via SFTP in PHP

I need to login to a production server retrieve a file and update my data base with the data in this file. Since this is a production database, I don't want to get the whole file every 5 minutes since the file may be huge and this may impact the server. I need to get the last 30 lines of this file every 5 minutes interval and have as little impact as possible.
The following is my current code, I would appreciate any insight to how best accomplish this:
<?php
$user="id";
$pass="passed";
$c = curl_init("sftp://$user:$pass@server1.example.net/opt/vmstat_server1");
curl_setopt($c, CURLOPT_PROTOCOLS, CURLPROTO_SFTP);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($c);
curl_close($c);
$data = explode("\n", $data);
?>
Marc B is wrong. SFTP is perfectly capable of partial file transfers. Here's an example of how to do what you want with phpseclib, a pure PHP SFTP implementation:
<?php
include('Net/SFTP.php');
$sftp = new Net_SFTP('www.domain.tld');
if (!$sftp->login('username', 'password')) {
exit('Login Failed');
}
$size = $sftp->size('filename.remote');
// outputs the last ten bytes of filename.remote
echo $sftp->get('filename.remote', false, $size - 10);
?>
In fact I'd recommend an approach like this anyway since some SFTP servers don't let you run commands via the system shell. Plus, SFTP can work on Windows SFTP servers whereas tail is unlikely to do so even if you do have shell access. ie. overall, it's a lot more portable a solution.
If you want to get the last x lines of a file you could loop repeatedly, reading however many bytes each time, until you have encountered x newline characters. I.e. get the last 10 bytes, then the next-to-last 10 bytes, then the ten bytes before those, etc.
An answer by @Sammitch to a duplicate question, Get last 15 lines from a large file in SFTP with phpseclib:
The following should result in a blob of text with at least 15 lines from the end of the file that you can then process further with your existing logic. You may want to tweak some of the logic depending on if your file ends with a trailing newline, etc.
$filename = './file.txt';
$filesize = $sftp->size($filename);
$buffersize = 4096;
$offset = $filesize; // start at the end
$result = '';
$lines = 0;
while( $offset > 0 && $lines < 15 ) {
// work backwards, never reading past the start of the file
$length = min($buffersize, $offset);
$offset -= $length;
$buffer = $sftp->get($filename, false, $offset, $length);
// count the number of newlines as we go
$lines += substr_count($buffer, "\n");
$result = $buffer . $result;
}
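The same backwards-reading idea can be tried on a local file before pointing it at an SFTP server. The sketch below is an illustration only: tailLines is a hypothetical name, it uses plain fseek()/fread() rather than phpseclib, and it assumes "\n" line endings and a line count of at least 1. It reads until it has seen one newline more than requested, so the first line it returns is complete.

```php
<?php
// Return the last $n lines of a local file by reading fixed-size
// chunks backwards from the end of the file.
function tailLines(string $path, int $n, int $bufferSize = 4096): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($handle, 0, SEEK_END);
    $offset = ftell($handle);
    $result = '';
    // Keep going until more than $n newlines were seen (so the first
    // kept line is whole) or the start of the file is reached.
    while ($offset > 0 && substr_count($result, "\n") <= $n) {
        $length = min($bufferSize, $offset);
        $offset -= $length;
        fseek($handle, $offset);
        $result = fread($handle, $length) . $result;
    }
    fclose($handle);
    // Drop trailing newlines, then keep the last $n lines.
    $lines = explode("\n", rtrim($result, "\n"));
    return array_slice($lines, -$n);
}
```

Because trailing newlines are trimmed before splitting, blank lines at the very end of the file are not counted as lines.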
SFTP is not capable of partial file transfers. You might have better luck using a full-blown SSH connection and a remote 'tail' operation to get the last lines of the file, e.g.
$lines = shell_exec("ssh user@remote.host 'tail -30 the_file'");
Of course, you might want to have something a little more robust that can handle things like net.glitches that prevent ssh from getting through, but as a basic starting point, this should do the trick.

How do I choose a specific line from a file?

I'm trying to make (as immature as this sounds) an application online that prints random insults. I have a list that is 140 lines long, and I would like to print one entire line. There is mt_rand(min,max) but when I use that alongside fgets(file, "line") It doesn't give me the line of the random number, it gives me the character. Any help? I have all the code so far below.
<?php
$file = fopen("Insults.txt","r");
echo fgets($file, (mt_rand(1, 140)));
fclose($file);
?>
Try this, it's easier version of what you want to do:
$file = file('Insults.txt');
echo $file[array_rand($file)];
$lines = file("Insults.txt");
echo $lines[array_rand($lines)];
Or within a function:
function random_line($filename) {
$lines = file($filename) ;
return $lines[array_rand($lines)] ;
}
$insult = random_line("Insults.txt");
echo $insult;
use file() for this. it returns an array with the lines of the file:
$lines = file($filename);
$line = mt_rand(0, count($lines) - 1);
echo $lines[$line];
First: you are not using fgets() correctly. Please refer to the manual for the meaning of the second parameter: it is the maximum number of bytes to read, not a line number.
Second: the file() solution will work... until the filesize exceeds a certain size and exhaust the complete PHP memory. Keep in mind: file() reads the complete file into an array.
You might be better off with reading line-by-line, even if that means you'll have to discard most of the read data.
$fp = fopen(...);
$line = 129;
// read (and ignore) the first 128 lines in the file
$i = 1;
while ($i < $line) {
fgets($fp);
$i++;
}
// at last: this is the line we wanted
$theLine = fgets($fp);
(not tested!)
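For the original goal of printing a random line, the line-by-line approach can also pick the line in a single pass without knowing the line count in advance. The sketch below uses reservoir sampling with a reservoir of size one, which is a different technique from the skip-to-line-N loop above; randomLine is a hypothetical name.

```php
<?php
// Pick one line uniformly at random in a single pass. Only the current
// line and the current choice are held in memory.
function randomLine(string $path): ?string
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return null;
    }
    $chosen = null;
    $seen = 0;
    while (($line = fgets($handle)) !== false) {
        $seen++;
        // Replace the current choice with probability 1/$seen.
        if (mt_rand(1, $seen) === 1) {
            $chosen = rtrim($line, "\r\n");
        }
    }
    fclose($handle);
    return $chosen; // null if the file is empty or unreadable
}
```

For a 140-line file this is unlikely to matter compared with file(), but it keeps memory use flat if the list grows.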

PHP to delete lines within text file beginning with 0 or a negative number

Thank you for taking the time to read this and I will appreciate every single response no matter the quality of content. :)
Using php, I'm trying to create a script which will delete several lines within a text file (.txt) if required, based upon whether the line starts with a 0 or a negative number. Each line within the file will always start with a number, and I need to erase all the neutral and/or negative numbers.
The main part I'm struggling with is that the content within the text file isn't static (e.g. it doesn't contain a fixed number of lines/words etc.). In fact, it is automatically updated every 5 minutes with several lines. Therefore, I'd like all the lines containing a neutral or negative number to be removed.
The text file follows the structure:
-29 aullah1
0 name
4 username
4 user
6 player
If possible, I'd like lines 1 and 2 removed, since they begin with a neutral/negative number. At times there may be more than two neutral/negative numbers.
All assistance is appreciated and I look forward to your replies; thank you. :) If I didn't explain anything clearly and/or you'd like me to explain in more detail, please reply. :)
Thank you.
Example:
$file = file("mytextfile.txt");
$newLines = array();
foreach ($file as $line)
if (preg_match("/^(-\d+|0)/", $line) === 0)
$newLines[] = chop($line);
$newFile = implode("\n", $newLines);
file_put_contents("mytextfile.txt", $newFile);
It is important that you chop() the newline character off of the end of the line so you don't end up with empty space. Tested successfully.
Something on these lines i guess, it is untested.
$newContent = "";
$lines = explode("\n" , $content);
foreach($lines as $line){
$fChar = substr($line , 0 , 1);
if($fChar == "0" || $fChar == "-") continue;
else $newContent .= $line."\n";
}
If the file is big, it's better to read it line by line:
$fh_r = fopen("input.txt", "r"); // open file to read.
$fh_w = fopen("output.txt", "w"); // open file to write.
while (!feof($fh_r)) { // loop till lines are left in the input file.
$buffer = fgets($fh_r); // read input file line by line.
// if line begins with num other than 0 or -ve num write it.
if(!preg_match('/^(0|-\d+)\b/',$buffer)) {
fwrite($fh_w,$buffer);
}
}
fclose($fh_r);
fclose($fh_w);
Note: Err checking not included.
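A version of the same line-by-line filter with the error checks filled in might look like the sketch below. The function name dropNonPositive and the ".tmp" suffix are assumptions for illustration. It writes to a temporary file and renames it over the original, and it does no locking, so it does not protect against the process that updates the file every 5 minutes writing at the same moment.

```php
<?php
// Copy every line that does not start with 0 or a negative number into a
// temporary file, then swap it into place. Returns the number of lines kept.
function dropNonPositive(string $path): int
{
    $in = fopen($path, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $tmpPath = $path . '.tmp';
    $out = fopen($tmpPath, 'w');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot write $tmpPath");
    }
    $kept = 0;
    while (($line = fgets($in)) !== false) {
        // Same pattern as above: a lone 0 or a negative integer at line start.
        if (!preg_match('/^(0|-\d+)\b/', $line)) {
            fwrite($out, $line);
            $kept++;
        }
    }
    fclose($in);
    fclose($out);
    if (!rename($tmpPath, $path)) {
        throw new RuntimeException("Cannot replace $path");
    }
    return $kept;
}
```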
file_put_contents($newfile,
implode(
preg_grep('~^[1-9]~',
file($oldfile))));
php is not particularly elegant, but still...
Load whole line into variable trim it and then check if first letter is - or 0.
$newContent = "";
$lines = explode("\n" , $content);
foreach($lines as $line){
$fChar = $line[0];
if(!($fChar == '0' || $fChar == '-'))
$newContent .= $line."\n";
}
I changed malik's code for better performance and quality.
Here's another way:
class FileCleaner extends FilterIterator
{
public function __construct($srcFile)
{
parent::__construct(new ArrayIterator(file($srcFile)));
}
public function accept()
{
list($num) = explode(' ', parent::current(), 2);
return ($num > 0);
}
public function write($file)
{
file_put_contents($file, implode('', iterator_to_array($this)));
}
}
Usage:
$filtered = new FileCleaner($src_file);
$filtered->write($new_file);
Logic and methods can be added to the class for other stuff, such as sorting, finding the highest number, converting to a sane storage method such as csv, etc. And, of course, error checking.

Efficiently counting the number of lines of a text file. (200mb+)

I have just found out that my script gives me a fatal error:
Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 440 bytes) in C:\process_txt.php on line 109
That line is this:
$lines = count(file($path)) - 1;
So I think it is having difficulty loading the file into memory and counting the number of lines. Is there a more efficient way I can do this without having memory issues?
The text files that I need to count the number of lines for range from 2MB to 500MB. Maybe a Gig sometimes.
Thanks all for any help.
This will use less memory, since it doesn't load the whole file into memory:
$file="largefile.txt";
$linecount = 0;
$handle = fopen($file, "r");
while(!feof($handle)){
$line = fgets($handle);
$linecount++;
}
fclose($handle);
echo $linecount;
fgets loads a single line into memory (if the second argument $length is omitted it will keep reading from the stream until it reaches the end of the line, which is what we want). This is still unlikely to be as quick as using something other than PHP, if you care about wall time as well as memory usage.
The only danger with this is if any lines are particularly long (what if you encounter a 2GB file without line breaks?). In that case you're better off slurping it in chunks and counting end-of-line characters:
$file="largefile.txt";
$linecount = 0;
$handle = fopen($file, "r");
while(!feof($handle)){
$line = fgets($handle, 4096);
$linecount = $linecount + substr_count($line, PHP_EOL);
}
fclose($handle);
echo $linecount;
Using a loop of fgets() calls is a fine solution and the most straightforward to write, however:
- Even though internally the file is read using a buffer of 8192 bytes, your code still has to call that function for each line.
- It's technically possible that a single line may be bigger than the available memory if you're reading a binary file.
This code reads a file in chunks of 8kB each and then counts the number of newlines within that chunk.
function getLines($file)
{
$f = fopen($file, 'rb');
$lines = 0;
while (!feof($f)) {
$lines += substr_count(fread($f, 8192), "\n");
}
fclose($f);
return $lines;
}
If the average length of each line is at most 4kB, you will already start saving on function calls, and those can add up when you process big files.
Benchmark
I ran a test with a 1GB file; here are the results:
+------------+-------------+------------------+---------+
|            | This answer | Dominic's answer | wc -l   |
+------------+-------------+------------------+---------+
| Lines      | 3550388     | 3550389          | 3550388 |
+------------+-------------+------------------+---------+
| Runtime    | 1.055       | 4.297            | 0.587   |
+------------+-------------+------------------+---------+
Time is measured in seconds of real (wall-clock) time.
True line count
While the above works well and returns the same results as wc -l, if the file ends without a newline, the line number will be off by one; if you care about this particular scenario, you can make it more accurate by using this logic:
function getLines($file)
{
$f = fopen($file, 'rb');
$lines = 0; $buffer = '';
while (!feof($f)) {
$buffer = fread($f, 8192);
$lines += substr_count($buffer, "\n");
}
fclose($f);
if (strlen($buffer) > 0 && $buffer[-1] != "\n") {
++$lines;
}
return $lines;
}
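One edge case worth checking in any chunked counter: when the file size is an exact multiple of the chunk size, feof() only becomes true after a further read that returns an empty string, so the last buffer seen can be empty. The sketch below is a variant, not the answer's own code, that remembers the last byte actually read; countLines and the $chunkSize parameter are hypothetical names.

```php
<?php
// Count lines by counting "\n" in fixed-size chunks, adding one if the
// file's final byte is not a newline.
function countLines(string $path, int $chunkSize = 8192): int
{
    $f = fopen($path, 'rb');
    if ($f === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $lines = 0;
    $lastByte = '';
    while (!feof($f)) {
        $buffer = fread($f, $chunkSize);
        if ($buffer === false || $buffer === '') {
            break; // nothing more to read; keep the previous last byte
        }
        $lines += substr_count($buffer, "\n");
        $lastByte = substr($buffer, -1);
    }
    fclose($f);
    if ($lastByte !== '' && $lastByte !== "\n") {
        $lines++; // unterminated final line
    }
    return $lines;
}
```

Using substr($buffer, -1) instead of a negative string offset also keeps the function usable on PHP versions before 7.1.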
Simple object-oriented solution
$file = new \SplFileObject('file.extension');
while($file->valid()) $file->fgets();
var_dump($file->key());
Update
Another way to do this is to pass PHP_INT_MAX to the SplFileObject::seek method.
$file = new \SplFileObject('file.extension', 'r');
$file->seek(PHP_INT_MAX);
echo $file->key();
If you're running this on a Linux/Unix host, the easiest solution would be to use exec() or similar to run the command wc -l $path. Just make sure you've sanitized $path first to be sure that it isn't something like "/path/to/file ; rm -rf /".
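A minimal sketch of that approach, using escapeshellarg() for the sanitizing step. The function name wcLines is an assumption, and the code presumes a Unix-like host with wc on the PATH and exec() enabled.

```php
<?php
// Count lines with wc -l; returns null if the command fails.
function wcLines(string $path): ?int
{
    // Redirecting the file to stdin makes wc print only the number,
    // and escapeshellarg() quotes the path so it cannot inject commands.
    $cmd = 'wc -l < ' . escapeshellarg($path);
    $output = [];
    $status = 0;
    exec($cmd, $output, $status);
    if ($status !== 0 || !isset($output[0])) {
        return null;
    }
    return (int) trim($output[0]);
}
```

Like wc -l itself, this counts newline characters, so a final line without a trailing newline is not included.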
There is a faster way I found that does not require looping through the entire file in PHP.
It works only on *nix systems; there might be a similar way on Windows.
$file = '/path/to/your.file';
//Get number of lines
$totalLines = intval(exec("wc -l " . escapeshellarg($file)));
If you're using PHP 5.5 you can use a generator. This will NOT work in any version of PHP before 5.5 though. From php.net:
"Generators provide an easy way to implement simple iterators without the overhead or complexity of implementing a class that implements the Iterator interface."
// This function implements a generator to load individual lines of a large file
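function getLines($file) {
$f = fopen($file, 'r');
// read each line of the file without loading the whole file to memory;
// compare against false so a final line containing only "0" is not skipped
while (($line = fgets($f)) !== false) {
yield $line;
}
fclose($f);
}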
// Since generators implement simple iterators, I can quickly count the number
// of lines using the iterator_count() function.
$file = '/path/to/file.txt';
$lineCount = iterator_count(getLines($file)); // the number of lines in the file
If you're on Linux you can simply do:
$number_of_lines = intval(trim(shell_exec("wc -l ".escapeshellarg($file_name)." | awk '{print $1}'")));
You just have to find the right command if you're using another OS
Regards
This is an addition to Wallace Maxter's solution
It also skips empty lines while counting:
function getLines($file)
{
$file = new \SplFileObject($file, 'r');
$file->setFlags(SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY |
SplFileObject::DROP_NEW_LINE);
$file->seek(PHP_INT_MAX);
return $file->key() + 1;
}
The most succinct cross-platform solution that only buffers one line at a time.
$file = new \SplFileObject(__FILE__);
$file->setFlags($file::READ_AHEAD);
$lines = iterator_count($file);
Unfortunately, we have to set the READ_AHEAD flag otherwise iterator_count blocks indefinitely. Otherwise, this would be a one-liner.
private static function lineCount($file) {
$linecount = 0;
$handle = fopen($file, "r");
while(!feof($handle)){
if (fgets($handle) !== false) {
$linecount++;
}
}
fclose($handle);
return $linecount;
}
I wanted to add a little fix to the function above...
In a specific example where I had a file containing only the word 'testing', the function returned 2. So I needed to add a check for whether fgets returned false :)
have fun :)
Based on Dominic Rodger's solution,
here is what I use (it uses wc if available, otherwise it falls back to Dominic Rodger's solution).
class FileTool
{
public static function getNbLines($file)
{
$linecount = 0;
$m = exec('which wc');
if ('' !== $m) {
$cmd = 'wc -l < "' . str_replace('"', '\\"', $file) . '"';
$n = exec($cmd);
return (int)$n + 1;
}
$handle = fopen($file, "r");
while (!feof($handle)) {
$line = fgets($handle);
$linecount++;
}
fclose($handle);
return $linecount;
}
}
https://github.com/lingtalfi/Bat/blob/master/FileTool.php
Counting the number of lines can be done with the following code:
<?php
$fp= fopen("myfile.txt", "r");
$count=0;
while(($line = fgets($fp)) !== false) // read one line per call (fgetss() would also strip HTML tags, but it was removed in PHP 8.0)
$count++;
echo "Total number of lines are ".$count;
fclose($fp);
?>
You have several options. The first is to increase the available memory allowed, which is probably not the best way to do things given that you state the file can get very large. The other way is to use fgets to read the file line by line and increment a counter, which should not cause any memory issues at all as only the current line is in memory at any one time.
There is another answer that I thought might be a good addition to this list.
If you have perl installed and are able to run things from the shell in PHP:
$lines = exec('perl -pe \'s/\r\n|\n|\r/\n/g\' ' . escapeshellarg('largetextfile.txt') . ' | wc -l');
This should handle most line breaks whether from Unix or Windows created files.
Two downsides (at least):
1) It is not a great idea to make your script so dependent on the system it's running on (it may not be safe to assume Perl and wc are available).
2) Just a small mistake in escaping and you have handed over access to a shell on your machine.
As with most things I know (or think I know) about coding, I got this info from somewhere else:
John Reeve Article
public function quickAndDirtyLineCounter()
{
echo "<table>";
$folders = ['C:\wamp\www\qa\abcfolder\\',
];
foreach ($folders as $folder) {
$files = scandir($folder);
foreach ($files as $file) {
if($file == '.' || $file == '..' || !file_exists($folder.'\\'.$file)){
continue;
}
$handle = fopen($folder.'/'.$file, "r");
$linecount = 0;
while(!feof($handle)){
if(is_bool($handle)){break;}
$line = fgets($handle);
$linecount++;
}
fclose($handle);
echo "<tr><td>" . $folder . "</td><td>" . $file . "</td><td>" . $linecount . "</td></tr>";
}
}
echo "</table>";
}
I use this method purely for counting how many lines are in a file. What is the downside of doing this versus the other answers? I see many lines in them as opposed to my two-line solution. I'm guessing there's a reason nobody does this.
$lines = count(file('your.file'));
echo $lines;
this is a bit late but...
Here is my solution for a text log file I have which uses \n to separate each line.
$data = file_get_contents("myfile.txt");
$numlines = strlen($data) - strlen(str_replace("\n","",$data));
It does load the file into memory but doesn't need to cycle through an unknown number of lines. It may be unsuitable if the file is GB in size but for smaller files with short lines of data it works a treat for me.
It just removes the "\n" from the file and works out how many have been removed by comparing the length of the data in the file to the length after removing all the line breaks ("\n" chars in my case). If your line delimiter is a different character, replace the "\n" with whatever your delimiter is.
I know it is not the best answer for all occasions but is something I have found quick and simple for my purposes where each line of the log is only a few hundred chars and total log file is not too large.
For just counting the lines use:
$handle = fopen("file","r");
static $b = 0;
while($a = fgets($handle)) {
$b++;
}
echo $b;

Capture Progress From Command Line

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.4M  100 12.4M    0     0  4489k      0  0:00:02  0:00:02 --:--:-- 4653k
The above is curl's output on the command line when downloading a file. I have captured it using PHP like so, but I am having trouble working out how to use preg_match to extract the percentage done.
$handle = popen('curl -o '.VIDEOPATH.$fileName.'.flv '.$url, 'rb');
while(!feof($handle))
{
$progress = fread($handle, 8192);
//I don't even know what I was attempting here
$pattern = '/(?<Total>[0-9]{1,3}\.[0-9]{1,2})% of (?<Total>.+) at/';
//divide received by total somehow, then times 100
if(preg_match_all($pattern, $progress, $matches)){
fwrite($fh, $matches[0][0]."\r\n");
}
}
How can I do this? Please note, I have no idea what I am doing with the above preg_match_all!
Thanks
Update
Thanks to the help of ylebre. I have this so far.
$handle = popen('curl -o '.VIDEOPATH.$fileName.'.flv '.$url.' 2>&1', 'rb');//make sure its saved to videos
while(!feof($handle))
{
$line = fgets($handle, 4096); // Get a line from the input handle
echo '<br>Line'.$line.'<br>';
$line = preg_replace("/\s+/", " ", $line); // replace each run of whitespace with one space
$fields = explode(" ", $line); // split the input on spaces into fields array
echo '<br>Fields: '.$fields[0];
fwrite($fh, $fields[0]); // write a part of the fields array to the output file
}
I get this output to the browser:
Line % Total % Received % Xferd Average Speed Time Time Time Current
Fields:
Line Dload Upload Total Spent Left Speed
Fields:
Line 0 1340k 0 4014 0 0 27342 0 0:00:50 --:--:-- 0:00:50 27342 41 1340k 41 552k 0 0 849k 0 0:00:01 --:--:-- 0:00:01 1088k 100 1340k 100 1340k 0 0 1445k 0 --:--:-- --:--:-- --:--:-- 1711k
Fields:
Line
How do I extract the percentage part only? Maybe CURL can do this by itself - hmm will ask a question on this.
The progress that is showing up is probably updating the information in the same spot, so it will help if you know what you are parsing exactly.
The next step I recommend is taking one line of input, and trying to get the regexp to work on that.
You could also just split the string at the spaces if I'm reading the output correctly. Start by collapsing each run of spaces into a single space; after that you can use explode() to get an array with the values, which you can print_r to take a peek at what is inside.
This would be something like:
$line = fgets($handle, 4096); // Get a line from the input handle
$line = preg_replace("/\s+/", " ", $line); // replace each run of whitespace with one space
$fields = explode(" ", $line); // split the input on spaces into fields array
fwrite($fh, $fields[0]); // write a part of the fields array to the output file
As long as the ordering in the fields remains the same, your resulting array should give you a consistent result.
Hope this helps!
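Putting those steps together, a sketch of the field-splitting approach could look like this. It assumes the 12-column progress layout shown in the question, which is not guaranteed to stay the same across curl versions, and latestPercent is a hypothetical name. Because curl redraws the meter with carriage returns, one read can contain several rows, so the code splits on both "\r" and "\n" and keeps the last numeric row.

```php
<?php
// Extract the first ("% Total") column from a chunk of curl's progress
// meter. Returns null if the chunk holds no progress row.
function latestPercent(string $chunk): ?int
{
    $rows = preg_split('/[\r\n]+/', $chunk, -1, PREG_SPLIT_NO_EMPTY);
    $percent = null;
    foreach ($rows as $row) {
        // Split on runs of whitespace rather than single spaces.
        $fields = preg_split('/\s+/', trim($row));
        // Header rows start with "%" or "Dload", so they are skipped here.
        if (count($fields) >= 12 && ctype_digit($fields[0])) {
            $percent = (int) $fields[0];
        }
    }
    return $percent;
}
```

In the popen() loop this would be called on each $line read from the handle, with 2>&1 appended to the command, since curl writes the meter to stderr.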
If you have access to PHP 5.3, you can use the CURLOPT_PROGRESSFUNCTION option, which results in a much more elegant solution (no parsing output). Here's an example of how to use it:
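function callback($download_size, $downloaded, $upload_size, $uploaded)
{
// Note: PHP 5.5 and later pass the cURL handle as an extra first argument.
// The total size is 0 until it is known, so guard against division by zero.
$percent = $download_size > 0 ? $downloaded / $download_size : 0;
// Do something with $percent
}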
$ch = curl_init('http://www.example.com');
// Enable progress reporting (it is off by default)
curl_setopt($ch, CURLOPT_NOPROGRESS, false);
// Set up the callback
curl_setopt($ch, CURLOPT_PROGRESSFUNCTION, 'callback');
// You'll want to tweak the buffer size. Too small could affect performance. Too large and you don't get many progress callbacks.
curl_setopt($ch, CURLOPT_BUFFERSIZE, 128);
$data = curl_exec($ch);
