I have a log file (max size 50 MB) to which nginx writes user GET requests with their parameters every second.
I have a cron PHP script that starts every minute; it should read the next part of the log file, calculate data, and insert the results into a statistics MySQL database.
What is the best way to read the log file part by part every minute?
You can read the log file the regular way, something like this:
$myFile = "log.txt";
$lines = file($myFile);

// Load the number of the last line that was already processed (0 on the first run)
$last_line = 0;
if (file_exists('log-last-line.txt')) {
    $last_line = (int) file_get_contents('log-last-line.txt');
}

$i = 0;
foreach ($lines as $line) {
    $i++;
    if ($i > $last_line) {
        // these lines are new, save them in the database
    }
}

// Remember how many lines have been processed so far
file_put_contents('log-last-line.txt', count($lines));
After each run you write the number of the last processed line to "log-last-line.txt".
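Re-reading the whole file every minute gets expensive as the log approaches 50 MB. A minimal sketch of an alternative that remembers the byte offset instead of the line count (the file name "log-offset.txt" is a hypothetical choice, and the sketch assumes the log is only rotated between runs, not during one):

$logFile = "log.txt";
$offsetFile = "log-offset.txt";

// Load the byte position where the previous run stopped (0 on the first run)
$offset = file_exists($offsetFile) ? (int) file_get_contents($offsetFile) : 0;
if ($offset > filesize($logFile)) {
    $offset = 0; // the log was rotated or truncated, start over
}

$fh = fopen($logFile, "r");
fseek($fh, $offset);

while (($line = fgets($fh)) !== false) {
    // parse the GET request line and aggregate/insert the statistics into MySQL
}

// Remember where we stopped for the next cron run
file_put_contents($offsetFile, ftell($fh));
fclose($fh);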
I need to remove various useless log rows from a huge log file (200 MB):
/usr/local/cpanel/logs/error_log
The useless log rows are in the array $useless.
The way I am doing it is:
$working_log = "/usr/local/cpanel/logs/error_log";
foreach ($useless as $row) {
    if ($row != "") {
        file_put_contents($working_log,
            str_replace($row, "", file_get_contents($working_log)));
    }
}
I need to remove about 65,000 rows from the log file;
the code above does the job, but it is slow, about 0.041 seconds per removed row.
Do you know a faster way to do this in PHP?
If the file can be loaded into memory twice (it seems it can, since your code works), then you can remove all the strings in $useless with a single str_replace() call.
The documentation of the str_replace() function explains how:
If search is an array and replace is a string, then this replacement string is used for every value of search.
$working_log="/usr/local/cpanel/logs/error_log";
file_put_contents(
$working_log,
str_replace($useless, '', file_get_contents($working_log))
);
When the file becomes too large to be processed by the code above, you have to take a different approach: create a temporary file, read each line from the input file, and either write it to the temporary file or ignore it. At the end, move the temporary file over the source file:
$working_log = "/usr/local/cpanel/logs/error_log";
$tempfile    = "/usr/local/cpanel/logs/error_log.new";

$fin  = fopen($working_log, "r");
$fout = fopen($tempfile, "w");

while (($line = fgets($fin)) !== false) {
    // Note: fgets() keeps the trailing newline, so the entries in $useless
    // must contain it too (or trim $line before the comparison).
    if (! in_array($line, $useless)) {
        fputs($fout, $line);
    }
}

fclose($fin);
fclose($fout);

// Move the current log out of the way (keep it as backup)
rename($working_log, $working_log . ".bak");
// Put the new file in its place.
rename($tempfile, $working_log);
You have to add error handling (fopen(), fputs() may fail for various reasons) and code or human intervention to remove the backup file.
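One possible shape for that error handling, aborting on failure and leaving the original log untouched (the exact policy is up to you; this is only a sketch):

$fin = fopen($working_log, "r");
if ($fin === false) {
    die("Cannot open $working_log for reading");
}
$fout = fopen($tempfile, "w");
if ($fout === false) {
    fclose($fin);
    die("Cannot open $tempfile for writing");
}

while (($line = fgets($fin)) !== false) {
    if (in_array($line, $useless)) {
        continue;
    }
    if (fputs($fout, $line) === false) {
        fclose($fin);
        fclose($fout);
        unlink($tempfile); // keep the original log intact
        die("Write to $tempfile failed (disk full?)");
    }
}

fclose($fin);
fclose($fout);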
I'm trying to export a lot of data through a CSV export. The amount of data is really big, around 100,000 records and counting.
My client usually uses two tabs to browse and check several things at the same time, so a requirement is that he can continue browsing the system while the export is being made.
The issue is that while the CSV is being generated on the server, the session is blocked: you cannot load another page until the generation is completed.
This is what I'm doing:
Open the file
Loop through the data (one query per cycle, each cycle fetches 5000 records). Note: I cannot change this, because of certain limitations.
Write the data into the file
Free memory
Close the file
Set headers to begin the download
During the entire process, it's not possible to navigate the site in another tab.
The block of code:
$temp = 1;
$first = true;
$fileName = 'csv_data_' . date("Y-m-d") . '-' . time() . '.csv';
$filePath = CSV_EXPORT_PATH . $fileName;

// create CSV file
$fp = fopen($filePath, 'a');

// get data
for ($i = 1; $i <= $temp; $i++) {
    // get lines
    $data = $oPB->getData(ROWS_PER_CYCLE, $i); // ROWS_PER_CYCLE = 5000

    // if nothing was returned, exit
    if (empty($data)) {
        break;
    }

    // write the data that will be exported into the file
    fwrite($fp, $export->arrayToCsv($data, '', '', $first));

    // number of cycles needed
    $temp = ceil($data[0]->foundRows / ROWS_PER_CYCLE); // foundRows is always the same value, it doesn't change per query

    $first = false; // hide the header for the next rows

    // free memory
    unset($data);
}

// close file
fclose($fp);

/**
 * Begin Download
 */
$export->csvDownload($filePath); // set headers
Some considerations:
The count is made in the same query, and it does not enter an infinite loop; it works as expected. The total is contained in $data[0]->foundRows and avoids an unnecessary query to count all the available records.
There are several memory limitations due to environment settings that I cannot change.
Does anyone know how I can improve this, or any other solution?
Thanks for reading.
I'm replying only because it may be helpful to someone else. A colleague came up with a solution for this problem.
Call the function session_write_close() before
$temp = 1;
Doing this, you end the current session and store the session data, so I am able to download the file and continue navigating in other tabs.
I hope it helps someone.
Some considerations about this solution:
You must not need the session data after session_write_close().
The export script is in another file. For example: home.php calls export.php through a link.
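As a minimal sketch of where the call goes in export.php (reading $_SESSION['user_id'] up front is only a hypothetical example of session data you might still need; the export loop itself stays exactly as shown above):

// export.php
session_start();                // read whatever session data you still need
$userId = $_SESSION['user_id']; // hypothetical: grab needed values before closing

session_write_close();          // release the session lock so other tabs keep working

$temp = 1;
// ... the rest of the export loop shown above ...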
I'm trying to parse a 50 megabyte .csv file. The file itself is fine, but I'm trying to get past the massive timeout issues involved. Everything is set up upload-wise; I can easily upload and re-open the file, but after the browser timeout I receive a 500 Internal Server Error.
My guess is that I can save the file on the server, open it, and keep a session value of which line I last dealt with. After a certain line I reset the connection via a refresh and reopen the file at the line I left off at. Is this a doable idea? The previous developer made a very inefficient MySQL class and it controls the entire site, so I don't want to write my own class if I don't have to, and I don't want to mess with his.
TL;DR version: Is it efficient to save the last line I'm currently on in a CSV file that has 38K lines of products and, after X number of rows, reset the connection and start from where I left off? Or is there another way to parse a large CSV file without timeouts?
NOTE: Regarding the PHP script execution time: currently, at 38K lines, it takes about 46 minutes and 5 seconds to run via the command line, and it works correctly 100% of the time when I take it out of the browser, suggesting that it is a browser timeout. Chrome's timeout is not editable as far as Google has told me, and Firefox's timeout rarely works.
You could do something like this:
<?php
namespace database;

class importcsv
{
    private $crud;

    public function __construct($dbh, $table)
    {
        $this->crud = new \database\crud($dbh, $table);
    }

    public function import($columnNames, $csv, $seperator)
    {
        $lines = explode("\n", $csv);
        $x = 0;

        foreach ($lines as $line) {
            // reset the execution timer on every line so the script never hits the limit
            \set_time_limit(30);

            $line = explode($seperator, $line);
            $data = new \stdClass();

            foreach ($line as $i => $item) {
                if (isset($columnNames[$i]) && !empty($columnNames[$i])) {
                    $data->{$columnNames[$i]} = $item;
                }
            }

            $x++;
            $this->crud->create($data);
        }

        return $x; // number of imported lines
    }

    public function importFile($columnNames, $csvPath, $seperator)
    {
        if (file_exists($csvPath)) {
            $content = file_get_contents($csvPath);
            return $this->import($columnNames, $content, $seperator);
        } else {
            // Error: file not found
        }
    }
}
TL;DR: calling \set_time_limit(30); every time you loop through a line might fix your timeout issues.
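A hypothetical usage sketch, assuming \database\crud accepts a PDO handle and a table name (the credentials, table, and column names below are made up):

$dbh = new \PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
$importer = new \database\importcsv($dbh, 'products');

// map CSV columns to table columns; ';' is the assumed separator
$count = $importer->importFile(array('sku', 'name', 'price'), '/path/to/products.csv', ';');
echo "Imported $count lines\n";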
I suggest running PHP from the command line and setting it up as a cron job. This way you don't have to modify your code, there will be no timeout issue, and you can easily parse large CSV files.
Also check this link.
Your post is a little unclear due to the typos and grammar; could you please edit it?
If you are saying that the upload itself is okay, but the delay is in the processing of the file, then the easiest thing to do is to parse the file in parallel using multiple threads. You can use the Java built-in Executor class, or Quartz or Jetlang, to do this.
Find the size of the file or the number of lines.
Select a thread load (say, 1000 lines per thread).
Start an Executor.
Read the file in a loop.
For each 1000 lines, create a Runnable and load it into the Executor.
Start the Executor.
Wait until all threads are finished.
Each Runnable does this:
Fetch a connection.
Insert the 1000 lines.
Log the results.
Close the connection.
OK, so I have a .txt file with a bunch of URLs. I have a script that picks one of the lines randomly, and I include it on another page.
However, I want the URL to change every 15 minutes, so I'm guessing I'll need to use a cron job, but I'm not sure how to put it all in place.
I found that if you include a file it still gives a random output, so I'm guessing that if I run the cron and the include file together it's going to get messy.
So what I'm thinking is that I have a script that randomly selects a URL from my initial text file, saves it to another .txt file, and I include that file on the final page.
I just found this, which is sort of in the right direction:
Include php code within echo from a random text
I'm not the best at writing PHP (though I can understand it perfectly), so all help is appreciated!
So what I'm thinking is I have a script that randomly selects a url from my initial text file then it saves it to another .txt file and I include that file on the final page.
That's pretty much what I would do.
To re-generate that file, though, you don't necessarily need a cron.
You could use the following idea (sketched in code below):
If the file has been modified less than 15 minutes ago (which you can find out using filemtime() and comparing it with time()),
then use what's in the file;
else
re-generate the file, randomly choosing one URL from the big file,
and use the newly generated file.
This way there is no need for a cron: the first user who arrives more than 15 minutes after the previous modification of the file will re-generate it with a new URL.
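A minimal sketch of that idea, assuming the file names urls.txt and current-url.txt (hypothetical names, adapt them to your setup):

<?php
$allUrlsFile = "urls.txt";        // the big file, one URL per line
$currentFile = "current-url.txt"; // the cached pick

// Re-generate the cached file if it is missing or older than 15 minutes
if (!file_exists($currentFile) || time() - filemtime($currentFile) > 15 * 60) {
    $urls = file($allUrlsFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $pick = $urls[array_rand($urls)];
    file_put_contents($currentFile, $pick);
}

// Use whatever is in the cached file
echo trim(file_get_contents($currentFile));
?>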
Alright so I sorta solved my own question:
<?php
// load the file that contains the codes
$adfile = "urls.txt";
$ads = array();

// one line per code
$fh = fopen($adfile, "r");
while (!feof($fh)) {
    $line = fgets($fh, 10240);
    $line = trim($line);
    if ($line != "") {
        $ads[] = $line;
    }
}
fclose($fh);

// randomly pick a code
$num = count($ads);
$idx = rand(0, $num - 1);

$f = fopen("output.txt", "w");
fwrite($f, $ads[$idx]);
fclose($f);
?>
However, is there any way I can delete the chosen line once it has been picked?
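One way to do that (a sketch continuing the script above; note it permanently removes the URL and does no locking against concurrent runs):

// remove the chosen line and rewrite the source file without it
unset($ads[$idx]);
file_put_contents($adfile, implode("\n", $ads) . "\n");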
I am processing a big .gz file using PHP (transferring data from the gz file to MySQL).
It takes about 10 minutes per .gz file.
I have a lot of .gz files to be processed.
After PHP is finished with one file, I have to manually change the PHP script to select another .gz file and then run the script again manually.
I want it to automatically run the next job to process the next file.
The gz files are named 1, 2, 3, 4, 5 ...
I could simply make a loop, something like this (to process files 1-5):
for ($i = 1; $i <= 5; $i++) {
    $file = gzfile($i . '.gz');
    // ...gz content processing...
}
However, since the gz files are really big, I cannot do that: with this loop PHP would process multiple big gz files as a single script job, which takes a lot of memory.
What I want is that after PHP finishes one job, a new job starts to process the next file.
Maybe it's going to be something like this:
$file = gzfile($_GET['filename'] . '.gz');
// ...gz content processing...
Thank you.
If you clean up after processing and free all memory using unset(), you could simply wrap the whole script in a foreach (glob(...) as $filename) loop. Like this:
<?php
foreach (glob(...) as $filename) {
    // your script code here
    unset($thisVar, $thatVar, ...);
}
?>
What you should do is (see the sketch after this list):
Schedule a cron job to run your PHP script every X minutes.
When the script runs, check whether a lock file is in place; if not, create one and start processing the next unprocessed gz file; if yes, abort.
Wait for the queue to get cleared.
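A minimal sketch of that lock-file approach, using the numeric file names from the question; the lock file name process.lock and the .done marker files are hypothetical details to adapt (try/finally needs PHP 5.5+):

<?php
$lockFile = __DIR__ . '/process.lock';

// Another run is still busy: abort and let the next cron tick try again
if (file_exists($lockFile)) {
    exit;
}
file_put_contents($lockFile, getmypid());

try {
    // Find the next unprocessed file: 1.gz, 2.gz, 3.gz, ...
    for ($i = 1; file_exists($i . '.gz'); $i++) {
        if (!file_exists($i . '.done')) {   // marker for already processed files
            $file = gzfile($i . '.gz');
            // ...gz content processing...
            touch($i . '.done');            // mark this file as processed
            break;                          // one file per cron run
        }
    }
} finally {
    unlink($lockFile);                      // always release the lock
}
?>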
You should call the PHP script with an argument, from a shell script. Here's the documentation on how to use command-line parameters in PHP: http://php.net/manual/en/features.commandline.php
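For example, a minimal sketch that reads the file number from $argv (process.php is a hypothetical script name, invoked as php process.php 1):

<?php
// process.php, run as: php process.php 1
if ($argc < 2) {
    fwrite(STDERR, "Usage: php process.php <file-number>\n");
    exit(1);
}

$file = gzfile($argv[1] . '.gz');
// ...gz content processing...
?>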
Or (I can't try it now) you could give unset($file) a chance after processing each gzip file:
for ($i = 1; $i <= 5; $i++) {
    $file = gzfile($i . '.gz');
    // ...gz content processing...
    unset($file);
}