Extracting data from text log file advice

Extracting data from text log file advice - php

I would like an advice on best approach on this task.
I have a text log file holding data from a gps, this is the format:
time,lat,lon,elevation,accuracy,bearing,speed
2014-07-08T12:56:52Z,56.187344,10.192660,116.400024,5.000000,285.000000,1.063350
2014-07-08T12:56:58Z,56.187299,10.192754,113.799988,5.000000,161.000000,3.753000
2014-07-08T12:57:07Z,56.186922,10.193048,129.200012,5.000000,159.000000,5.254200
2014-07-08T12:57:13Z,56.186694,10.193133,109.799988,5.000000,152.000000,3.878100
2014-07-08T12:57:16Z,56.186745,10.193304,142.900024,5.000000,149.000000,3.940650
2014-07-08T12:57:20Z,56.186448,10.193417,118.700012,5.000000,154.000000,2.376900
2014-07-08T12:57:27Z,56.186492,10.193820,131.299988,5.000000,65.000000,5.379300
I need to find the line where the speed exceeds a certain value, then get the time from that line, then scroll trough the lines and find the line where the speed is below this value, get the time and write these 2 time values into my database.
This has to be an automated task, so I assume that a cron PHP script could do the job.
Best regards Thomas

Despite the fact that there is no special advice needed but wanting someone to code your problem - I will try to put you in the right direction. I've written easy to understand code where you can build on (untested)...
<?php
// Setup.
$pathGpsFile = 'gps.log';
$speedThreshold = 5;
//
// Execution.
//
if(!file_exists($pathGpsFile)) {
die('File "'. $pathGpsFile .'" does not exist.');
}
// Read entries into array.
$gpsEntries = file($pathGpsFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// Loop through entries.
$lineCount = 0;
$currentDifferences = array();
$currentDifference = array();
foreach($gpsEntries as $gpsEntry) {
// Skip head.
if($lineCount == 0) {
$lineCount++;
continue;
}
// Extract values from gps entry.
list($time, $lat, $lon, $elevation, $accuracy, $bearing, $speed) = explode(',', $gpsEntry);
// Check if there is currently a difference monitored.
if(count($currentDifference) == 1) {
if($speed < $speedThreshold) {
$currentDifference[] = $gpsEntry;
}
// Add to differences list.
$currentDifferences[] = $currentDifference;
// Reset current difference.
$currentDifference = array();
} else {
if($speed > $speedThreshold) {
$currentDifference[] = $gpsEntry;
}
}
// Increase line count.
$lineCount++;
}
// Check output.
var_dump($currentDifferences);
?>

Related

PHP Array Processing Ability Decreases

I need help processing files holding about 46k lines or more than 30MB of data.
My original idea was to open the file and turn each line into an array element. This worked the first time as the array held about 32k values total.
The second time, the process was repeated, the array only held 1011 elements, and finally, the third time it could only hold 100.
I'm confused and don't know much about the backend array processes. Can someone explain what is happening and fix the code?
function file_to_array($cvsFile){
$handle = fopen($cvsFile, "r");
$path = fread($handle, filesize($cvsFile));
fclose($handle);
//Turn the file into an array and separate lines to elements
$csv = explode(",", $path);
//Remove common double spaces
foreach ($csv as $key => $line){
$csv[$key] = str_replace(' ', '', str_getcsv($line));
}
array_filter($csv);
//get the row count for the file and array
$rows = count($csv);
$filerows = count(file($cvsFile)); //this no longer works
echo "File has $filerows and array has $rows";
return $csv;
}

The approach here can be split in 2.
Optimized file reading and processing
Proper storage solution
Optimized file processing can be done like so:
$handle = fopen($cvsFile, "r");
$rowsSucceed = 0;
$rowsFailed = 0;
if ($handle) {
while (($line = fgets($handle)) !== false) { // Reading file by line
// Process CSV line and check if it was parsed correctly
// And count as you go
if (!empty($parsedLine)) {
$csv[$key] = ... ;
$rowsSucceed++;
} else {
$rowsFailed++;
}
}
fclose($handle);
} else {
// Error handling
}
$totalLines = $rowsSucceed + $rowsFailed;
Also you can avoid array_filter() simply by not adding processed line if its empty.
It will allow to optimize memory usage during script execution.
Proper storage
Proper storage here is needed for performing operations on certain amount of data. File reading are ineffective and expensive. Using simple file based database like sqlite can help you a lot and increase overall performance of your script.
For this purpose you probably should process your CSV directly to database and than perform count operation on parsed data avoiding excessive file line counts etc.
Also it gives you further advantage on working with data not keeping it all in memory.

Your question says you want to "turn each line into an array element" but that is definitely not what you are doing. The code is quite clear; it reads the entire file into $path and then uses explode() to make one massive flat array of every element on every line. Then later you're trying to run str_getcsv() on each item, which of course isn't going to work; you've already exploded all the commas away.
Looping over the file using fgetcsv() makes more sense:
function file_to_array($cvsFile) {
$filerows = 0;
$handle = fopen($cvsFile, "r");
while ($line = fgetcsv($handle)) {
$filerows++;
// skip empty lines
if ($line[0] === null) {
continue;
}
//Remove common double spaces
$csv[] = str_replace(' ', '', $line);
}
//get the row count for the file and array
$rows = count($csv);
echo "File has $filerows and array has $rows";
fclose($handle);
return $csv;
}

php - for loop repeating itself / going out of sequence

I'm very new to PHP, making errors and learning as I go. Please be gentle! :)
I want to access some data from Blizzard.com's API. For this particular data set, it's not a block of data in JSON, rather each object has it's own URL to access. I estimate that there are approx 150000 objects, however I don't know the start or end points of the number range. So I'm having to assume 1 and work past the highest number I know (269065)
To get the data, I need to access each object's data via a JSON file, which I read, get the contents of & drop in to a text file (this could be written as an insert in to a SQL db too, as I'm able to do this if it's the text file that's the issue). But to be honest, I would love to get to the bottom of why this is happening as much as anything!
I wasn't going to try and run ~250000 iterations in a for loop, I thought I'd try something I considered small, 2000.
The for loop starts with $a as 1, uses $a as part of the URL, loads & decodes the JSON, checks to see if the first field (ID) in the object is set, if it is, it writes a few fields to data.txt & if the first field (ID) isn't set it just writes $a to data.txt (so I know it's a null for other purposes not outlined here).
Simple! Or so I thought, after approx after 183 iterations, the data written to the text file goes awry as seen by the quote below. It is out of sequence and starts at 1 again, then back to 184 ad nauseam. The loop then seems to be locked in some kind of infinite loop of running, outputting in a random order until I close the page 10-20 minutes later.
I have obviously made a big mistake! But I have no idea what I have done wrong to have caused this. During my attempts I have rewritten the code with new variable names, so a new text does not conflict with code that could be running in memory.
I've tried resetting variables to blank at the end of the loop in case it something was being reused that was causing a problem.
If anyone could point out any errors in my code, or suggest something for me to look in to, to handle bigger loops that would be brilliant. I am assuming my issue may be a time out or memory problem. But I don't know where to start & was hoping I'd find some suggestions here.
If it's relevant, I am using 000webhostapp.com as my host provider for now, until I get some paid for hosting.
1 ... 182 183 1 184 2 3 185 4 186 5 187 6 188 7 189 190 8 191
for ($a = 1; $a <= 2000; $a++) {
$json = "https://eu.api.battle.net/wow/recipe/".$a."?locale=en_GB&<MYPRIVATEAPIKEY>";
$contents = file_get_contents($json);
$data = json_decode($contents,true);
if (isset($data['id'])) {
$file = fopen("data.txt","a");
fwrite($file,$data['id'].",'".$data['name']."'\n");
fclose($file);
} else {
$file = fopen("data.txt","a");
fwrite($file,$a."\n");
fclose($file);
}
}
The content of the file I'm trying to access is
{"id":33994,"name":"Precise Strikes","profession":"Enchanting","icon":"spell_holy_greaterheal"}
I scrapped the original plan and wrote this instead. Thank you again who took the time out of their day to help and offer suggestions!
$b = $mysqli->query("SELECT id FROM `static_recipes` order by id desc LIMIT 1;")->fetch_object()->id;
if (empty($b)) {$b=1;};
$count = $b+101;
$write = [];
for ($a = $b+1; $a < $count; $a++) {
$json = "https://eu.api.battle.net/wow/recipe/".$a."?locale=en_GB&apikey=";
$contents = #file_get_contents($json);
$data = json_decode($contents,true);
if (isset($data['id'])) {
$write [] = "(".$data['id'].",'".addslashes($data['name'])."','".addslashes($data['profession'])."','".addslashes($data['icon'])."')";
} else {
$write [] = "(".$a.",'a','a','a'".")";
}
}
$SQL = ('INSERT INTO `static_recipes` (id, name, profession, icon) VALUES '.implode(',', $write));
$mysqli->query($SQL);
$mysqli->close();

$write = [];
for ($a = 1; $a <= 2000; $a++) {
$json = "https://eu.api.battle.net/wow/".$a."?locale=en_GB&<MYPRIVATEAPIKEY>";
$contents = file_get_contents($json);
$data = json_decode($contents,true);
if (isset($data['id'])) {
$write [] = $data['id'].",'".$data['name']."'\n";
} else {
$write [] = $a."\n";
}
}
$file = fopen("data.txt","a");
fwrite($file, implode('', $write));
fclose($file);
Also, why you are think what some IDS isn't duplicated at several "https://eu.api.battle.net/wow/[N]" urls data?
Also if you are I wasn't going to try and run ~250000 think about curl_multi_init(): http://php.net/manual/en/function.curl-multi-init.php

I can't really see anything obviously wrong with your code, can't run it though as I don't have the JSON
It could be possible that there is some kind of race condition since you're opening and closing the same file hundreds of times very quickly.
File operations might seem atomic but not necessarily so - here's an interesting SO thread:
Does PHP wait for filesystem operations (like file_put_contents) to complete before moving on?
Like some others' suggested - maybe just open the file before you enter the loop then close the file when the loop breaks.
I'd try it first and see if it helps.

There's nothing in your original code that would cause that sort of behaviour. PHP will not arbitrarily change the value of a variable. You are opening this file in append mode, are you certain that you're not looking at old data? Maybe output some debug messages as you process the data. It's likely you'd run up against some rate limiting on the API server, so putting a pause in there somewhere may improve reliability.
The only substantive change I'd suggest to your code is opening the file once and closing it when you're done.
$file = fopen("data_1_2000.txt", "w");
for ($a = 1; $a <= 2000; $a++) {
$json = "https://eu.api.battle.net/wow/recipe/$a?locale=en_GB&<MYPRIVATEAPIKEY>";
$contents = file_get_contents($json);
$data = json_decode($contents, true);
if (!empty($data['id'])) {
$data["name"] = str_replace("'", "\\'", $data["name"]);
$record = "$data[id],'$data[name]'";
} else {
$record = $a;
}
fwrite($file, "$record\n");
sleep(1);
echo "$a "; if ($a % 50 === 0) echo "\n";
}
fclose($file);

Xampp ERR_CONNECTION_RESET for 1 file

Currently have a file that is set to read a CSV file. The CSV file contains 1600 api queries. Then each api query then returns more queries that need to be run. I am using Xampp v3.2.2 on windows 10 and running php v5.6.15. When running the file through my browser it ran fine for the first 800+ records in the CSV before timing out. When I rerun the file now I get an error "Site can't be reach ERR_CONNECTION_RESET". Not sure what could be causing this. abbreviated version of the code is included below
<?php
error_reporting(E_ERROR | E_PARSE);
set_time_limit (28800);
$csv = array_map('str_getcsv', file('file.csv'));
for($i = 0; $i < count($csv); $i++){
if($csv[$i][2] == 1){ //this item in csv is flag to check if this item has been run yet
if($csv[$i][3] != 'NULL' && trim($csv[$i][3]) != ''){ // check to make sure there is a URL
$return = file_get_contents(trim($csv[$i][3])); //Get the contents that links to new api calls
if($return){
$isCall = array(); // array to store all new calls
$data = array(); // array to store all data to put in csv
$doc = new DOMDocument('1.0'); // create new DOM object
$doc->loadHTML($return); // load page string into DOM object
$links = $doc->getElementsByTagName('a'); // get all <a> tags on the page
if($links->length > 0){ // if there is at least one <a> tag on page
for($j = 0; $j < $links->length; $j++){ // loop through <a> tags
$isCall[]= $links->item($j)->getAttribute('href'); // get href attribute from <a> tag and push into array
}
for($x = 0; $x < count($isCall); $x++){ // loop through all the calls and search for data
$string = file_get_contents($isCall[$x]);
if($string) {
$thispage = new DOMDocument('1.0');
$thispage->loadHTML($string);
$pagedata = $thispage->getElementsByTagName('div');
if ($pagedata->length > 0) {
for($j = 0; $j < $pagedata->length; $j++) {
$data[] = $pagedata->item($j)->C14N();
}
}
}
if(count($data) >= 5) break; // limiting to 5 data points to be added to csv
}
}
if(!empty($data)) $csv[$i] = array_merge($csv[$i], $data); // if we have data points lets add them to the row in the csv
}
}
$csv[$i][2] = 2; // set the flag to 2
$fp = fopen('file.csv', 'w'); // write the contents to the csv each time through loops so if it fails we start at last completed record
foreach ($csv as $f) {
fputcsv($fp, $f);
}
fclose($fp);
}
}
?>

Usually ERR_CONNECTION_RESET is an error that occurs when the site you are trying to connect to is unable to establish that connection. This usually happens due to reasons like firewall blocking on issues with ISP cache etc.
However in your case, I feel that the site you are connecting to is voluntarily closing connection attempts because what you are trying to do is loop over and hit that API site 1600 times continuously.
The API site is allowing the first 800-odd attempts but after that it gets worried that you are perhaps a malicious script trying to harm it. Like a classical example of DOS (Denial Of Service) attempt.
You should check if there is any restriction to the number of attempts a client can make to the API site with a fixed time (like say 500 hits every 24 hours) or you should try to sleep N seconds after each hit to the site or after each X number of hits to the site.

PHP Array sorting within WHILE loop

I have a huge issue, I cant find any way to sort array entries. My code:
<?php
error_reporting(0);
$lines=array();
$fp=fopen('file.txt, 'r');
$i=0;
while (!feof($fp))
{
$line=fgets($fp);
$line=trim($line);
$lines[]=$line;
$oneline = explode("|", $line);
if($i>30){
$fz=fopen('users.txt', 'r');
while (!feof($fz))
{
$linez=fgets($fz);
$linez=trim($linez);
$lineza[]=$linez;
$onematch = explode(",", $linez);
if (strpos($oneline[1], $onematch[1])){
echo $onematch[0],$oneline[4],'<br>';
}
else{
}
rewind($onematch);
}
}
$i++;
}
fclose($fp);
?>
The thing is, I want to sort items that are being echo'ed by $oneline[4]. I tried several other posts from stackoverflow - But was not been able to find a solution.

The anser to your question is that in order to sort $oneline[4], which seems to contain a string value, you need to apply the following steps:
split the string into an array ($oneline[4] = explode(',',
$oneline[4]))
sort the resulting array (sort($oneline[4]))
combine the array into a string ($oneline[4] = implode(',',
$oneline[4]))
As I got the impression variable naming is low on the list of priorities I'm re-using the $oneline[4] variable. Mostly to clarify which part of the code I am referring to.
That being said, there are other improvements you should be making, if you want to be on speaking terms with your future self (in case you need to work on this code in a couple of months)
Choose a single coding style and stick to it, the original code looked like it was copy/pasted from at least 4 different sources (mostly inconsistent quote-marks and curly braces)
Try to limit repeating costly operations, such as opening files whenever you can (to be fair, the agents.data could contain 31 lines and the users.txt would be opened only once resulting in me looking like a fool)
I have updated your code sample to try to show what I mean by the points above.
<?php
error_reporting(0);
$lines = array();
$users = false;
$fp = fopen('http://20.19.202.221/exports/agents.data', 'r');
while ($fp && !feof($fp)) {
$line = trim(fgets($fp));
$lines[] = $line;
$oneline = explode('|', $line);
// if we have $users (starts as false, is turned into an array
// inside this if-block) or if we have collected 30 or more
// lines (this condition is only checked while $users = false)
if ($users || count($lines) > 30) {
// your code sample implies the users.txt to be small enough
// to process several times consider using some form of
// caching like this
if (!$users) {
// always initialize what you intend to use
$users = [];
$fz = fopen('users.txt', 'r');
while ($fz && !feof($fz)) {
$users[] = explode(',', trim(fgets($fz)));
}
// always close whatever you open.
fclose($fz);
}
// walk through $users, which contains the exploded contents
// of each line in users.txt
foreach ($users as $onematch) {
if (strpos($oneline[1], $onematch[1])) {
// now, the actual question: how to sort $oneline[4]
// as the requested example was not available at the
// time of writing, I assume
// it to be a string like: 'b,d,c,a'
// first, explode it into an array
$oneline[4] = explode(',', $oneline[4]);
// now sort it using the sort function of your liking
sort($oneline[4]);
// and implode the sorted array back into a string
$oneline[4] = implode(',', $oneline[4]);
echo $onematch[0], $oneline[4], '<br>';
}
}
}
}
fclose($fp);
I hope this doesn't offend you too much, just trying to help and not just providing the solution to the question at hand.

Using fseek to start reading a CSV after a certain number of lines

I am using the current code to read a csv file and add it to an array:
echo "starting CSV import<br>";
$current_row = 1;
$handle = fopen($csv, "r");
while ( ($data = fgetcsv($handle, 10000, ",") ) !== FALSE )
{
$number_of_fields = count($data);
if ($current_row == 1) {
//Header line
for ($c=0; $c < $number_of_fields; $c++)
{
$header_array[$c] = $data[$c];
}
} else {
//Data line
for ($c=0; $c < $number_of_fields; $c++)
{
$data_array[$header_array[$c]] = $data[$c];
}
array_push($products, $data_array);
}
$current_row++;
}
fclose($handle);
echo "finished CSV import <br>";
However when using a very large CSV this times out on the server, or has a memory limit error.
I'd like a way to do it in stages, so after the first say 100 lines it will refresh the page, starting at line 101.
I will probably be doing this with a meta refresh and a URL parameter.
I just need to know how to adapt that code above to start at the line I tell it to.
I have looked into fseek() but I'm not sure how to implement this here.
Can you please help?

The timout can be circumvented using
ignore_user_abort(true);
set_time_limit(0);
When experiencing problems with the memory limit, it may be wise to take a step back and look at what you're actually doing with the data you're processing. Are you pushing the data into a database? calculate something off the data but don't need to store the actual data, …
Do you really need to push (array_push($products, $data_array);) the rows into an array (for later processing)? can you instead write to the database directly? or calculate directly? or build an html <table> directly? or whatever the hell you're doing right then an there, within the while() loop, without pushing everything into an array first?
If you're able to chunk the processing, I guess you don't need that array at all. Otherwise you'd have to restore the array for every chunk - not solving the memory issue one bit.
If you can manage to change your processing algorithm to waste less memory / time, you should seriously consider that over any chunked processing requiring a round-trip to the browser (for so many performance and security reasons…).
Anyways, you can, at any time, identify the current stream offset with ftell() and re-set to that position using fseek(). You'd only need to pass that integer to your next iteration.
Also there is no need for your inner for() loops. This should produce the same results:
<?php
$products = array();
$cols = null;
$first = true;
$handle = fopen($csv, "r");
while (($data = fgetcsv($handle, 10000, ",")) !== false) {
if ($first) {
$cols = $data;
$first = false;
} else {
$products[] = array_combine($cols, $data);
}
}
fclose($handle);
echo "finished CSV import <br>";

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Extracting data from text log file advice - php

Related

PHP Array Processing Ability Decreases

php - for loop repeating itself / going out of sequence

Xampp ERR_CONNECTION_RESET for 1 file

PHP Array sorting within WHILE loop

Using fseek to start reading a CSV after a certain number of lines

Categories

Resources