The code pasted below reads an existing JSON file and inserts a new record directly after the opening '[' character, that is, after the first line. The problem is that this whole read-and-write cycle is very slow. This is for a dice game: the user rolls the dice, the score is written to a JSON file, and a jQuery script reads the file into a table.
function writeJson($player, $result, $bet, $betamount, $profit) {
    $id = uniqid();
    $response = array(
        'player' => $player,
        'result' => $result,
        'bet' => $bet,
        'size' => $betamount,
        'profit' => $profit,
        'id' => $id
    );
    $lines = file('dice.json');
    $oldlines = '';
    foreach ($lines as $line_num => $line) {
        $oldlines .= $line."\r\n";
        if ($line_num == 0) { //insert new record here.
            $oldlines .= json_encode($response).",\r\n";
        }
    }
    file_put_contents('dice.json', preg_replace('/^\h*\v+/m', '', $oldlines));
    clearstatcache();
}
Here is the format of the JSON file generated by this code:
[
new records will be inserted at this location
{"player":"seang","result":"EVEN","bet":"ODD","size":"0.00000005","profit":"-0.00000005","id":"57b9ace6ce133"},
{"player":"seang","result":"EVEN","bet":"ODD","size":"0.00000005","profit":"-0.00000005","id":"57b9ace1c73f4"},
{"player":"seang","result":"ODD","bet":"ODD","size":"0.00000005","profit":"+0.00000005","id":"57b9acd8dd50a"}
]
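One possible way to express the same prepend without building the string line by line and running a regex over it is to decode the file, put the new record at the front, and encode it again. This is only a sketch: the name prependRecord and the $file parameter are made up for illustration, it assumes the file always holds a valid JSON array, and it has not been benchmarked against the code above, so any speed difference would have to be measured.

```php
<?php
// Hypothetical helper: prepend one record to the JSON array stored in $file.
function prependRecord(string $file, array $record): void
{
    $records = [];
    if (is_file($file)) {
        $decoded = json_decode((string) file_get_contents($file), true);
        if (is_array($decoded)) {
            $records = $decoded;
        }
    }
    // Newest record goes first, matching the layout shown above.
    array_unshift($records, $record);
    // LOCK_EX only guards the write itself, not the whole read-modify-write cycle.
    file_put_contents($file, json_encode($records), LOCK_EX);
}
```

Inside writeJson() the call would be prependRecord('dice.json', $response). Note that the whole file is still rewritten on every roll, so the cost keeps growing with the number of records.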
Background
I'm trying to complete a code challenge where I need to refactor a simple PHP application that accepts a JSON file of people, sorts them by registration date, and outputs them to a CSV file. The provided program is already functioning and works fine with a small input but intentionally fails with a large input. In order to complete the challenge, the program should be modified to be able to parse and sort a 100,000 record, 90MB file without running out of memory, like it does now.
In its current state, the program uses file_get_contents(), followed by json_decode(), and then usort() to sort the items. This works fine with the small sample data file, but with the large sample data file it runs out of memory.
The input file
The file is in JSON format and contains 100,000 objects. Each object has a registered attribute (example value 2017-12-25 04:55:33) and this is how the records in the CSV file should be sorted, in ascending order.
My attempted solution
Currently, I've used the halaxa/json-machine package, and I'm able to iterate over each object in the file. For example
$people = \JsonMachine\JsonMachine::fromFile($fileName);
foreach ($people as $person) {
// do something
}
Reading the whole file into memory as a PHP array is not an option, as it takes up too much memory, so the only solution I've been able to come up with so far has been iterating over each object in the file, finding the person with the earliest registration date and printing that. Then, iterating over the whole file again, finding the next person with the earliest registration date and printing that etc.
The big issue with that approach is the nested loops: a loop which runs 100,000 times containing a loop that runs 100,000 times. It's not a viable solution, and that's the furthest I've made it.
How can I parse, sort, and print to CSV, a JSON file with 100,000 records? Usage of packages / services is allowed.
I ended up importing the data into MongoDB in chunks and then retrieving it in the correct order to print.
Example import:
$collection = (new Client($uri))->collection->people;
$collection->drop();
$people = JsonMachine::fromFile($fileName);
$chunk = [];
$chunkSize = 5000;
$personNumber = 0;
foreach ($people as $person) {
$personNumber += 1;
$chunk[] = $person;
if ($personNumber % $chunkSize == 0) { // Chunk is full
$this->collection->insertMany($chunk);
$chunk = [];
}
}
// The very last chunk was not filled to the max, but we still need to import it
if(count($chunk)) {
$this->collection->insertMany($chunk);
}
// Create an index for quicker sorting
$this->collection->createIndex([ 'registered' => 1 ]);
Example retrieve:
$results = $this->collection->find([],
[
'sort' => ['registered' => 1],
]
);
// For every person...
foreach ($results as $person) {
// For every attribute...
foreach ($person as $key => $value) {
if($key != '_id') { // No need to include the new MongoDB ID
echo some_csv_encode_function($value) . ',';
}
}
echo PHP_EOL;
}
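The some_csv_encode_function() call above is a placeholder. One way to fill it in is PHP's built-in fputcsv(), which handles quoting and does not leave a trailing comma. A minimal sketch, assuming every value is a scalar; the function name writePeopleCsv is hypothetical:

```php
<?php
// Hypothetical helper: write one CSV row per person to the stream $out,
// skipping the MongoDB "_id" field. $people is any iterable of key/value rows.
function writePeopleCsv(iterable $people, $out): void
{
    foreach ($people as $person) {
        $row = [];
        foreach ($person as $key => $value) {
            if ($key !== '_id') {
                $row[] = $value;
            }
        }
        // Separator, enclosure and escape character are passed explicitly.
        fputcsv($out, $row, ',', '"', '\\');
    }
}
```

With the cursor from the retrieve example this could be called as writePeopleCsv($results, fopen('php://output', 'w')).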
I need help processing files holding about 46k lines or more than 30MB of data.
My original idea was to open the file and turn each line into an array element. This worked the first time as the array held about 32k values total.
The second time the process was repeated, the array held only 1011 elements, and the third time it could hold only 100.
I'm confused and don't know much about the backend array processes. Can someone explain what is happening and fix the code?
function file_to_array($cvsFile){
$handle = fopen($cvsFile, "r");
$path = fread($handle, filesize($cvsFile));
fclose($handle);
//Turn the file into an array and separate lines to elements
$csv = explode(",", $path);
//Remove common double spaces
foreach ($csv as $key => $line){
$csv[$key] = str_replace(' ', '', str_getcsv($line));
}
array_filter($csv);
//get the row count for the file and array
$rows = count($csv);
$filerows = count(file($cvsFile)); //this no longer works
echo "File has $filerows and array has $rows";
return $csv;
}
The approach here can be split into two parts:
1. Optimized file reading and processing
2. Proper storage solution
Optimized file processing can be done like so:
$csv = [];
$rowsSucceed = 0;
$rowsFailed = 0;
$handle = fopen($cvsFile, "r");
if ($handle) {
    while (($line = fgets($handle)) !== false) { // Reading file by line
        // Process CSV line and check if it was parsed correctly
        // (str_getcsv() is an assumption here; use whatever parsing fits your data)
        $parsedLine = trim($line) === '' ? [] : str_getcsv($line);
        // And count as you go
        if (!empty($parsedLine)) {
            $csv[] = $parsedLine;
            $rowsSucceed++;
        } else {
            $rowsFailed++;
        }
    }
    fclose($handle);
} else {
    // Error handling
}
$totalLines = $rowsSucceed + $rowsFailed;
You can also avoid array_filter() simply by not adding a processed line if it is empty.
This keeps memory usage down during script execution.
Proper storage
Proper storage is needed when you perform operations on a large amount of data. Repeatedly reading files is inefficient and expensive. A simple file-based database such as SQLite can help a lot and improve the overall performance of your script.
For this purpose you should probably load your CSV directly into the database and then run the count on the parsed data, avoiding extra passes over the file just to count lines.
It also has the further advantage of letting you work with the data without keeping it all in memory.
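As a rough illustration of the storage idea, the sketch below streams the CSV into SQLite through PDO and lets the database do the counting. It assumes the pdo_sqlite extension is available; the function name, the table csv_rows and its columns are hypothetical, and storing each row as JSON text is only a stand-in for a real schema.

```php
<?php
// Sketch: stream CSV rows into SQLite instead of collecting them in a PHP array.
// Returns the number of rows stored.
function importCsvToSqlite(string $csvFile, PDO $db): int
{
    $db->exec('CREATE TABLE IF NOT EXISTS csv_rows (line_no INTEGER, data TEXT)');
    $insert = $db->prepare('INSERT INTO csv_rows (line_no, data) VALUES (?, ?)');

    $handle = fopen($csvFile, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvFile");
    }

    $db->beginTransaction(); // one transaction keeps bulk inserts fast
    $lineNo = 0;
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $lineNo++;
        if ($row === [null]) {
            continue; // fgetcsv() reports a blank line as [null]
        }
        $insert->execute([$lineNo, json_encode($row)]);
    }
    $db->commit();
    fclose($handle);

    return (int) $db->query('SELECT COUNT(*) FROM csv_rows')->fetchColumn();
}
```

For example, importCsvToSqlite($cvsFile, new PDO('sqlite:rows.db')) would create rows.db next to the script; only one row is held in PHP memory at a time.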
Your question says you want to "turn each line into an array element" but that is definitely not what you are doing. The code is quite clear; it reads the entire file into $path and then uses explode() to make one massive flat array of every element on every line. Then later you're trying to run str_getcsv() on each item, which of course isn't going to work; you've already exploded all the commas away.
Looping over the file using fgetcsv() makes more sense:
function file_to_array($cvsFile) {
    $csv = []; // initialise so count() and return work even for an empty file
    $filerows = 0;
    $handle = fopen($cvsFile, "r");
    while (($line = fgetcsv($handle)) !== false) {
        $filerows++;
        // skip empty lines
        if ($line[0] === null) {
            continue;
        }
        //Remove common double spaces
        $csv[] = str_replace(' ', '', $line);
    }
    //get the row count for the file and array
    $rows = count($csv);
    echo "File has $filerows and array has $rows";
    fclose($handle);
    return $csv;
}
I have a .csv file which I can use with Google maps API to successfully create map data.
What I'm looking to do is merge 2 (or more) .csv files and display the TOTAL data on the Google map in the same way. They are all in the same format.
I have the paths to the 2 csv files and if need be, a blank .csv file in the same directory where the files could be merged to...
Unfortunately, the .csv files all have an initial 'header row', which would be awesome to omit when merging...
If anyone can point me in the right direction, I'd be very happy. Thanks
edit: I've tried:
$data1 = file_get_contents('google_map_data.csv');
$data2 = file_get_contents('google_map_data2.csv');
$TOTALdata = "google_map_dataALL.csv";
function joinFiles(array $files, $result)
{
if(!is_array($files)) {
throw new Exception('`$files` must be an array');
}
$wH = fopen($result, "w+");
foreach($files as $file)
{
$fh = fopen($file, "r");
while(!feof($fh))
{
fwrite($wH, fgets($fh));
}
fclose($fh);
unset($fh);
fwrite($wH, "\n"); //usually last line doesn't have a newline
}
fclose($wH);
unset($wH);
joinFiles(array($data1, $data2), $TOTALdata);
I'm assuming both files are small, so loading them all in one go should be OK.
The code loads both files, then removes the first line from the second one. It also removes any end of line from the first file, but adds its own to ensure there is always exactly one newline between the two...
$data1 = file_get_contents('google_map_data.csv');
$data2 = file_get_contents('google_map_data2.csv');
$TOTALdata = "google_map_dataALL.csv";
$data2 = ltrim(strstr($data2, PHP_EOL));
file_put_contents($TOTALdata, rtrim($data1).PHP_EOL.$data2);
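Since the question mentions "2 (or more)" files, here is a sketch that generalises the same idea to any number of files and reads them line by line instead of loading them whole. The function name mergeCsvFiles is hypothetical, and the sketch assumes no quoted field spans several lines. Like the snippet above, it keeps the header row of the first file only.

```php
<?php
// Hypothetical helper: concatenate CSV files into $target, keeping only the
// header row of the first source file.
function mergeCsvFiles(array $sources, string $target): void
{
    $out = fopen($target, 'w');
    if ($out === false) {
        throw new RuntimeException("Cannot write $target");
    }
    $isFirstFile = true;
    foreach ($sources as $source) {
        $in = fopen($source, 'r');
        if ($in === false) {
            throw new RuntimeException("Cannot read $source");
        }
        $lineNo = 0;
        while (($line = fgets($in)) !== false) {
            $lineNo++;
            if (!$isFirstFile && $lineNo === 1) {
                continue; // skip the header of every file after the first
            }
            if (trim($line) === '') {
                continue; // drop blank lines
            }
            // Normalise line endings so every row ends with exactly one newline.
            fwrite($out, rtrim($line, "\r\n") . PHP_EOL);
        }
        fclose($in);
        $isFirstFile = false;
    }
    fclose($out);
}
```

Usage with the file names from the question: mergeCsvFiles(['google_map_data.csv', 'google_map_data2.csv'], 'google_map_dataALL.csv'). The function takes file paths, not file contents.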
I have a working system that reads the data from two .csv files, saves all the data into arrays, and then compares some of the values that exist in both files. The system works well, but I later found out that some of the rows do not show up in the array. I think I am not using the proper code to read a CSV file, and I want to improve the system. This is my code for reading the data from a CSV file:
$thedata = array();
$data = file("upload/payment.csv");
foreach ($data as $deposit){
$depositarray = explode(",", $deposit);
$depositlist = $depositarray;
$key = md5($depositlist[9] . $depositlist[10]);
$thedata[$key]['payment'] = array(
'name' => $depositlist[0],
'email' => $depositlist[1],
'modeofpayment' =>$depositlist[8],
'depositdate' => $depositlist[9],
'depositamount' => number_format($depositlist[10],2)
);
}
'<pre>',print_r($thedata),'</pre>';
//more code here for comaparing of datas...
1.) What is wrong with file("upload/payment.csv") when reading csv file?
2.) What is the best code in reading a csv file that is applicable on the
system, not changing the whole code. Should remain the foreach loop.
3.) Is fgetcsv much better for the existing code? What changes should be made?
Yes, you can use fgetcsv() for this purpose.
The fgetcsv() function parses a line from an open file. It returns the CSV fields as an array on success, or FALSE on failure and at EOF.
Check the examples given below.
eg1 :
<?php
$file = fopen("contacts.csv","r");
print_r(fgetcsv($file));
fclose($file);
?>
eg 2:
<?php
$file = fopen("contacts.csv","r");
// fgetcsv() returns false at EOF, so test its result instead of feof()
while (($row = fgetcsv($file)) !== false)
{
    print_r($row);
}
fclose($file);
?>
Link : https://gist.github.com/jaywilliams/385876
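Applied to the code in the question, a minimal sketch that keeps the foreach loop but parses each line with str_getcsv() could look like this. explode(",") splits quoted fields that contain commas, which shifts the column indexes; str_getcsv() avoids that. The function name readPayments is hypothetical, and the sketch assumes no field contains a line break. Note also that rows with the same deposit date and amount produce the same md5 key and overwrite each other, which may be another reason rows go missing.

```php
<?php
// Sketch: same loop as in the question, with str_getcsv() instead of explode().
function readPayments(string $path): array
{
    $thedata = [];
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    foreach ($lines as $deposit) {
        $depositlist = str_getcsv($deposit, ',', '"', '\\');
        if (count($depositlist) < 11) {
            continue; // not enough columns for indexes 9 and 10
        }
        $key = md5($depositlist[9] . $depositlist[10]);
        $thedata[$key]['payment'] = [
            'name' => $depositlist[0],
            'email' => $depositlist[1],
            'modeofpayment' => $depositlist[8],
            'depositdate' => $depositlist[9],
            'depositamount' => number_format((float) $depositlist[10], 2),
        ];
    }
    return $thedata;
}
```

Calling readPayments("upload/payment.csv") returns the same structure as $thedata in the question.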
I have a PHP script that takes a user-supplied string, then SSHs out to a remote server, reads the file into an array, then parses out the request/response blocks containing the string to return to the user.
This implementation does not work with large log files, because PHP runs out of memory trying to store the whole file in an array.
Example data:
*** REQUEST
request line 1
request line 2
request line 3
[...]
*** RESPONSE
response line 1
response line 2
response line 3
[...]
[blank line]
The length of the requests and responses vary, so I can never be sure how many lines there will be.
How can I read a file in chunks without storing the whole file in memory, while still ensuring I'll always be able to process a full request/response block of data from the log without truncating it?
I feel like I'm just being exceptionally dense about this, since my experience is usually working with whole files or arrays.
Here's my current code (with $search representing the user-supplied string we're looking for in the log), which is putting the whole file into an array first:
$stream = ssh2_exec($ssh, $command);
stream_set_blocking($stream, true);
$data = '';
while($buffer = fread($stream, 4096)) {
$data .= $buffer;
}
fclose($stream);
$rawlog = $data;
$logline = explode("\n",$rawlog);
reset($logline);
$block='';
foreach ( $logline as $k => $v ) {
if ( preg_match("/\*\*\* REQUEST/",$v) && $block != '') {
if ( preg_match("/$search/i",$block) ) {
$results[] = $block;
}
$block=$v . "\n";
} else {
$block .= $v . "\n";
}
}
if ( preg_match("/$search/i",$block) ) {
$results[] = $block;
}
Any suggestions?
Hard to say if this would work for you but if the logs are in files you could use phpseclib's SFTP implementation (latest Git version).
eg.
If you do $sftp->get('filename.ext', false, 0, 1000) it'll download bytes 0-1000 from filename.ext and return a string with those bytes. If you do $sftp->get('filename.ext', false, 1000, 1000) it'll download bytes 1000-2000.
You can use a command like tail to fetch the output in slices: lines 0 to 99, then lines 100 to 199, and so on.
This requires more SSH commands, but you do not have to keep the whole result in memory.
Or you can first store all the output in a local file and parse it afterwards.
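Another option is to keep the single ssh2_exec() call but consume the stream line by line, holding only the current block in memory. The sketch below is not the asker's code: the function name matchingBlocks is hypothetical, it assumes a block starts at a line beginning with "*** REQUEST" as in the example data, and it swaps the regex match for a plain case-insensitive substring search with stripos().

```php
<?php
// Sketch: read a stream line by line and yield only the request/response
// blocks that contain $search. Only one block is buffered at a time.
function matchingBlocks($stream, string $search): Generator
{
    $block = '';
    while (($line = fgets($stream)) !== false) {
        // A new "*** REQUEST" line closes the previous block.
        if (strpos($line, '*** REQUEST') === 0 && $block !== '') {
            if (stripos($block, $search) !== false) {
                yield $block;
            }
            $block = '';
        }
        $block .= $line;
    }
    // The last block has no following REQUEST line, so check it separately.
    if ($block !== '' && stripos($block, $search) !== false) {
        yield $block;
    }
}
```

With the stream from the question (after stream_set_blocking($stream, true)), the results could be collected with foreach (matchingBlocks($stream, $search) as $block) { $results[] = $block; }.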