PHP Array Processing Ability Decreases - php

I need help processing files holding about 46k lines or more than 30MB of data.
My original idea was to open the file and turn each line into an array element. This worked the first time as the array held about 32k values total.
The second time I ran the same process, the array held only 1011 elements, and the third time it could hold only 100.
I'm confused and don't know much about the backend array processes. Can someone explain what is happening and fix the code?
function file_to_array($cvsFile){
$handle = fopen($cvsFile, "r");
$path = fread($handle, filesize($cvsFile));
fclose($handle);
//Turn the file into an array and separate lines to elements
$csv = explode(",", $path);
//Remove common double spaces
foreach ($csv as $key => $line){
$csv[$key] = str_replace(' ', '', str_getcsv($line));
}
array_filter($csv);
//get the row count for the file and array
$rows = count($csv);
$filerows = count(file($cvsFile)); //this no longer works
echo "File has $filerows and array has $rows";
return $csv;
}

The approach here can be split into two parts:
1. Optimized file reading and processing
2. Proper storage
Optimized file processing can be done like so:
$csv = [];
$handle = fopen($cvsFile, "r");
$rowsSucceed = 0;
$rowsFailed = 0;
if ($handle) {
while (($line = fgets($handle)) !== false) { // Read the file line by line
// Parse the CSV line; str_getcsv() returns [null] for a blank line
$parsedLine = str_getcsv(trim($line));
// Count as you go and only keep lines that contain data
if ($parsedLine !== [null]) {
$csv[] = $parsedLine;
$rowsSucceed++;
} else {
$rowsFailed++;
}
}
fclose($handle);
} else {
// Error handling
}
$totalLines = $rowsSucceed + $rowsFailed;
Also, you can avoid array_filter() simply by not adding a processed line when it is empty.
This keeps memory usage down while the script runs.
Proper storage
Proper storage is needed when you have to run operations over this amount of data. Re-reading the file for every operation is inefficient and expensive. A simple file-based database such as SQLite can help a lot and improve the overall performance of your script.
For this purpose you should probably load your CSV directly into the database and then run the count on the parsed data, which avoids extra passes over the file just to count lines.
It also lets you work with the data without keeping all of it in memory.
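As a rough sketch of the storage idea above, the following streams a CSV file into SQLite one row at a time and then counts rows with a query instead of re-reading the file. It assumes the pdo_sqlite extension is available; the function name csv_to_sqlite(), the table csv_rows and the two-column layout are made up for illustration and would need adapting to the real data.

```php
<?php
// Sketch only: csv_to_sqlite(), the csv_rows table and its columns are hypothetical.
function csv_to_sqlite(string $csvFile, PDO $db): int
{
    $db->exec('CREATE TABLE IF NOT EXISTS csv_rows (line_no INTEGER, col_a TEXT, col_b TEXT)');
    $insert = $db->prepare('INSERT INTO csv_rows (line_no, col_a, col_b) VALUES (?, ?, ?)');

    $handle = fopen($csvFile, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvFile");
    }

    $lineNo = 0;
    $db->beginTransaction(); // a single transaction keeps bulk inserts fast
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $lineNo++;
        if ($row === [null]) {
            continue; // fgetcsv() reports a blank line as [null]
        }
        $insert->execute([$lineNo, $row[0] ?? null, $row[1] ?? null]);
    }
    $db->commit();
    fclose($handle);

    return $lineNo; // number of lines read from the file
}

// Usage with a throwaway file and an in-memory database
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "a,1\n\nb,2\n");
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$linesRead = csv_to_sqlite($tmp, $db);
$stored = (int) $db->query('SELECT COUNT(*) FROM csv_rows')->fetchColumn();
echo "File has $linesRead lines and table has $stored rows\n";
unlink($tmp);
```

Only one row is held in PHP memory at any time, so the array size no longer depends on the file size.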

Your question says you want to "turn each line into an array element" but that is definitely not what you are doing. The code is quite clear; it reads the entire file into $path and then uses explode() to make one massive flat array of every element on every line. Then later you're trying to run str_getcsv() on each item, which of course isn't going to work; you've already exploded all the commas away.
Looping over the file using fgetcsv() makes more sense:
function file_to_array($cvsFile) {
$csv = array(); // initialise so count() and return also work for an empty file
$filerows = 0;
$handle = fopen($cvsFile, "r");
while ($line = fgetcsv($handle)) {
$filerows++;
// skip empty lines
if ($line[0] === null) {
continue;
}
//Remove common double spaces
$csv[] = str_replace(' ', '', $line);
}
//get the row count for the file and array
$rows = count($csv);
echo "File has $filerows and array has $rows";
fclose($handle);
return $csv;
}
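To make the difference concrete, here is a small illustration with made-up sample data. It compares exploding the whole file content on commas with parsing it one line at a time. Splitting on "\n" is only for the demo; on a real file fgetcsv() does that job and also copes with quoted fields.

```php
<?php
// Two CSV lines held in one string, the way fread() would return them (sample data)
$content = "a,b,c\nd,e,f\n";

// explode(",") ignores line breaks: one flat list, with "c\nd" fused into one piece
$flat = explode(',', $content);

// Parsing line by line keeps one array per row
$rows = [];
foreach (explode("\n", trim($content)) as $line) {
    $rows[] = str_getcsv($line, ',', '"', '\\');
}

echo count($flat), " flat pieces, ", count($rows), " rows\n";
// prints "5 flat pieces, 2 rows"
```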

Related

PHP Array sorting within WHILE loop

I have a huge issue: I can't find any way to sort array entries. My code:
<?php
error_reporting(0);
$lines=array();
$fp=fopen('file.txt', 'r');
$i=0;
while (!feof($fp))
{
$line=fgets($fp);
$line=trim($line);
$lines[]=$line;
$oneline = explode("|", $line);
if($i>30){
$fz=fopen('users.txt', 'r');
while (!feof($fz))
{
$linez=fgets($fz);
$linez=trim($linez);
$lineza[]=$linez;
$onematch = explode(",", $linez);
if (strpos($oneline[1], $onematch[1])){
echo $onematch[0],$oneline[4],'<br>';
}
else{
}
rewind($onematch);
}
}
$i++;
}
fclose($fp);
?>
The thing is, I want to sort the items that are being echoed by $oneline[4]. I tried the approaches from several other Stack Overflow posts but was not able to find a solution.
The answer to your question is that in order to sort $oneline[4], which seems to contain a string value, you need to apply the following steps:
1. split the string into an array ($oneline[4] = explode(',', $oneline[4]))
2. sort the resulting array (sort($oneline[4]))
3. combine the array back into a string ($oneline[4] = implode(',', $oneline[4]))
Since I got the impression that variable naming is low on your list of priorities, I'm re-using the $oneline[4] variable, mostly to make clear which part of the code I am referring to.
That being said, there are other improvements you should make if you want to be on speaking terms with your future self (in case you need to work on this code again in a couple of months):
Choose a single coding style and stick to it, the original code looked like it was copy/pasted from at least 4 different sources (mostly inconsistent quote-marks and curly braces)
Try to limit repeating costly operations, such as opening files whenever you can (to be fair, the agents.data could contain 31 lines and the users.txt would be opened only once resulting in me looking like a fool)
I have updated your code sample to try to show what I mean by the points above.
<?php
error_reporting(0);
$lines = array();
$users = false;
$fp = fopen('http://20.19.202.221/exports/agents.data', 'r');
while ($fp && !feof($fp)) {
$line = trim(fgets($fp));
$lines[] = $line;
$oneline = explode('|', $line);
// if we have $users (starts as false, is turned into an array
// inside this if-block) or if we have collected 30 or more
// lines (this condition is only checked while $users = false)
if ($users || count($lines) > 30) {
// your code sample implies the users.txt to be small enough
// to process several times consider using some form of
// caching like this
if (!$users) {
// always initialize what you intend to use
$users = [];
$fz = fopen('users.txt', 'r');
while ($fz && !feof($fz)) {
$users[] = explode(',', trim(fgets($fz)));
}
// always close whatever you open.
fclose($fz);
}
// walk through $users, which contains the exploded contents
// of each line in users.txt
foreach ($users as $onematch) {
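// strpos() returns 0 for a match at the very start, so compare against false
if (isset($oneline[1], $onematch[1]) && $onematch[1] !== '' && strpos($oneline[1], $onematch[1]) !== false) {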
// now, the actual question: how to sort $oneline[4]
// as the requested example was not available at the
// time of writing, I assume
// it to be a string like: 'b,d,c,a'
// first, explode it into an array
$oneline[4] = explode(',', $oneline[4]);
// now sort it using the sort function of your liking
sort($oneline[4]);
// and implode the sorted array back into a string
$oneline[4] = implode(',', $oneline[4]);
echo $onematch[0], $oneline[4], '<br>';
}
}
}
}
fclose($fp);
I hope this doesn't offend you too much, just trying to help and not just providing the solution to the question at hand.
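If the goal was instead to order the echoed rows by the value in $oneline[4], one option is to collect the matches first and sort them before printing. This is a sketch of that reading of the question, not a correction of the answer above. The sample lines, the 'user' and 'value' keys, and the choice of strnatcmp() as comparison are assumptions.

```php
<?php
// Hypothetical sample data standing in for the two files in the question
$agentLines = ["1|alice|x|y|30", "2|bob|x|y|10", "3|carol|x|y|20"];
$userLines  = ["U1,ali", "U2,bob", "U3,aro"];

$matches = [];
foreach ($agentLines as $line) {
    $oneline = explode('|', $line);
    foreach ($userLines as $userLine) {
        $onematch = explode(',', $userLine);
        // !== false so that a match at position 0 also counts
        if (strpos($oneline[1], $onematch[1]) !== false) {
            // collect instead of echoing straight away
            $matches[] = ['user' => $onematch[0], 'value' => $oneline[4]];
        }
    }
}

// order the collected rows by the value taken from $oneline[4]
usort($matches, function ($a, $b) {
    return strnatcmp($a['value'], $b['value']);
});

foreach ($matches as $m) {
    echo $m['user'], $m['value'], "<br>\n";
}
```

Sorting has to happen after the loops, because inside the loop only one row is known at a time.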

How do I get only unique values from CSV file array

I am building a small application that does some simple reporting based on CSV files, the CSV files are in the following format:
DATE+TIME,CLIENTNAME1,HAS REQUEST BLABLA1,UNIQUE ID
DATE+TIME,CLIENTNAME2,HAS REQUEST BLABLA2,UNIQUE ID
DATE+TIME,CLIENTNAME1,HAS REQUEST BLABLA1,UNIQUE ID
DATE+TIME,CLIENTNAME2,HAS REQUEST BLABLA2,UNIQUE ID
Now I am processing this using the following function:
function GetClientNames(){
$file = "backend/AllAlarms.csv";
$lines = file($file);
arsort($lines);
foreach ($lines as $line_num => $line) {
$line_as_array = explode(",", $line);
echo '<li><i class="icon-pencil"></i>' . $line_as_array[1] . '</li>';
}
}
I am trying to retrieve only the Clientname values, but I only want the unique values.
I have tried to create several different manners of approaching this, I understand I need to use the unique_array function, but I have no clue on exactly how to use this function.
I've tried this:
function GetClientNames(){
$file = "backend/AllAlarms.csv";
$lines = file($file);
arsort($lines);
foreach ($lines as $line_num => $line) {
$line_as_array = explode(",", $line);
$line_as_array[1] = unique_array($line_as_array[1]);
echo '<li><i class="icon-pencil"></i>' . $line_as_array[1] . '</li>';
}
}
But this gives me a very very dirty result with 100's of spaces instead of the correct data.
I would recommend using the fgetcsv() function when reading CSV files. In the wild, CSV files can be too complicated for a naive explode() approach to handle (quoted fields may contain commas, for example):
// this array will hold the results
$unique_ids = array();
// open the csv file for reading
$fd = fopen('t.csv', 'r');
// read the rows of the csv file, every row returned as an array
while ($row = fgetcsv($fd)) {
// change the 3 to the column you want
// using the keys of arrays to make final values unique since php
// arrays cant contain duplicate keys
$unique_ids[$row[3]] = true;
}
var_dump(array_keys($unique_ids));
You can also collect values and use array_unique() on them later. You probably want to split the "reading in" and the "writing out" part of your code too.
Try using array_unique()
Docs:
http://php.net/manual/en/function.array-unique.php
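A minimal sketch of the array_unique() route, assuming the client name sits in column 1 as in the sample rows of the question. The function name unique_client_names() and the temporary sample file are made up for the example.

```php
<?php
// Collect column 1 (the client name) of every row, then de-duplicate the list.
function unique_client_names(string $file): array
{
    $names = [];
    $handle = fopen($file, 'r');
    if ($handle === false) {
        return $names;
    }
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if (isset($row[1])) {
            $names[] = trim($row[1]);
        }
    }
    fclose($handle);

    // array_unique() works on the whole array, not on a single string;
    // array_values() reindexes the result from 0
    return array_values(array_unique($names));
}

// Usage with a throwaway sample file
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "t1,CLIENTNAME1,req,1\nt2,CLIENTNAME2,req,2\nt3,CLIENTNAME1,req,3\n");
foreach (unique_client_names($tmp) as $name) {
    echo '<li>' . htmlspecialchars($name) . "</li>\n";
}
unlink($tmp);
```

Keeping the reading in a function and the HTML output in the caller also separates the "reading in" and "writing out" parts.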

Reading multiple columns from large CSV files in PHP

I need to read two columns from a large CSV file. The CSV has multiple columns and can sometimes have following properties:
~25,000 lines
Contain spaces and blank rows
Be uneven (some columns longer than others)
In the example CSV file above, I would be only interested in the codes in the "Buy" and "Sell" columns (columns A and D).
I have written the following code (warning: it's not very elegant) to iterate over all rows and read only the columns I require. I create strings as inputs for 1 large MYSQL query (as opposed to running many small queries).
<?php
//Increase the allowed execution time
set_time_limit(0);
ini_set('memory_limit','256M');
ini_set('max_execution_time', 0);
//Set to detect the ending of CSV files
ini_set('auto_detect_line_endings', true);
$file = "test.csv";
$buy = $sold = ""; //Initialize empty strings
if (($handle = @fopen($file, "r")) !== FALSE) {
while (($pieces = fgetcsv($handle, 100, ",")) !== FALSE) {
if ( ! empty($pieces[0]) ) {
$buy .= $pieces[0] ." ";
}
if ( ! empty($pieces[3]) ) {
$sold .= $pieces[3] ." ";
}
}
echo "Buy ". $buy ."<br>"; //Do something with strings...
echo "Sold ". $sold ."<br>";
//Close the file
fclose($handle);
}
?>
My question is: is this the best way to perform such a task? The code works for smaller test files, but are there short comings I've overlooked in iterating over the CSV file like this?
First, reading any large file consumes a lot of memory if you store its contents in variables. You may want to check out "reading large files (more than 4GB in unix)".
Secondly, you can output $buy and $sold inside the while loop, which is likely more memory efficient because the two strings never have to be held in memory in full.
Lastly, consider the file seek approach; see the PHP fseek() documentation.
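The "handle each row inside the loop" idea could look roughly like this with a generator. The function name read_columns() and the sample file are assumptions. Passing 0 as the length argument is a deliberate change from the question's 100, since a fixed length may split lines that are longer than that.

```php
<?php
// Yields only the wanted columns, one row at a time, so memory use
// does not grow with the size of the file.
function read_columns(string $file, array $columns): Generator
{
    $handle = fopen($file, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $file");
    }
    try {
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            $picked = [];
            foreach ($columns as $i) {
                // uneven rows: a missing or blank cell becomes null
                $value = isset($row[$i]) ? trim($row[$i]) : '';
                $picked[$i] = $value === '' ? null : $value;
            }
            yield $picked;
        }
    } finally {
        fclose($handle);
    }
}

// Usage: columns 0 ("Buy") and 3 ("Sell"), with a blank row and an uneven row
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "B1,x,y,S1\n\nB2,x\n,x,y,S3\n");
foreach (read_columns($tmp, [0, 3]) as $cols) {
    if ($cols[0] !== null) { echo "Buy {$cols[0]}\n"; }
    if ($cols[3] !== null) { echo "Sold {$cols[3]}\n"; }
}
unlink($tmp);
```

Each yielded row can go straight into a batched query or be printed, so the two long strings from the question are not needed.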

Split large Excel/Csv file to multiple files on PHP or Javascript

I have an Excel (file.xls) / CSV (file.csv) file that contains, or will contain, hundreds of thousands of entries, maybe even millions. Is it possible to split it into multiple files, e.g. file.xls into file1.xls, file2.xls, file3.xls and so on?
Are there any libraries I can use? Is this possible in PHP? Or how about JavaScript?
Can I specify how many rows go into each file?
Thanks
Quick and dirty way of splitting a CSV file into several CSV files
$inputFile = 'input.csv';
$outputFile = 'output';
$splitSize = 10000;
$in = fopen($inputFile, 'r');
$rowCount = 0;
$fileCount = 1;
while (!feof($in)) {
if (($rowCount % $splitSize) == 0) {
if ($rowCount > 0) {
fclose($out);
}
$out = fopen($outputFile . $fileCount++ . '.csv', 'w');
}
$data = fgetcsv($in);
if ($data)
fputcsv($out, $data);
$rowCount++;
}
fclose($out);
Yes, it is possible to do that in PHP with CSV files. You basically iterate over the large file and, every X rows, start forwarding the rows to a new file.
You can find out how to open the large CSV file as an iterator in this answer:
Answer to "how to extract data from csv file in php"
Then you need to chunk the iterator into parts of X rows each. That can be done as outlined here:
Answer to "Need some advice with PHP loop"
Instead of outputting multiple <ul>...</ul> HTML lists, you copy the rows into new files. That basically works as outlined in:
Answer to "How can I split a CSV file in PHP?"
However, this time you want to use the SplFileObject::fputcsv method. Make sure you use the latest stable PHP for this; otherwise you need to do it differently, see fputcsv().
If the first line of the original file contains column headers, you might also be interested in the following:
Answer to "Process CSV Into Array With Column Headings For Key"
It shows some ways to extend / process the incoming file. You might not need the full abstraction done there; just keeping the first line around might already be enough.
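A compact sketch of the chunking idea with the header line kept around, using the plain fgetcsv()/fputcsv() functions rather than SplFileObject. The function name split_csv(), the output naming scheme and the assumption that the first line holds column names are all illustrative.

```php
<?php
// Split $inputFile into parts of at most $rowsPerFile data rows,
// repeating the header line in every part. Returns the part file names.
function split_csv(string $inputFile, string $outputPrefix, int $rowsPerFile): array
{
    $in = fopen($inputFile, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $inputFile");
    }
    $header = fgetcsv($in, 0, ',', '"', '\\'); // assumed: first line holds column names
    $parts = [];
    $out = null;
    $rowCount = 0;

    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // skip blank lines
        }
        if ($rowCount % $rowsPerFile === 0) {
            // current part is full (or this is the first row): start a new file
            if ($out !== null) {
                fclose($out);
            }
            $name = $outputPrefix . (count($parts) + 1) . '.csv';
            $out = fopen($name, 'w');
            if ($out === false) {
                throw new RuntimeException("Cannot write $name");
            }
            fputcsv($out, $header, ',', '"', '\\');
            $parts[] = $name;
        }
        fputcsv($out, $row, ',', '"', '\\');
        $rowCount++;
    }
    if ($out !== null) {
        fclose($out);
    }
    fclose($in);
    return $parts;
}

// Usage: 5 data rows, 2 rows per part
$dir = sys_get_temp_dir() . '/split_' . uniqid();
mkdir($dir);
file_put_contents("$dir/input.csv", "id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n");
$parts = split_csv("$dir/input.csv", "$dir/output", 2);
echo count($parts), " part files written\n";
```

With the header repeated, every part can be opened on its own without referring back to the original file.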
I think you can also use "split by file size":
$part = 1;
$maxSize = 50; // 50 MB
$fopen = fopen('filename.csv', 'r') or die('ERROR');
while (($line = fgetcsv($fopen, 10000, ";")) !== FALSE) {
$ftowrite = fopen("Part_$part.csv", 'a');
fputcsv($ftowrite, $line);
// close after every row so the size on disk is current and no handles leak
fclose($ftowrite);
clearstatcache();
// measure the same file that was just written
$size = filesize("Part_$part.csv") / 1000000;
if ($size > $maxSize) {
$part++;
}
}
fclose($fopen);

Using fseek to start reading a CSV after a certain number of lines

I am using the current code to read a csv file and add it to an array:
echo "starting CSV import<br>";
$current_row = 1;
$handle = fopen($csv, "r");
while ( ($data = fgetcsv($handle, 10000, ",") ) !== FALSE )
{
$number_of_fields = count($data);
if ($current_row == 1) {
//Header line
for ($c=0; $c < $number_of_fields; $c++)
{
$header_array[$c] = $data[$c];
}
} else {
//Data line
for ($c=0; $c < $number_of_fields; $c++)
{
$data_array[$header_array[$c]] = $data[$c];
}
array_push($products, $data_array);
}
$current_row++;
}
fclose($handle);
echo "finished CSV import <br>";
However when using a very large CSV this times out on the server, or has a memory limit error.
I'd like a way to do it in stages, so after the first say 100 lines it will refresh the page, starting at line 101.
I will probably be doing this with a meta refresh and a URL parameter.
I just need to know how to adapt that code above to start at the line I tell it to.
I have looked into fseek() but I'm not sure how to implement this here.
Can you please help?
The timeout can be circumvented using
ignore_user_abort(true);
set_time_limit(0);
When experiencing problems with the memory limit, it may be wise to take a step back and look at what you're actually doing with the data you're processing. Are you pushing the data into a database? Are you calculating something from the data without needing to store the data itself?
Do you really need to push (array_push($products, $data_array);) the rows into an array (for later processing)? Can you instead write to the database directly? Or calculate directly? Or build an HTML <table> directly? Or do whatever the hell you're doing right then and there, within the while() loop, without pushing everything into an array first?
If you're able to chunk the processing, I guess you don't need that array at all. Otherwise you'd have to restore the array for every chunk, which does not solve the memory issue one bit.
If you can manage to change your processing algorithm to waste less memory / time, you should seriously consider that over any chunked processing that requires a round-trip to the browser (for many performance and security reasons).
Anyway, you can identify the current stream offset at any time with ftell() and return to that position using fseek(). You'd only need to pass that integer to your next iteration.
Also there is no need for your inner for() loops. This should produce the same results:
<?php
$products = array();
$cols = null;
$first = true;
$handle = fopen($csv, "r");
while (($data = fgetcsv($handle, 10000, ",")) !== false) {
if ($first) {
$cols = $data;
$first = false;
} else {
$products[] = array_combine($cols, $data);
}
}
fclose($handle);
echo "finished CSV import <br>";
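For the staged variant, the ftell()/fseek() hand-over mentioned above might be sketched as follows. The function name import_chunk(), the returned 'rows' and 'next' keys and the sample file are assumptions. In a real page the offset would arrive as a URL parameter and should be cast to int before use.

```php
<?php
// Read up to $limit data rows starting at byte $offset.
// Returns the rows plus the offset for the next request (null when the file is done).
function import_chunk(string $csv, int $offset, int $limit): array
{
    $handle = fopen($csv, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csv");
    }
    $cols = fgetcsv($handle, 0, ',', '"', '\\'); // the header is always the first line
    if ($cols === false) {
        fclose($handle);
        return ['rows' => [], 'next' => null];   // empty file
    }
    if ($offset > 0) {
        fseek($handle, $offset);                 // resume where the previous chunk stopped
    }

    $rows = [];
    while (count($rows) < $limit && ($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if (count($data) === count($cols)) {     // skips blank or malformed lines
            $rows[] = array_combine($cols, $data);
        }
    }

    $pos = ftell($handle);                       // byte position after the last row read
    $size = fstat($handle)['size'];
    fclose($handle);

    return ['rows' => $rows, 'next' => $pos < $size ? $pos : null];
}

// Usage: the loop stands in for successive page loads
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "id,name\n1,a\n2,b\n3,c\n");
$offset = 0;
do {
    $chunk = import_chunk($tmp, $offset, 2);
    echo count($chunk['rows']), " rows imported\n";
    $offset = $chunk['next'];
} while ($offset !== null);
unlink($tmp);
```

Passing a byte offset rather than a line number means the already processed lines are not read and skipped again on every request.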
