How to avoid storing 1 million elements in an array in PHP

I'm parsing a 1 000 000 line CSV file in PHP to recover this data: IP address, DNS name, and the cipher suites used.
To find out whether some DNS names (having several mail servers) use different cipher suites on their servers, I store in an array an object containing the DNS name, a list of the IP addresses of its servers, and a list of the cipher suites it uses. At the end I have an array of 1 000 000 elements. To count the DNS names whose servers have different cipher suite configurations, I do:
$res = 0;
foreach ($this->allDNS as $dnsObject) {
    if (count($dnsObject->getCiphers()) > 1) { // it has several different configs
        $res++;
    }
}
return $res;
Problem: this consumes too much memory, so I can't run my code on the 1 000 000 line CSV (if I don't store the data in an array, I parse the file in 20 seconds...). Is there a way around this problem?
NB: I already set
ini_set('memory_limit', '-1');
but that line only suppresses the memory error; it doesn't reduce the memory used.

Keeping all of that CSV data in memory will definitely take its toll.
One logical solution to your problem is to have a database store all of that data.
You may refer to this link for a tutorial on parsing a CSV file and storing it in a database.

Write the processed data (for each line separately) into one file (or database):
file_put_contents('data.txt', $parsingresult, FILE_APPEND);
FILE_APPEND will append $parsingresult at the end of the file's contents.
You can then access the processed data with file_get_contents() or file().
Anyway, I think using a database and some pre-processing would be the best solution if this is needed more often.
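A minimal sketch of that per-line approach (the in-memory stream and output file here stand in for the real CSV and data.txt):

```php
<?php
// Sketch: process a CSV one line at a time and append each result to a file,
// so only a single line is ever held in memory. File names are placeholders.
$in = fopen('php://temp', 'r+');   // stands in for the source CSV
fwrite($in, "a,1\nb,2\n");
rewind($in);

$out = tempnam(sys_get_temp_dir(), 'parsed');
while (($row = fgetcsv($in)) !== false) {
    $parsingresult = $row[0] . '=' . $row[1] . "\n";
    file_put_contents($out, $parsingresult, FILE_APPEND);
}
fclose($in);
echo file_get_contents($out);
```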

You can use fgetcsv() to read and parse the CSV file one line at a time. Keep the data you need and discard the line:
// Store the useful data here
$data = array();

// Open the CSV file
$fh = fopen('data.csv', 'r');

// The first line probably contains the column names
$header = fgetcsv($fh);

// Read and parse one data line at a time
while ($row = fgetcsv($fh)) {
    // Get the desired columns from $row
    // Use $header if the order or number of columns is not known in advance
    // Store the gathered info into $data
}

// Close the CSV file
fclose($fh);
This way it uses the minimum amount of memory needed to parse the CSV file.
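Applied to the original question, you can aggregate per DNS name while streaming, keeping only the set of distinct cipher suites per name instead of every row. This is a sketch; the column layout (dns, ip, cipher) is an assumption:

```php
<?php
// Sketch: count DNS names with more than one distinct cipher suite,
// without keeping all rows in memory. Column names are assumptions.
$fh = fopen('php://temp', 'r+');   // stands in for the real CSV file
fwrite($fh, "dns,ip,cipher\nmail.a.com,1.1.1.1,TLS_A\nmail.a.com,1.1.1.2,TLS_B\nmail.b.com,2.2.2.2,TLS_A\n");
rewind($fh);

$header = fgetcsv($fh);
$ciphersByDns = array();
while (($row = fgetcsv($fh)) !== false) {
    $row = array_combine($header, $row);
    // Use the cipher as an array key: duplicates cost nothing extra
    $ciphersByDns[$row['dns']][$row['cipher']] = true;
}
fclose($fh);

$res = 0;
foreach ($ciphersByDns as $ciphers) {
    if (count($ciphers) > 1) {   // this DNS name uses several distinct ciphers
        $res++;
    }
}
echo $res;
```

This keeps one small entry per DNS name rather than one object per CSV row.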

Related

binary safe write on file with php to create a DBF file

I need to split a big DBF file using PHP functions. This means that if I have, for example, 1000 records, I have to create 2 files with 500 records each.
I do not have any dbase extension available, nor can I install one, so I have to work with basic PHP functions. Using the basic fread function I'm able to correctly read and parse the file, but when I try to write a new DBF I have some problems.
As I understand it, the DBF file is structured as a 2-line file: the first line contains file info and header info and is binary. The second line contains the data and is plain text. So I thought to simply write a new binary file replicating the first line, then manually adding the first records to the first file and the other records to the other file.
That's the code I use to parse the file, and it works nicely:
$fdbf = fopen($_FILES['userfile']['tmp_name'], 'r');
$fields = array();
$buf = fread($fdbf, 32);
$header = unpack("VRecordCount/vFirstRecord/vRecordLength", substr($buf, 4, 8));
$goon = true;
$unpackString = '';
while ($goon && !feof($fdbf)) { // read fields:
    $buf = fread($fdbf, 32);
    if (substr($buf, 0, 1) == chr(13)) { // end of field list
        $goon = false;
    } else {
        $field = unpack("a11fieldname/A1fieldtype/Voffset/Cfieldlen/Cfielddec", substr($buf, 0, 18));
        $unpackString .= "A$field[fieldlen]$field[fieldname]/";
        array_push($fields, $field);
    }
}
fseek($fdbf, 0);
$first_line = fread($fdbf, $header['FirstRecord'] + 1);
fseek($fdbf, $header['FirstRecord'] + 1); // position at the start of the first record (after the field definitions)
$first_line is the variable that contains the header data, but when I try to write it to a new file something goes wrong and the row isn't written exactly as it was read. That's the code I use for writing:
$handle_log = fopen($new_filename, "wb");
fwrite($handle_log, $first_line, strlen($first_line) );
fwrite($handle_log, $string );
fclose($handle_log);
I've tried adding the b value to the fopen mode parameter, as suggested, to open the file in binary mode, and I've also followed the suggestion to pass exactly the length of the string to avoid stripping some characters, but without success, since the files written are still not in correct DBF format. What can I do to achieve my goal?
As I understand it, the DBF file is structured as a 2-line file: the first line contains file info and header info and is binary. The second line contains the data and is plain text.
Well, it's a bit more complicated than that.
See here for a full description of the dbf file format.
So it would be best if you could use a library to read and write the dbf files.
If you really need to do this yourself, here are the most important parts:
Dbf is a binary file format, so you have to read and write it as binary. For example the number of records is stored in a 32 bit integer, which can contain zero bytes.
You can't use string functions on that binary data. For example strlen() will scan the data up to the first null byte, which is present in that 32 bit integer, and will return the wrong value.
If you split the file (the records), you'll have to adjust the record count in the header.
When splitting the records keep in mind that each record is preceded by an extra byte, a space 0x20 if the record is not deleted, an asterisk 0x2A if the record is deleted. (for example, if you have 4 fields of 10 bytes, the length of each record will be 41) - that value is also available in the header: bytes 10-11 - 16-bit number - Number of bytes in the record. (Least significant byte first)
The file could end with the end-of-file marker 0x1A, so you'll have to check for that as well.
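As an illustration of the binary-handling point above, the 32-bit record count sits in the header and must be patched with pack() and substr_replace(), never treated as text. This is a sketch using a fake 32-byte header (the real header layout is described in the linked format description; the byte offset 4 for the record count is per that spec):

```php
<?php
// Sketch: patch the little-endian 32-bit record count at header bytes 4-7.
// The all-zero 32-byte header here is fake, for illustration only.
$header = str_repeat("\x00", 32);
$header = substr_replace($header, pack('V', 1000), 4, 4); // original count

// After splitting the file in half, each part holds 500 records:
$newCount = 500;
$header = substr_replace($header, pack('V', $newCount), 4, 4);

// Read it back to confirm; unpack('V', ...) handles embedded null bytes,
// which string functions like strlen() would choke on.
$read = unpack('Vcount', substr($header, 4, 4));
echo $read['count'];
```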

How to extract data from CSV with PHP

I'm using the Sebastian Bergmann PHPUnit selenium webdriver.
Currently I have:
$csv = file_get_contents('functions.csv', NULL,NULL,1);
var_dump($csv);
// select 1 random line here
This will load my CSV file and give me all possible data from the file.
It has multiple rows for example:
Xenoloog-FUNC/8_4980
Xylofonist-FUNC/8_4981
IJscoman-FUNC/8_4982
Question: How can I get that data randomly?
I just want to use 1 ( random) line of data.
Would it be easier to just grab 1 (random) line of the file instead of everything?
Split the string into an array, then grab a random index from that array:
$lines = explode("\n", $csv);
$item = $lines[array_rand($lines)];
You could use the offset and maxlen parameters to grab part of the file using file_get_contents. You could also use fseek after fopen to select part of a file. These functions both take numbers of bytes as arguments. This post has a little more information:
Get part of a file by byte number in php
It may require some hacking to translate a particular row index of a CSV file into a byte offset. You might need to generate and load a small metadata file that lists the byte length of each row of CSV data. That would probably help.
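If all you need is one uniformly random line without loading the whole file, reservoir sampling does it in a single pass with constant memory. This is an alternative technique, not one from the answers above; the in-memory stream stands in for functions.csv:

```php
<?php
// Sketch: pick one uniformly random line in a single pass (reservoir sampling).
$fh = fopen('php://temp', 'r+');   // stands in for functions.csv
fwrite($fh, "Xenoloog-FUNC/8_4980\nXylofonist-FUNC/8_4981\nIJscoman-FUNC/8_4982\n");
rewind($fh);

$chosen = null;
$n = 0;
while (($line = fgets($fh)) !== false) {
    $n++;
    // Keep the n-th line with probability 1/n; after the last line,
    // every line has had an equal chance of being the survivor.
    if (mt_rand(1, $n) === 1) {
        $chosen = rtrim($line, "\n");
    }
}
fclose($fh);
echo $chosen;
```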

PHP string of IP addresses in a .txt file?

I'm writing a feature for an admin panel that blocks IP addresses at the Apache level. The file is called blacklist.txt and looks like 10.0.0.1,10.0.0.2,10.0.0.3, ... all on a single line, with each IP address separated by a comma. After reading What is the best way to write a large file to disk in PHP?, I am still unsure of the best practices on the matter.
Here's what I want to do: if an administrator presses the 'ban hammer', the file is searched with strpos($file, $ip); if the IP is not found, it is appended to the end of the file and the .htaccess file blocks accordingly.
Question: is a .txt file suitable for this potentially large amount of data? I do not want to execute a query to check whether someone is banned every time a page is requested.
EDIT:
The purpose is to block single ip addresses that have 10 failed login attempts in the past 12 hours. I would think that the 'recover my password' would prevent a normal client from doing this.
Question: is a .txt file suitable for this potentially large amount of data?
No, it is not. A database with proper indexing is.
First, for reading your file in CSV format, you can do it many ways. For example:
$rows = array_map('str_getcsv', file('myfile.csv'));
$header = array_shift($rows);
$csv = array();
foreach ($rows as $row) {
    $csv[] = array_combine($header, $row);
}
src: http://steindom.com/articles/shortest-php-code-convert-csv-associative-array
To check the list on each page load while minimizing reads of that file, you can use a memory cache, something like Memcache, then search the array for the incoming IP. Note: a memory cache is faster than a database query.
PHP shared memory ref: http://www.php.net/manual/en/book.shmop.php
Memcache: php.net/memcache
Array search: php.net/in_array
Also, to return the key if the value is found: php.net/array_search
Note: in a 1 MB file you can store ~65K IPs, assuming each IP takes the form "255.255.255.255,".
It's even better if you make the IP the key of the array: then instead of searching the array for the IP you can check whether the key exists with php.net/array_key_exists
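A sketch of that key-based lookup (the blacklist contents here are made up):

```php
<?php
// Sketch: load a comma-separated blacklist into array keys, then test
// membership with isset(), an O(1) hash lookup instead of a linear scan.
$raw = "10.0.0.1,10.0.0.2,10.0.0.3";   // stands in for blacklist.txt
$blacklist = array_flip(array_filter(explode(',', $raw)));

$banned    = isset($blacklist['10.0.0.2']);
$notBanned = isset($blacklist['10.0.0.9']);
var_dump($banned, $notBanned);
```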

Reading csv file with large number of fields

I have a CSV file with 104 fields, but I only need 4 of them to use in a MySQL database. Each file has about a million rows.
Could somebody tell me an efficient way to do this? Reading each line into an array takes a long time.
Thanks
You have to read every line in its entirety by definition. This is necessary to find the delimiter for the next record (i.e. the newline character). You only need to discard the data you have read that you don't need. E.g.:
$data = array();
$fh = fopen('data.csv', 'r');
$headers = fgetcsv($fh);
while ($row = fgetcsv($fh)) {
    $row = array_combine($headers, $row);
    $data[] = array_intersect_key($row, array_flip(array('foo', 'bar', 'baz')));
    // alternatively, if you know the column index, something like:
    // $data[] = array($row[1], $row[45], $row[60]);
}
This only retains the columns foo, bar and baz and discards the rest. Reading from the file (fgetcsv) is about as fast as it gets. If you need it any faster, you'll have to implement your own CSV tokenizer and parser which skips over the columns you don't need without even temporarily storing them in memory; how much of a performance boost this brings versus the development time necessary to implement it bug-free is very debatable.
A simple Excel macro can drop all unnecessary columns (100 out of 104) within a second. I am looking for a similar solution.
That is because Excel, once a file is opened, has all data in memory and can act on it very quickly. For an accurate comparison you need to compare the time it takes to open the file in Excel + dropping of the columns, not just dropping the columns.

Read CSV from end to beginning in PHP

I am using PHP to expose vehicle GPS data from a CSV file. This data is captured at least every 30 seconds for over 70 vehicles and includes 19 columns of data. This produces several thousand rows of data and file sizes around 614 kB. New data is appended to the end of the file. I need to pull out the last row of data for each vehicle, which should represent its most recent status. I am able to pull out one row for each unit; however, since the CSV file is in chronological order, I am pulling out the oldest data in the file instead of the newest. Is it possible to read the CSV from end to beginning? I have seen some solutions, but they typically involve loading the entire file into memory and then reversing it, which sounds very inefficient. Do I have any other options? Thank you for any advice you can offer.
EDIT: I am using this data to map real-time locations on-the-fly. The data is only provided to me in CSV format, so I think importing into a DB is out of the question.
With fseek you can set the pointer to the end of the file and use a negative offset to read the file backwards.
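A sketch of that fseek() approach: walk backwards from the end until a newline, then read the final line. The in-memory stream here stands in for the real CSV:

```php
<?php
// Sketch: read only the last line of a file by seeking backwards from the end.
$fh = fopen('php://temp', 'r+');   // stands in for the GPS CSV file
fwrite($fh, "old,row,1\nold,row,2\nnewest,row,3\n");

fseek($fh, -1, SEEK_END);          // skip the trailing newline
$pos = ftell($fh);
while ($pos > 0) {
    fseek($fh, $pos - 1);
    if (fgetc($fh) === "\n") {     // fgetc leaves the pointer at the line start
        break;
    }
    $pos--;
}
if ($pos === 0) {
    fseek($fh, 0);                 // single-line file: read from the beginning
}
$last = rtrim(fgets($fh), "\n");
fclose($fh);
echo $last;
```

For one row per vehicle you would keep scanning backwards, collecting lines until every vehicle ID has been seen once.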
If you must use csv files instead of a database, then perhaps you could read the file line-by-line. This will prevent more than the last line being stored in memory (thanks to the garbage collector).
$handle = @fopen("/path/to/yourfile.csv", "r");
if ($handle) {
    while (($line = fgets($handle)) !== false) {
        // old values of $last are garbage collected after re-assignment
        $last = $line;
        // you can perform optional computations on past data here if desired
    }
    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    fclose($handle);
    // $last will now contain the last line of the file.
    // You may now do whatever with it
}
edit: I did not see the fseek() post. If all you need is the last line, then that is the way to go.
