PHP looping through huge text file is very slow, can you improve?


The data contained in the text file (actually a .dat) looks like:
LIN*1234*UP*abcde*33*0*EA
LIN*5678*UP*fghij*33*0*EA
LIN*9101*UP*klmno*33*23*EA
There are actually over 500,000 such lines in the file.
This is what I'm using now:
//retrieve file once
$file = file_get_contents('/data.dat');
$file = explode('LIN', $file);
...some code
foreach ($list as $item) { //an array containing 10 items
foreach($file as $line) { //checking if these items are on huge list
$info = explode('*', $line);
if ($info[3] == $item[0]) {
...do stuff...
break; //stop checking if found
}
}
}
The problem is that it runs way too slowly: about 1.5 seconds for each iteration. I separately confirmed that it is not the '...do stuff...' part that is impacting speed. Rather, it's the search for the correct item.
How can I speed this up? Thank you.

If each item is on its own line, instead of loading the whole thing in memory, it might be better to use fgets() instead:
$f = fopen('text.txt', 'rt');
while (!feof($f)) {
$line = rtrim(fgets($f), "\r\n");
$info = explode('*', $line);
// etc.
}
fclose($f);
PHP file streams are buffered (~8kB), so it should be decent in terms of performance.
The other piece of logic can be rewritten like this (instead of iterating the file multiple times):
if (in_array($info[3], $items)) // look up $info[3] inside the array of 10 things
Or, if $items is suitably indexed:
if (isset($items[$info[3]])) { ... }
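A minimal sketch of the two ideas combined: read the file once with fgets() and check each line against an indexed lookup. The temporary file, the shape of $list (code at index 0) and the $wanted and $found names are assumptions made for illustration, not part of the original code.

```php
<?php
// Hypothetical sample data standing in for the 500,000-line .dat file.
$path = tempnam(sys_get_temp_dir(), 'dat');
file_put_contents(
    $path,
    "LIN*1234*UP*abcde*33*0*EA\nLIN*5678*UP*fghij*33*0*EA\nLIN*9101*UP*klmno*33*23*EA\n"
);

// Assumed shape of $list: each item carries its code at index 0.
$list = [['abcde'], ['klmno'], ['zzzzz']];

// Index the wanted codes as array keys so each check is a hash lookup.
$wanted = [];
foreach ($list as $item) {
    $wanted[$item[0]] = true;
}

$found = [];
$handle = fopen($path, 'r');
while (($line = fgets($handle)) !== false) {
    $info = explode('*', rtrim($line, "\r\n"));
    // $info[3] is the code field, e.g. "abcde".
    if (isset($info[3]) && isset($wanted[$info[3]])) {
        $found[$info[3]] = $info;   // the "...do stuff..." step would go here
        unset($wanted[$info[3]]);
        if (!$wanted) {
            break;                  // every item located, stop reading
        }
    }
}
fclose($handle);
unlink($path);

print_r(array_keys($found));
```

With this layout the file is scanned at most once regardless of how many items are in the list, and the loop stops early when every item has been found.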

file_get_contents loads the whole file into memory as a single string, and your explode() then turns it into a huge array before your code acts on it. Adapting this sample code from the official PHP fgets documentation should work better:
$handle = @fopen("test.txt", "r");
if ($handle) {
while (($buffer = fgets($handle, 4096)) !== false) {
$file_data = explode('LIN', $buffer);
foreach($file_data as $line) {
$info = explode('*', $line);
$info = array_filter($info);
if (!empty($info)) {
echo '<pre>';
print_r($info);
echo '</pre>';
}
}
}
if (!feof($handle)) {
echo "Error: unexpected fgets() fail\n";
}
fclose($handle);
}
The output of the above code using your data is:
Array
(
[1] => 1234
[2] => UP
[3] => abcde
[4] => 33
[6] => EA
)
Array
(
[1] => 5678
[2] => UP
[3] => fghij
[4] => 33
[6] => EA
)
Array
(
[1] => 9101
[2] => UP
[3] => klmno
[4] => 33
[5] => 23
[6] => EA
)
It is still unclear what your missing code does, though, given the line that states:
foreach ($list as $item) { //an array containing 10 items
That outer loop, which rescans the file for every item, seems to be another real choke point.

When you call file_get_contents, it loads the whole file into memory, so you can imagine how resource intensive the process may be. Not to mention you have a nested loop, which scans the file once per item: O(n × m) for n items and m lines.
You can either split the file if possible or use fopen, fgets and fclose to read it line by line.
If I were you, I’d use another language like C++ or Go if I really needed the speed.

Related

Reading data from a text file using Strpos & Substr to remove commas between and a while loop to repeat for each row

I'll try and be as clear as I can with what my problem is here, I've been working on this one for a while now and can't seem to get my head around it. Basically, I'm trying to:
Read numbers from a text file & store them in a 2D array
Remove any commas in the text file and store the remaining data in a table format
Using strpos & substr to extract the data, leaving behind unwanted commas
Then using a while loop to repeat this process so every line in the text file is read one at a time until all the lines are read.
At first my code was stating which lines had errors, and I have since amended them, but now the PHP page doesn't seem to load at all. Is there some sort of error within my while loop statement?
Here is the PHP code I'm currently working with that doesn't seem to be loading:
$fileopen = fopen($file,'r') or die("Sorry, cannot find the file $file");
if($fileopen){
while (($buffer=fgets($fileopen,4096))!==false){
}
if(!feof($fileopen)){
echo "Error:unexpected fgets() fail\n";
}
fclose($fileopen);
}
$filearray = array();
$rows = 0;
$columns = 0;
$fileopen = fopen($file,'r') or die("Sorry, cannot open $file");
while(!feof($fileopen))
{
$line = fgets($fileopen);
$length = strlen($line);
$pos = 0;
$comma = 0;
while($pos < length) {
$comma = strpos($line,",",$comma);
$filearray[$rows][$columns] = substr($line,$pos,$comma);
$pos = $comma +1;
$columns++;
}
$columns = 0;
$rows++;
}
This section is essentially displaying the extracted data from the text file in a table format:
function array_transpose($filearray)
{
array_unshift($filearray, NULL);
return call_user_func_array('array_map', $filearray);
}
echo"<h1></h1>";
echo "<table border = 0 >";
for($row=0; $row<$count; $row++){
print "<tr>";
for($col=0; $col<$count; $col++){
echo "<td>".$array[$row][$col]."</td>";
}
}
echo "</table>";

It was quite the challenge for me to get this one to work, but I managed to do it. I've put comments inside the code to explain what's happening. The reason it didn't work for you (as I said in the comments) was that you were creating an infinite loop. The $pos integer was always smaller than your $length integer, so the while() loop would never break.
Another issue that you hadn't encountered yet was the use of $comma as the length for substr(). Because strpos() returns the absolute position in the string and not the position relative to the offset, this would cause problems. That's why you need to save the previous position of the delimiter (comma) and subtract that from the current position of the delimiter.
Anyway, here is my example code. It's giving you the result that you need. It's up to you to incorporate it into your own code.
<?php
// Initial variables
$result = array();
$key = 0;
// Open the file
$handle = fopen("numbers.txt", "r");
if ($handle) {
while (($line = fgets($handle)) !== false) {
// First we set the delimiter into a variable
$delimiter = ',';
// Some integers we're going to use for our loop
$pos = 0; // The current position
$comma = 0; // Position of the next comma
$innerkey = 0; // Key used for the 2D result array
$previous = 0; // Previous comma position
$loops = 0; // Number of loops
$nr_commas = substr_count($line, $delimiter); // Number of commas in a single line
while($loops <= $nr_commas) {
// Get the position of the next comma
$comma = strpos($line,$delimiter,$comma);
// Make sure a comma is found
if($comma !== false){
// Put the substring into the result array using $pos as the offset
// and calculating the length by subtracting the position of the previous
// comma from the current comma.
$result[$key][$innerkey] = substr($line,$pos,$comma - $previous);
// Add 1 to the previous comma or it will include the current comma in the result
$previous = $comma + 1;
$pos = $comma + 1;
$innerkey++;
$loops++;
$comma++;
} else {
// In case no more commas are found, we still need to add the last integer
$loops++;
$result[$key][$innerkey] = substr($line,strrpos($line,$delimiter)+1);
}
}
$key++;
}
fclose($handle);
} else {
echo "Unable to open the file";
}
echo "<pre>";
print_r($result);
echo "</pre>";
?>
TXT File used:
3,34,2,35,4,234,34,2,53,4
5,4,23,6,67,324,5,34,5
345,67,3,45,6,7
Result:
Array
(
[0] => Array
(
[0] => 3
[1] => 34
[2] => 2
[3] => 35
[4] => 4
[5] => 234
[6] => 34
[7] => 2
[8] => 53
[9] => 4
)
[1] => Array
(
[0] => 5
[1] => 4
[2] => 23
[3] => 6
[4] => 67
[5] => 324
[6] => 5
[7] => 34
[8] => 5
)
[2] => Array
(
[0] => 345
[1] => 67
[2] => 3
[3] => 45
[4] => 6
[5] => 7
)
)
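The offset arithmetic described in this answer can also be sketched more compactly. This is an illustrative variant rather than the answer's own code: the split_line() helper and the sample lines are hypothetical, and it still uses only strpos() and substr().

```php
<?php
// Split one line on a delimiter using only strpos() and substr().
function split_line(string $line, string $delimiter = ','): array
{
    $fields = [];
    $start = 0; // offset where the current field begins
    while (($comma = strpos($line, $delimiter, $start)) !== false) {
        // strpos() returns an absolute position, so the field length
        // is the distance from $start to the comma.
        $fields[] = substr($line, $start, $comma - $start);
        $start = $comma + strlen($delimiter);
    }
    // Whatever follows the last delimiter is the final field.
    $fields[] = substr($line, $start);
    return $fields;
}

// Hypothetical sample lines, as fgets() would return them.
$rows = [];
foreach (["3,34,2,35\n", "345,67,3\n"] as $line) {
    $rows[] = split_line(rtrim($line, "\r\n"));
}
print_r($rows);
```

Because the loop condition is the strpos() result itself, the loop ends as soon as no further comma is found, which avoids the infinite loop from the question.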

concatenate words from two separate files

I am trying to concatenate words from a file with words from another file. However when I run the script I get a full output of the first file, then the output of the second file, then I see that the execution does not complete so I am stuck in an infinite loop. This is my code:
include 'passgen.txt';
include 'mycharset.txt';
$lines=file('passgen.txt');
$additions=file('mycharset.txt');
foreach($lines as $line){
foreach($additions as $addition){
$newPasswords=$line . $addition;
}
}
file_put_contents('newPasswords.txt', print_r($newPasswords, true));
passgen.txt content example:
stack
5tack
St4ck
...
mycharset.txt content example:
1
1!
2
2!
Expected results of what I am trying to achieve:
stack1
stack1!
stack2
stack2!
5tack1
5tack1!
...
EDIT:
adding full code from Jay answer:
#!/usr/bin/php
<?php
include 'passgen.txt';
include 'mycharset.txt';
$lines=file('passgen.txt');
$additions=file('mycharset.txt');
foreach($lines as $start) {
foreach($additions as $end) {
file_put_contents('newPasswords2.txt', $start.$end ."\r\n", FILE_APPEND);
}
}
?>
SAMPLE OUTPUT from Jay answer:
St4ck
6!3
St4ck
6!4
St4ck
6!5
I tried to remove the \r\n but still does not append the 6!5 to the word in the desired format below:
St4ck6!4
St4ck6!5
...

You are just creating a line, so you will not get an array as $newPasswords is overwritten on each iteration. What I did was place the concatenated words into an array ($word_array). You can then loop through the array easily and place into a text file:
EDIT
Added the trim() function to account for any whitespace characters in the text files we may not be aware of:
$file1 = ['stack','5tack','St4ck'];
$file2 = ['1','1!','2'];
$word_array = array();
foreach($file1 as $start) {
foreach($file2 as $end) {
$word_array[] = trim($start).trim($end);
}
}
print_r($word_array);
Returns:
Array
(
[0] => stack1
[1] => stack1!
[2] => stack2
[3] => 5tack1
[4] => 5tack1!
[5] => 5tack2
[6] => St4ck1
[7] => St4ck1!
[8] => St4ck2
)
Now you can put these in your text file like this:
foreach($word_array as $word) {
file_put_contents('newPasswords.txt', $word."\r\n", FILE_APPEND); // append, otherwise each write overwrites the file
}
Having said that I caution you against using this for password generation for any reason. You're essentially creating a rainbow table based on your comment:
I am using a weak password finder and I need a custom list of password to compare if hashes are weak.
You'd be better off providing the users with a password strength indicator that would encourage them to create strong passwords.
Shortening the process...
You could shorten the process entirely by writing to the file during the loop, which would require no arrays:
foreach($file1 as $start) {
foreach($file2 as $end) {
file_put_contents('newPasswords.txt', trim($start).trim($end) ."\r\n", FILE_APPEND);
}
}
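For reference, here is a sketch of the same idea reading from real files. The temporary files stand in for passgen.txt and mycharset.txt and are assumptions for illustration; file() is called with FILE_IGNORE_NEW_LINES so the line endings that broke the output in the question are dropped at read time, and the result is written with a single file_put_contents() call.

```php
<?php
// Hypothetical stand-ins for passgen.txt and mycharset.txt.
$dir = sys_get_temp_dir();
$wordsPath = tempnam($dir, 'w');
$suffixPath = tempnam($dir, 's');
file_put_contents($wordsPath, "stack\r\n5tack\r\n");
file_put_contents($suffixPath, "1\r\n1!\r\n");

// Drop line endings and blank lines while reading; trim() removes any
// stray whitespace that is left.
$flags = FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES;
$words = array_map('trim', file($wordsPath, $flags));
$suffixes = array_map('trim', file($suffixPath, $flags));

$combined = [];
foreach ($words as $word) {
    foreach ($suffixes as $suffix) {
        $combined[] = $word . $suffix;
    }
}

// One write instead of one file_put_contents() call per combination.
$outPath = tempnam($dir, 'o');
file_put_contents($outPath, implode(PHP_EOL, $combined) . PHP_EOL);
echo file_get_contents($outPath);

unlink($wordsPath);
unlink($suffixPath);
unlink($outPath);
```

Building the array first costs memory proportional to the number of combinations, so for very large lists the append-in-the-loop variant may be the better trade-off.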

You can use the fgets() function to read from a file line by line. Then you can concatenate that.
So something like:
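$additions = file('mycharset.txt', FILE_IGNORE_NEW_LINES);
$handle = fopen('passgen.txt', 'r');
$count = 0;
while (($word = fgets($handle, 4096)) !== false && isset($additions[$count])) {
$word = rtrim($word, "\r\n") . $additions[$count]; // strip the newline before appending
echo $word, "\n";
$count++;
}
fclose($handle);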

import csv in array's, explode text file by datevalue given or by numrows possible?

I have a CSV file which contains trace signals of vibration data.
Each trace starts with an ISO 8601 date, e.g. 2017-01-31T16:16:21.000+01:00,
followed by 1024 rows of data (a 2-second signal at 512 Hz). Then the next trace signal follows, starting with a new date, but in the same file.
2017-01-31T16:16:21.000+01:00
0,06;0,03;0,01
0,07;0,03;0,01
0,07;0,03;0,02
.... up to line 1025
2017-01-31T16:24:37.000+01:00
1,72;0,2;-0,9
1,48;0,39;-1,46
1,23;0,58;-1,67
0,99;0,76;-1,81
... up to line 2050
This file can contain many more than 2 traces. How can I parse this into separate arrays? I would prefer arrays like:
Array
(
[0] => Array
(
[Time] => 2017-01-31T16:16:21.000
[Data] => array ( [0] => 0,06;0,03;0,01
[1] => 0,07;0,03;0,01 etc..)
)
But I don't know how to loop through the file, split on the datetime value, and also keep that value. The other way would be to read the first line as the time, then read the next 1024 rows line by line and push them into the array, but how?

You may run into problems as your array gets larger, but this is the approach:
$result = array();
$i = 0;
$j = -1;
if ($handle = fopen('/path/to/file.csv', 'r')) {
while (($line = fgets($handle)) !== false) {
$line = rtrim($line, "\r\n");
if ($i % 1025 === 0) {
$j++; // a new trace starts: 1 date line + 1024 data lines
$result[$j]['Time'] = $line;
} else {
$result[$j]['Data'][] = $line;
}
$i++;
}
fclose($handle);
}
Just loop through the file and test whether the line index is 0 or a multiple of 1025 (one date line plus 1024 data rows). If it is, you are on a line with the time; if not, it is data.
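If the number of rows per trace is not guaranteed, an alternative is to split on the date line itself, as the question suggests. The following is a sketch under assumptions: the sample data is shortened and held in a php://memory stream, and the regular expression only checks for an ISO 8601-like prefix.

```php
<?php
// Hypothetical, shortened sample: real traces have 1024 data rows each.
$csv = "2017-01-31T16:16:21.000+01:00\n0,06;0,03;0,01\n0,07;0,03;0,01\n"
     . "2017-01-31T16:24:37.000+01:00\n1,72;0,2;-0,9\n";
$handle = fopen('php://memory', 'w+');
fwrite($handle, $csv);
rewind($handle);

$result = [];
$current = -1;
while (($line = fgets($handle)) !== false) {
    $line = rtrim($line, "\r\n");
    if ($line === '') {
        continue; // ignore blank lines
    }
    // A line that starts like "2017-01-31T16:16:21" opens a new trace.
    if (preg_match('/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/', $line)) {
        $current++;
        $result[$current] = ['Time' => $line, 'Data' => []];
    } elseif ($current >= 0) {
        $result[$current]['Data'][] = $line;
    }
}
fclose($handle);

print_r($result);
```

With a real file, the php://memory setup would be replaced by fopen() on the CSV path; the loop itself stays the same.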

Parsing two files, and comparing strings

So I have two files, formatted like this:
First file
adam 20 male
ben 21 male
Second file
adam blonde
adam white
ben blonde
What I would like to do, is use the instance of adam in the first file, and search for it in the second file and print out the attributes.
Data is separated by a tab ("\t"), so this is what I have so far.
$firstFile = fopen("file1", "rb"); //opens first file
$i=0;
$k=0;
while (!feof($firstFile) ) { //feof = while not end of file
$firstFileRow = fgets($firstFile); //fgets gets line
$parts = explode("\t", $firstFileRow); //splits line into 3 strings using tab delimiter
$secondFile= fopen("file2", "rb");
$countRow = count($secondFile); //count rows in second file
while ($i<= $countRow){ //while the file still has rows to search
$row = fgets($firstFile); //gets whole row
$parts2 = explode("\t", $row);
if ($parts[0] ==$parts2[0]){
print $parts[0]. " has " . $parts2[1]. "<br>" ; //prints out the 3 parts
$i++;
}
}
}
I can't figure out how to loop through the second file, get each row, and compare it to the first file.

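You have a typo in the inner loop: you are reading from $firstFile when you should be reading from $secondFile. In addition, count() on a file handle does not give you the number of rows, so loop until fgets() returns false instead. After exiting the inner loop, you would also want to rewind the second file's pointer back to the beginning.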
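A sketch of the corrected loop, with both files replaced by hypothetical php://memory streams so the example is self-contained. The $matches array is an assumption added to collect the output lines.

```php
<?php
// Hypothetical in-memory copies of the two tab-separated files.
$first = fopen('php://memory', 'w+');
fwrite($first, "adam\t20\tmale\nben\t21\tmale\n");
rewind($first);

$second = fopen('php://memory', 'w+');
fwrite($second, "adam\tblonde\nadam\twhite\nben\tblonde\n");
rewind($second);

$matches = [];
while (($rowA = fgets($first)) !== false) {
    $parts = explode("\t", rtrim($rowA, "\r\n"));
    rewind($second); // start the second file from the top for each name
    while (($rowB = fgets($second)) !== false) {
        $parts2 = explode("\t", rtrim($rowB, "\r\n"));
        if (isset($parts2[1]) && $parts[0] === $parts2[0]) {
            $matches[] = $parts[0] . ' has ' . $parts2[1];
        }
    }
}
fclose($first);
fclose($second);

echo implode("\n", $matches), "\n";
```

This rereads the second file once per row of the first, which is fine for small inputs; for larger ones, indexing the second file by name first would avoid the repeated scans.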

How about this:
function file2array($filename) {
$file = file($filename);
$result = array();
foreach ($file as $line) {
$attributes = explode("\t", $line);
foreach (array_slice($attributes, 1) as $attribute)
$result[$attributes[0]][] = $attribute;
}
return $result;
}
$a1 = file2array("file1");
$a2 = file2array("file2");
print_r(array_merge_recursive($a1, $a2));
It will output the following:
Array (
[adam] => Array (
[0] => 20
[1] => male
[2] => blonde
[3] => white
)
[ben] => Array (
[0] => 21
[1] => male
[2] => blonde
)
)
However, this one reads both files into memory in one piece and will crash if they are large (>100 MB). On the other hand, 90% of all PHP programs have this problem, since file() is popular :-)

php function to split an array at each blank line?

I'm building a script which will open a saved text file, export the contents to an array and then dump the contents in a database. So far I've been able to get the file upload working quite happily and can also open said file.
The trouble I'm having is that the contents of the file are variable: they have a fixed structure, but the contents will change every time. The structure of the file is that each "section" is separated by a blank line.
I've used php's file() to get an array ... I'm not sure if there's a way to then split that array up every time it comes across a blank line?
$file = $target_path;
$data = file($file) or die('Could not read file!');
Example output:
[0] => domain.com
[1] => # Files to be checked
[2] => /www/06.php
[3] => /www/08.php
[4] =>
[5] => domain2.com
[6] => # Files to be checked
[7] => /cgi-bin/cache.txt
[8] => /cgi-bin/log.txt
[9] =>
[10] => domain3.com
[11] => # Files to be checked
[12] => /www/Content.js
[13] =>
I know that Field 0 and 1 will be constants, they will always be a domain name then that hash line. The lines thereafter could be anywhere between 1 line and 1000 lines.
I've looked at array_chunk() which is close to what I want but it works on a numerical value, what would be good if there was something which would work on a specified value (like a new line, or a comma or something of that sort!).
Lastly, apologies if this has been answered previously. I've searched the usual places a few times for potential solutions.
Hope you can help :)
Foxed

I think what you're looking for is preg_split. If you just split on a carriage return, you might miss lines that just have spaces or tabs.
$output = array(...);//what you just posted
$string_output = implode('', $output);
$array_with_only_populated_lines = preg_split('/\n\s*\n/', $string_output); // split on empty or whitespace-only lines

You could do something like this. It reads the file line by line rather than using file(), which uses less memory and may be important if you work with larger files.
$handle = fopen('blah', 'r');
$blocks = array();
$currentBlock = array();
while (!feof($handle)) {
$line = fgets($handle);
if (trim($line) == '') {
if ($currentBlock) {
$blocks[] = $currentBlock;
$currentBlock = array();
}
} else {
$currentBlock[] = $line;
}
}
fclose($handle);
//if is anything left
if ($currentBlock) {
$blocks[] = $currentBlock;
}
print_r($blocks);

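Have you tried explode("\n\n", $contents), where $contents is the file read as one string? (The older split() function was removed in PHP 7.)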

You could do it by splitting first on the blank line and then on new lines, e.g.:
$file = $target_path;
$fileData = file_get_contents($file) or die('Could not read file!');
$parts = explode("\n\n", $fileData);
$data = array();
foreach ($parts as $part) {
$data[] = explode("\n", $part);
}
You could also use preg_split() in place of the first explode(), with a regex that splits on lines containing just whitespace (e.g. /\n\s*\n/).
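A minimal sketch of that preg_split() variant, assuming the file has already been read into a string. The sample contents are hypothetical, and the second separator line deliberately contains only spaces to show that whitespace-only lines are handled.

```php
<?php
// Hypothetical file contents; the second separator line holds only spaces.
$contents = "domain.com\n# Files to be checked\n/www/06.php\n\n"
          . "domain2.com\n# Files to be checked\n/cgi-bin/log.txt\n  \n"
          . "domain3.com\n# Files to be checked\n/www/Content.js\n";

// Split wherever a line break is followed by an empty or
// whitespace-only line. \R matches \n, \r\n or \r.
$sections = preg_split('/\R\s*\R/', trim($contents));

// Break each section into its individual lines.
$blocks = array_map(function ($section) {
    return preg_split('/\R/', $section);
}, $sections);

print_r($blocks);
```

Using \R rather than a literal "\n" keeps the split working for files saved with Windows line endings.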

I would use the function preg_grep() to reduce the resulting array:
$array = preg_grep('/[^\s]/', $array);
