Parsing two files, and comparing strings


So I have two files, formatted like this:
First file
adam 20 male
ben 21 male
Second file
adam blonde
adam white
ben blonde
What I would like to do is take each name from the first file (e.g. adam), search for it in the second file, and print out its attributes.
The data is separated by tabs ("\t"), so this is what I have so far.
$firstFile = fopen("file1", "rb"); //opens first file
$i=0;
$k=0;
while (!feof($firstFile) ) { //feof = while not end of file
$firstFileRow = fgets($firstFile); //fgets gets line
$parts = explode("\t", $firstFileRow); //splits line into 3 strings using tab delimiter
$secondFile= fopen("file2", "rb");
$countRow = count($secondFile); //count rows in second file
while ($i<= $countRow){ //while the file still has rows to search
$row = fgets($firstFile); //gets whole row
$parts2 = explode("\t", $row);
if ($parts[0] ==$parts2[0]){
print $parts[0]. " has " . $parts2[1]. "<br>" ; //prints out the 3 parts
$i++;
}
}
}
I can't figure out how to loop through the second file, get each row, and compare it to the first file.

You have a typo in the inner loop: you are reading from the first file ($firstFile) where you should be reading from the second file ($secondFile). In addition, after exiting the inner loop you will want to rewind the second file's pointer back to the beginning.
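A minimal sketch of the corrected loop, assuming tab-separated files as in the question. The function name matchAttributes is made up for illustration; it returns the matches instead of printing them so the logic is easy to check.

```php
<?php
// Hypothetical helper: for every name in the first file, collect the matching
// rows from the second file. Both files are assumed to be tab-separated.
function matchAttributes(string $firstPath, string $secondPath): array
{
    $matches = [];
    $first  = fopen($firstPath, 'rb');
    $second = fopen($secondPath, 'rb');    // opened once, outside the loop
    if ($first === false || $second === false) {
        throw new RuntimeException('Could not open the input files');
    }

    while (($row = fgets($first)) !== false) {
        $parts = explode("\t", rtrim($row, "\r\n"));
        if ($parts[0] === '') {
            continue;                       // skip blank lines
        }
        // Read from the SECOND file here, not the first.
        while (($row2 = fgets($second)) !== false) {
            $parts2 = explode("\t", rtrim($row2, "\r\n"));
            if (isset($parts2[1]) && $parts2[0] === $parts[0]) {
                $matches[] = $parts[0] . ' has ' . $parts2[1];
            }
        }
        rewind($second);                    // back to the start for the next name
    }

    fclose($first);
    fclose($second);
    return $matches;
}
```

Calling matchAttributes("file1", "file2") and printing each returned string followed by "<br>" would reproduce the output format from the question.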

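How about this:
function file2array($filename) {
    // FILE_IGNORE_NEW_LINES strips the trailing newline from each line,
    // so the last attribute does not end in "\n"
    $file = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $result = array();
    foreach ($file as $line) {
        $attributes = explode("\t", $line);
        // Key by the first column; append the remaining columns as attributes
        foreach (array_slice($attributes, 1) as $attribute) {
            $result[$attributes[0]][] = $attribute;
        }
    }
    return $result;
}
$a1 = file2array("file1");
$a2 = file2array("file2");
print_r(array_merge_recursive($a1, $a2));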
It will output the following:
Array (
[adam] => Array (
[0] => 20
[1] => male
[2] => blonde
[3] => white
)
[ben] => Array (
[0] => 21
[1] => male
[2] => blonde
)
)
However, this reads both files into memory in one piece and may hit the memory limit if they are large (> 100 MB). On the other hand, plenty of PHP programs have this problem, since file() is popular :-)
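If memory is a concern, one option is to build the same structure while reading each file line by line, so the raw file contents are never held in memory alongside the result. This is only a sketch under the same tab-separated assumption; the function name addFileToIndex is hypothetical, and the merged array itself still has to fit in memory.

```php
<?php
// Hypothetical line-by-line variant of file2array(): appends every attribute
// of every row to $result, keyed by the first column.
function addFileToIndex(string $path, array &$result): void
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Could not open $path");
    }
    while (($line = fgets($handle)) !== false) {
        $fields = explode("\t", rtrim($line, "\r\n"));
        $name = array_shift($fields);   // first column is the key
        if ($name === '') {
            continue;                   // skip blank lines
        }
        foreach ($fields as $attribute) {
            $result[$name][] = $attribute;
        }
    }
    fclose($handle);
}
```

Calling it once per file with the same $result array (first "file1", then "file2") should give the same merged array as the array_merge_recursive() version.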

Related

Reading data from a text file using Strpos & Substr to remove commas between and a while loop to repeat for each row

I'll try to be as clear as I can about my problem. I've been working on this one for a while now and can't seem to get my head around it. Basically, I'm trying to:
Read numbers from a text file & store them in a 2D array
Remove any commas in the text file and store the remaining data in a table format
Using strpos & substr to extract the data, leaving behind unwanted commas
Then using a while loop to repeat this process so every line in the text file is read one at a time until all the lines are read.
At first PHP was reporting which lines had errors, and I have since fixed those, but now the PHP page doesn't seem to load at all. Is there some sort of error in my while loop?
Here is the PHP code I'm currently working with that doesn't seem to load:
$fileopen = fopen($file,'r') or die("Sorry, cannot find the file $file");
if($fileopen){
while (($buffer=fgets($fileopen,4096))!==false){
}
if(!feof($fileopen)){
echo "Error:unexpected fgets() fail\n";
}
fclose($fileopen);
}
$filearray = array();
$rows = 0;
$columns = 0;
$fileopen = fopen($file,'r') or die("Sorry, cannot open $file");
while(!feof($fileopen))
{
$line = fgets($fileopen);
$length = strlen($line);
$pos = 0;
$comma = 0;
while($pos < length) {
$comma = strpos($line,",",$comma);
$filearray[$rows][$columns] = substr($line,$pos,$comma);
$pos = $comma +1;
$columns++;
}
$columns = 0;
$rows++;
}
This section is essentially displaying the extracted data from the text file in a table format:
function array_transpose($filearray)
{
array_unshift($filearray, NULL);
return call_user_func_array('array_map', $filearray);
}
echo"<h1></h1>";
echo "<table border = 0 >";
for($row=0; $row<$count; $row++){
print "<tr>";
for($col=0; $col<$count; $col++){
echo "<td>".$array[$row][$col]."</td>";
}
}
echo "</table>";

It was quite a challenge to get this one to work, but I managed it. I've put comments inside the code to explain what's happening. The reason it didn't work for you (as I said in the comments) is that you were creating an infinite loop: $pos always stayed smaller than $length, so the while() loop never ended.
Another issue you hadn't run into yet is the use of $comma as the length for substr(). Because strpos() returns the absolute position in the string, not the position relative to the offset, this would cause problems. That's why you need to save the previous position of the delimiter (comma) and subtract it from the current position of the delimiter.
Anyway, here is my example code. It gives you the result you need. It's up to you to incorporate it into your own code.
<?php
// Initial variables
$result = array();
$key = 0;
// Open the file
$handle = fopen("numbers.txt", "r");
if ($handle) {
while (($line = fgets($handle)) !== false) {
// First we set the delimiter into a variable
$delimiter = ',';
// Some integers we're going to use for our loop
$pos = 0; // The current position
$comma = 0; // Position of the next comma
$innerkey = 0; // Key used for the 2D result array
$previous = 0; // Previous comma position
$loops = 0; // Number of loops
$nr_commas = substr_count($line, $delimiter); // Number of commas in a single line
while($loops <= $nr_commas) {
// Get the position of the next comma
$comma = strpos($line,$delimiter,$comma);
// Make sure a comma is found
if($comma !== false){
// Put the substring into the result array using $pos as the offset
// and calculating the length by subtracting the position of the previous
// comma from the current comma.
$result[$key][$innerkey] = substr($line,$pos,$comma - $previous);
// Add 1 to the previous comma or it will include the current comma in the result
$previous = $comma + 1;
$pos = $comma + 1;
$innerkey++;
$loops++;
$comma++;
} else {
// In case no more commas are found, we still need to add the last integer
// (rtrim() removes the newline that fgets() leaves at the end of the line)
$loops++;
$result[$key][$innerkey] = rtrim(substr($line,strrpos($line,$delimiter)+1));
}
}
$key++;
}
fclose($handle);
} else {
echo "Unable to open the file";
}
echo "<pre>";
print_r($result);
echo "</pre>";
?>
TXT File used:
3,34,2,35,4,234,34,2,53,4
5,4,23,6,67,324,5,34,5
345,67,3,45,6,7
Result:
Array
(
[0] => Array
(
[0] => 3
[1] => 34
[2] => 2
[3] => 35
[4] => 4
[5] => 234
[6] => 34
[7] => 2
[8] => 53
[9] => 4
)
[1] => Array
(
[0] => 5
[1] => 4
[2] => 23
[3] => 6
[4] => 67
[5] => 324
[6] => 5
[7] => 34
[8] => 5
)
[2] => Array
(
[0] => 345
[1] => 67
[2] => 3
[3] => 45
[4] => 6
[5] => 7
)
)
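For comparison, the same strpos()/substr() technique can be written more compactly by tracking only the start offset of the current field. This is a sketch, not a drop-in replacement for the code above; the function name splitWithStrpos is hypothetical.

```php
<?php
// Hypothetical helper: split one line on a delimiter using only
// strpos() and substr(), as the question requires.
function splitWithStrpos(string $line, string $delimiter = ','): array
{
    $line = rtrim($line, "\r\n");       // drop the newline left by fgets()
    $fields = [];
    $start = 0;                          // offset where the current field begins
    while (($comma = strpos($line, $delimiter, $start)) !== false) {
        // The length is the distance from the field start to the delimiter.
        $fields[] = substr($line, $start, $comma - $start);
        $start = $comma + strlen($delimiter);
    }
    $fields[] = substr($line, $start);   // whatever follows the last delimiter
    return $fields;
}
```

Inside the fgets() loop, each row of the 2D array would then be filled with something like $result[] = splitWithStrpos($line).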

Handle csv with php in for loop and write them back

I'm struggling with PHP code that must open more than one CSV file, process its contents, and write them back in a different layout than the original.
Each CSV stores its data in rows, and I want to parse the files and rewrite their columns as 2 rows.
The code looks like:
$currentDirOtherCSV = __DIR__ . "/uploads/" . $ftp_location . "partial_crawler_data/";
$files_other_CSV = scandir($partial_crawler_data, 1);
for($i = 0; $i < count($files_other_CSV) - 2; $i++){
$csvFileToOpen = file($currentDirOtherCSV . $files_other_CSV[$i]);
$screamingDataFirst = [];
foreach ($csvFileToOpen as $line) {
$screamingDataFirst[] = str_getcsv($line);
}
// remove header from csv
array_shift($screamingDataFirst);
array_shift($screamingDataFirst);
// handle array to export it on 2 rows
$theExportArray = [[],[]];
for($j = 0; $j < count($screamingDataFirst); $j++){
if(!array_key_exists('1', $screamingDataFirst[$j])) {
$screamingDataFirst[$j][1] = "";
}
}
foreach ($screamingDataFirst as $key => $row){
$theExportArray[0][$key] = $row[0];
$theExportArray[1][$key] = $row[1];
}
print_r($theExportArray);
// edit csv file remote
$new_csv_data = fopen($currentDirOtherCSV . $files_other_CSV[$i], "w");
foreach($theExportArray as $row){
fputcsv($new_csv_data, $row, ";");
}
fclose($new_csv_data);
}
The csv looks like:
"Page Titles - Duplicate"
"Address","Title 1"
"http://www.seloo.ro/index.php?route=product/product&path=60&product_id=123","Scaun tapitat Alb"
"http://www.seloo.ro/index.php?route=product/product&path=60&product_id=122","Scaun tapitat Alb"
"http://www.seloo.ro/index.php?route=product/product&path=60&product_id=121","Scaun tapitat Alb"
"http://www.seloo.ro/index.php?route=product/product&path=60&product_id=127","Scaun tapitat Alb"
So I get this array by parsing it:
Array
(
[0] => Array
(
[0] => http://www.seloo.ro/index.php?route=product/product&path=60&product_id=123
[1] => http://www.seloo.ro/index.php?route=product/product&path=60&product_id=122
[2] => http://www.seloo.ro/index.php?route=product/product&path=60&product_id=121
[3] => http://www.seloo.ro/index.php?route=product/product&path=60&product_id=127
)
[1] => Array
(
[0] => Scaun tapitat Alb
[1] => Scaun tapitat Alb
[2] => Scaun tapitat Alb
[3] => Scaun tapitat Alb
)
)
EDIT
That must be:
http://www.seloo.ro/index.php?route=product/product&product_id=95;http://www.seloo.ro/index.php?route=product/product&path=59&product_id=95;http://www.seloo.ro/index.php?route=product/product&product_id=94;
"Masa New vision";"Masa flori lila";"Masa flori lila";
Problem:
I thought that if I open a file, process it, write the modified data back to the CSV, close it (fclose), and repeat until there are no CSVs left, the files would be handled one by one...
But it only writes to a single CSV; the rest of them aren't touched.
Am I missing something?
UPDATE
The script works fine.
The problem was that I was trying to update files that had not finished uploading to the server.
The script was faster than the upload.
Thank you all, and sorry, I should have checked that earlier.

Counting blank spaces till next string in PHP array

I have imported an .xlsx file into PHP through a script. I only need two columns from the file.
This is done, but as you can see, each address is followed by blank cells.
I need the information from the right column combined into one string corresponding to the address on the left.
foreach ($Reader as $Row)
{
array_push($data, $Row);
$aadress_loc = array_search("Aadress", $Row);
$eluruumid = array_search("Ehitise osad", $Row);
array_push($asukohtruumid, $aadress_loc);
array_push($asukohtruumid, $eluruumid);
}
$li_length = count($data);
for ($i = 1; $i < $li_length; $i++){
array_push($aadress_mas,($data[$i][$asukohtruumid[0]])); // left column
array_push($ruumid_mas,($data[$i][$asukohtruumid[1]])); // right column
}
Array
(
[0] => Harju maakond, Kernu vald, Laitse küla, Lossi tee 6
[1] =>
[2] => // 0;2 is the length of the first element
)
Array
(
[0] => E/1;E/2;E/3;E/4;E/5;E/6;M/7/Kontoriruumid;E/8;E/9
[1] => E/10;E/11;E/12;E/13;E/14;E/15;E/16;E/17;E/18;E/19
[2] => E/20;E/21;E/22;E/23;E/24
)
So I need to merge elements 0 to 2 of the second array into one string
and repeat the process for the other elements of the address array.
Here is the array with the differences, but I don't know how to use it to do what I need.
Sorry for my English.

Hopefully I understand the question, but I think you are looking for this:
foreach ($Reader as $Row)
{
    // PHP variable names are case-sensitive, so use $Row, not $row
    echo $Row[0].' - '.$Row[7];
    // OR
    echo $Row['Aadress'].' - '.$Row['Ehitise osad'];
}
I am not sure which one will work in your situation.
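If the goal is to join the right-hand cells that belong to one address, a sketch like the following may be closer. It assumes the two columns have already been extracted into parallel arrays (like $aadress_mas and $ruumid_mas in the question) and that an empty left-hand cell means "same address as the row above". The function name and the ";" separator are illustrative choices, and the arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper: merge the right-hand cells of continuation rows
// (rows whose address cell is empty) into one string per address.
function joinContinuationRows(array $addresses, array $rooms, string $glue = ';'): array
{
    $merged = [];
    $current = null;                     // address the following rows belong to
    foreach ($addresses as $i => $address) {
        $address = trim((string) $address);
        if ($address !== '') {
            // A non-empty cell starts a new address
            // (a repeated address would start over; assumed not to occur).
            $current = $address;
            $merged[$current] = [];
        }
        $room = trim((string) ($rooms[$i] ?? ''));
        if ($current !== null && $room !== '') {
            $merged[$current][] = $room;
        }
    }
    // Collapse each list of cells into one string per address.
    return array_map(fn (array $cells) => implode($glue, $cells), $merged);
}
```

With the sample data from the question, the three right-hand rows would end up as one string keyed by "Harju maakond, Kernu vald, Laitse küla, Lossi tee 6".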

PHP looping through huge text file is very slow, can you improve?

The data contained in the text file (actually a .dat) looks like:
LIN*1234*UP*abcde*33*0*EA
LIN*5678*UP*fghij*33*0*EA
LIN*9101*UP*klmno*33*23*EA
There are actually over 500,000 such lines in the file.
This is what I'm using now:
//retrieve file once
$file = file_get_contents('/data.dat');
$file = explode('LIN', $file);
...some code
foreach ($list as $item) { //an array containing 10 items
foreach($file as $line) { //checking if these items are on huge list
$info = explode('*', $line);
if ($line[3] == $item[0]) {
...do stuff...
break; //stop checking if found
}
}
}
The problem is that it runs far too slowly: about 1.5 seconds per iteration. I separately confirmed that it is not the '...do stuff...' part that is impacting speed; rather, it's the search for the correct item.
How can I speed this up? Thank you.

If each item is on its own line, then instead of loading the whole thing into memory, it might be better to use fgets():
$f = fopen('text.txt', 'rt');
// fgets() returns false at end of file, which ends the loop cleanly
while (($line = fgets($f)) !== false) {
    $line = rtrim($line, "\r\n");
    $info = explode('*', $line);
    // etc.
}
fclose($f);
PHP file streams are buffered (~8kB), so it should be decent in terms of performance.
The other piece of logic can be rewritten like this (instead of iterating the file multiple times):
if (in_array($info[3], $items)) // look up $info[3] inside the array of 10 things
Or, if $items is suitably indexed:
if (isset($items[$info[3]])) { ... }
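Put together, the two ideas above could look roughly like this: one pass over the file, with an isset() lookup against the wanted values. The function name findRecords and the shape of the return value are assumptions for illustration; the question does not show what '...do stuff...' needs.

```php
<?php
// Hypothetical helper: scan the file once and return the parsed records whose
// fourth field (index 3) is one of the wanted codes.
function findRecords(string $path, array $wantedCodes): array
{
    $wanted = array_flip($wantedCodes);  // code => position, for isset() lookups
    $found = [];
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Could not open $path");
    }
    while (($line = fgets($handle)) !== false) {
        $info = explode('*', rtrim($line, "\r\n"));
        if (isset($info[3], $wanted[$info[3]])) {
            $found[$info[3]] = $info;
            if (count($found) === count($wanted)) {
                break;                   // every wanted code has been seen
            }
        }
    }
    fclose($handle);
    return $found;
}
```

This reads each line at most once regardless of how many items are being searched for, instead of rescanning the 500,000 lines for each of the 10 items.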

file_get_contents loads the whole file into memory as a single string, and your code then works on that. Adapting this sample code from the official PHP fgets documentation should work better:
$handle = @fopen("test.txt", "r");
if ($handle) {
while (($buffer = fgets($handle, 4096)) !== false) {
$file_data = explode('LIN', $buffer);
foreach($file_data as $line) {
$info = explode('*', $line);
$info = array_filter($info);
if (!empty($info)) {
echo '<pre>';
print_r($info);
echo '</pre>';
}
}
}
if (!feof($handle)) {
echo "Error: unexpected fgets() fail\n";
}
fclose($handle);
}
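The output of the above code using your data is shown below. Note that array_filter() without a callback drops every falsy value, so the empty element before the first * and any field equal to "0" disappear, which is why index 5 is missing from the first two arrays: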
Array
(
[1] => 1234
[2] => UP
[3] => abcde
[4] => 33
[6] => EA
)
Array
(
[1] => 5678
[2] => UP
[3] => fghij
[4] => 33
[6] => EA
)
Array
(
[1] => 9101
[2] => UP
[3] => klmno
[4] => 33
[5] => 23
[6] => EA
)
Your missing code is still unclear, though, especially this line:
foreach ($list as $item) { //an array containing 10 items
That loop looks like another real choke point.

When you call file_get_contents, it loads the entire file into memory, so you can imagine how resource-intensive that is for a file this size. Not to mention you have a nested loop, which is O(n × m) work in the worst case (here 10 items × 500,000 lines).
You can either split the file if possible, or use fopen, fgets and fclose to read it line by line.
If I were you, I'd use another language like C++ or Go if I really needed the speed.

php function to split an array at each blank line?

I'm building a script which will open a saved text file, export the contents to an array and then dump the contents in a database. So far I've been able to get the file upload working quite happily and can also open said file.
The trouble I'm having is that the contents of the file are variable: they have a fixed structure, but the contents will change every time. The structure of the file is that each "section" is separated by a blank line.
I've used php's file() to get an array ... I'm not sure if there's a way to then split that array up every time it comes across a blank line?
$file = $target_path;
$data = file($file) or die('Could not read file!');
Example output:
[0] => domain.com
[1] => # Files to be checked
[2] => /www/06.php
[3] => /www/08.php
[4] =>
[5] => domain2.com
[6] => # Files to be checked
[7] => /cgi-bin/cache.txt
[8] => /cgi-bin/log.txt
[9] =>
[10] => domain3.com
[11] => # Files to be checked
[12] => /www/Content.js
[13] =>
I know that Field 0 and 1 will be constants, they will always be a domain name then that hash line. The lines thereafter could be anywhere between 1 line and 1000 lines.
I've looked at array_chunk(), which is close to what I want, but it splits on a fixed chunk size. It would be good if there was something that split on a specified value instead (like a new line, or a comma or something of that sort!).
Lastly, apologies if this has been answered previously. I've searched the usual places a few times for potential solutions.
Hope you can help :)
Foxed

I think what you're looking for is preg_split. If you just split on two consecutive newlines, you might miss separator lines that contain only spaces or tabs.
$output = array(...);//what you just posted
$string_output = implode('', $output);
// Split wherever a newline is followed by a blank or whitespace-only line
$array_with_only_populated_lines = preg_split('/\n\s*\n/', $string_output);

You could do something like this. It reads the file line by line rather than using file(), which uses less memory and might be important if you have larger files.
$handle = fopen('blah', 'r');
$blocks = array();
$currentBlock = array();
while (!feof($handle)) {
$line = fgets($handle);
if (trim($line) == '') {
if ($currentBlock) {
$blocks[] = $currentBlock;
$currentBlock = array();
}
} else {
$currentBlock[] = $line;
}
}
fclose($handle);
// add the last block if anything is left
if ($currentBlock) {
$blocks[] = $currentBlock;
}
print_r($blocks);

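Have you tried explode("\n\n", $contents) on the file contents as a string (e.g. from file_get_contents())? Note that split() was removed in PHP 7, and the escape sequences need double quotes to be interpreted as newlines.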

You could do it by splitting first on the blank line and then on new lines, e.g.:
$file = $target_path;
$fileData = file_get_contents($file) or die('Could not read file!');
// Split the raw contents ($fileData), not the still-undefined $data
$parts = explode("\n\n", $fileData);
$data = array();
foreach ($parts as $part) {
    $data[] = explode("\n", $part);
}
You could also use preg_split() in place of the first explode(), with a regex to split on lines containing just whitespace (e.g. /\n\s*\n/).
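A sketch of that preg_split() variant, assuming the whole file fits in memory as a string. The function name splitIntoBlocks is hypothetical, and the line-ending normalisation is an extra precaution rather than something the question asks for.

```php
<?php
// Hypothetical helper: split text into blocks at blank (or whitespace-only)
// lines, then split each block into its individual lines.
function splitIntoBlocks(string $contents): array
{
    // Normalise Windows/old Mac line endings so "\n" is the only separator.
    $contents = str_replace(["\r\n", "\r"], "\n", $contents);
    // A separator is a newline, any run of whitespace, and another newline.
    $parts = preg_split('/\n\s*\n/', trim($contents), -1, PREG_SPLIT_NO_EMPTY);
    $blocks = [];
    foreach ($parts as $part) {
        $blocks[] = explode("\n", $part);
    }
    return $blocks;
}
```

With the sample data, splitIntoBlocks(file_get_contents($target_path)) would return one array per domain, with the domain name at index 0 and the hash line at index 1.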

I would use the function preg_grep() to reduce the resulting array:
$array = preg_grep('/[^\s]/', $array);
