Parsing a formatted text file in PHP - php

Is there an easy way to parse the data posted below? The data comes from the web.
I was using the $rows = explode("\n", $txt_file); then the $parts = explode('=', $line_of_text); to get the key name and values. However, I don't know how to handle the extra information that I do not want.
Additionally, I do not know how to get rid of the extra spaces. The file seems to be made for some kind of easy parsing. I have looked all over this site to find a solution. However, this data is quite different than the examples I have found on this site.
# This file holds all the timelines available at this time.
# All lines starting with # is ignored by parser...
#
STARTINFO
description = Rubi-Ka 2
displayname = Rimor (Rubi-Ka 2)
connect = cm.d2.funcom.com
ports = 7502
url =
version = 18.5.4
ENDINFO
STARTINFO
description = Rubi-Ka 1
displayname = Atlantean (Rubi-Ka 1)
connect = cm.d1.funcom.com
ports = 7501
url =
version = 18.5.4
ENDINFO

You can use the trim function to get rid of the whitespace.
To only keep the columns you want, you can store their keys in an array, and make a check against it when parsing.
Here's an example (albeit rather verbose).
<?php
$lines = explode("\n", $data);
$result = array();
$count = 0;
// an array of the keys we want to keep
// the columns are stored as keys rather than values for faster lookup
$cols_to_keep = array( 'url'=>null, 'description'=>null, 'ports'=>null, 'displayname' => null);
foreach($lines as $line)
{
// trim first so stray spaces and "\r" from Windows line endings do not matter
$line = trim($line);
//skip comments and empty lines
if($line === '' || $line[0] == '#')
{ continue; }
//if we start a new block, initialize the result array for it
if($line == 'STARTINFO')
{
$result[$count] = array();
continue;
}
// if we reach ENDINFO, increment count
if($line == 'ENDINFO')
{
$count++;
continue;
}
//here we can split into key - value; the limit of 2 keeps any '=' inside the value
$parts = explode('=', $line, 2);
//skip lines that contain no '='
if(count($parts) < 2)
{ continue; }
//and if the key is in our cols_to_keep, we add it on
if(array_key_exists(trim($parts[0]), $cols_to_keep))
{ $result[$count][ trim($parts[0]) ] = trim( $parts[1] ); }
}
print_r($result);
?>

Related

php Compare two text files and output NON matching records

I have two text files of data. The first file has 30 lines that match 30 lines in the second text file, but in addition the first text file has two additional lines that are added as the operator uploads the file to the directory. I want to find the non-matching lines and output them so they can be used in the same script as a mailout.
I am trying to use this code, which outputs the contents of the two files to screen.
<?php
if ($file1 = fopen(".data1.txt", "r")) {
while(!feof($file1)) { $textperline = fgets($file1);
echo $textperline;
echo "<br>";}
if ($file2 = fopen(".data.txt", "r")) {
while(!feof($file2)) {$textperline1 = fgets($file2);
echo $textperline1;
echo "<br>";}
fclose($file1);
fclose($file2);
}}
?>
But it outputs the whole list of data. Can anyone help with listing only the NON-matching lines?
(The output of the two files from my code was attached.)
I want to output only lines that are in file2 but not in file1.
My suggestion would be to read each file into an array (one line = one element) and then use array_diff to compare them. Unless you have millions of lines, this approach is the easiest.
To reuse your code, this is how you can read the 2 files into two arrays:
$list1 = [];
$list2 = [];
if ($file1 = fopen(".data1.txt", "r")) {
// fgets() returns false at the end of the file, which stops the loop without adding an empty entry
while (($line = fgets($file1)) !== false) {
$list1[] = trim($line);
}
fclose($file1);
}
if ($file2 = fopen(".data.txt", "r")) {
while (($line = fgets($file2)) !== false) {
$list2[] = trim($line);
}
fclose($file2);
}
If the files are small and you can read them in one go, you can also use a simplified syntax.
$list1 = explode(PHP_EOL, file_get_contents(".data1.txt"));
$list2 = explode(PHP_EOL, file_get_contents(".data.txt"));
Then, no matter which method you chose, you can compare them as follows
$comparison = array_diff($list2, $list1);
foreach ($comparison as $line) {
echo $line."<br />";
}
This will only output the lines of the second array that are not present in the first one.
Make sure that the array with the additional lines is the first argument of array_diff.
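As a compact variant of the approach above, file() can read each file straight into an array. This is a minimal sketch: the helper name nonMatchingLines is hypothetical, and the sample data is written to temporary files purely for illustration.

```php
<?php
// Hypothetical helper: returns the lines of $pathB that do not occur in $pathA.
function nonMatchingLines($pathA, $pathB)
{
    $flags = FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES;
    // file() returns one array element per line; trim removes stray "\r" and spaces
    $a = array_map('trim', file($pathA, $flags));
    $b = array_map('trim', file($pathB, $flags));
    // array_diff keeps the values of its first argument that are missing from the second
    return array_values(array_diff($b, $a));
}

// Illustration with made-up data in temporary files
$fileA = tempnam(sys_get_temp_dir(), 'a');
$fileB = tempnam(sys_get_temp_dir(), 'b');
file_put_contents($fileA, "alpha\nbeta\n");
file_put_contents($fileB, "alpha\nbeta\ngamma\ndelta\n");
$extra = nonMatchingLines($fileA, $fileB);
print_r($extra);
unlink($fileA);
unlink($fileB);
```

Swapping the two arguments of the helper reverses the direction of the comparison, which is the same point the note about the argument order of array_diff makes.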
ASSUMPTION
Neither file is huge, so you can read the whole content into memory at once. Given that, you can put the following code at the top:
$file1 = "./data1.txt";
$file2 = "./data2.txt";
$linesOfFile1 = file($file1);
$linesOfFile2 = file($file2);
$newLinesInFile2 = [];
There are a couple of cases that you did not mention in your question.
CASE 1
New lines are only appended to the second file, file2. The solution for this case is the easiest one:
$numberOfRowsFile1 = count($linesOfFile1);
$numberOfRowsFile2 = count($linesOfFile2);
if($numberOfRowsFile2 > $numberOfRowsFile1)
{
$newLinesInFile2 = array_slice($linesOfFile2, $numberOfRowsFile1);
}
CASE 2
Lines with the same content may have different positions in each file. Duplicate lines within the same file are ignored.
Furthermore, case sensitivity may play a role. That's why the content of each line is hashed, which makes the comparison simpler. For both case-sensitive and case-insensitive comparison the following function is needed:
function buildHashedMap($array, &$hashedMap, $caseSensitive = true)
{
foreach($array as $line)
{
$line = !$caseSensitive ? strtolower($line) : $line;
$hash = md5($line);
$hashedMap[$hash] = $line;
}
}
Case sensitive comparison
$hashedLinesFile1 = [];
buildHashedMap($linesOfFile1, $hashedLinesFile1);
$hashedLinesFile2 = [];
buildHashedMap($linesOfFile2, $hashedLinesFile2);
$newLinesInFile2 = array_diff_key($hashedLinesFile2, $hashedLinesFile1);
Case INSENSITIVE comparison
$caseSensitive = false;
$hashedLinesFile1 = [];
buildHashedMap($linesOfFile1, $hashedLinesFile1, $caseSensitive);
$hashedLinesFile2 = [];
buildHashedMap($linesOfFile2, $hashedLinesFile2, $caseSensitive);
$newLinesInFile2 = array_diff_key($hashedLinesFile2, $hashedLinesFile1);

multi-dimensional array possibly

I have two files that I need opened, I'm using php file to read them
$lines = file('/home/program/prog_conf.txt');
foreach ($lines as $line) {
$rows = preg_split('/\s+/', $line);
Followed by:
$lines = file('/home/domain/public_html/base/file2.cfg');
foreach ($lines as $line) {
$rows = preg_split('/=/', $line);
As I work with these two files, I need to pull info from the second one, which I separated by =. However, I'm not sure this is the best thing to do. I wanted to add data checking from the database. The db details are in the second file like so:
dbname = databasename
dbuser = databaseuser
dbpass = databasepassword
If I echo $rows[2], I get all the information I need on a single line, not on separate lines. Meaning:
databasename databaseuser databasepassword
How do I split the information up so I can use the entries one by one?
How about:
$lines = file('/home/domain/public_html/base/file2.cfg');
$all_parts = array();
foreach ($lines as $line) {
//explode pulls apart a string based on the first value, so you could change that
//to a '=' if need be
//array_merge returns a new array, so the result has to be assigned back
$all_parts = array_merge($all_parts, explode(' ', trim($line)));
}
This would get you all the parts of the file, one at a time, into an array, which is what I think you wanted.
If not, just explode as needed.
Maybe this approach helps:
As I see it, your second file has multiple lines, so what I would do is something like this.
Assuming that every key has "db" in common, we can do something like this:
$file = fopen("/home/domain/public_html/base/file2.cfg", "rb");
$contents = stream_get_contents($file); // This function performs better if the file isn't too large.
fclose($file);
// Assuming this is what the file returns
$contents = 'dbname = databasename dbuser = databaseuser dbpass = databasepassword';
$rows = preg_split('/db+/', $contents, -1, PREG_SPLIT_NO_EMPTY); // Splitting on the keys' common "db" prefix, dropping the empty first piece
$result = array();
foreach($rows as $row){
$temp = preg_replace("/\s+/", '', $row); // Removing extra white space
$temp = preg_split("/=/", $temp); // Splitting by "="
$result[] = $temp[1]; // Getting the value only
}
var_dump ($result);
I hope this helps. You can try this code, maybe with small modifications, but it works.
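For a file with one key = value pair per line, as shown in the question, another option is to build an associative array keyed by the setting name. This is a minimal sketch assuming that layout; the function name parseConfigLines is hypothetical, and the sample lines stand in for the result of file().

```php
<?php
// Hypothetical helper: turns "key = value" lines into an associative array.
function parseConfigLines(array $lines)
{
    $config = array();
    foreach ($lines as $line) {
        $line = trim($line);
        // skip blank lines and lines without '='
        if ($line === '' || strpos($line, '=') === false) {
            continue;
        }
        // the limit of 2 keeps any '=' inside the value intact
        list($key, $value) = explode('=', $line, 2);
        $config[trim($key)] = trim($value);
    }
    return $config;
}

// Sample lines as file() would return them (trailing newlines included)
$lines = array("dbname = databasename\n", "dbuser = databaseuser\n", "dbpass = databasepassword\n");
$config = parseConfigLines($lines);
echo $config['dbuser'], "\n"; // prints "databaseuser"
```

Each entry can then be used on its own, for example $config['dbname'], instead of relying on a numeric position.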

Insert multiple arrays

I'm trying to grab all values from my inputs and insert them together, so for each value #1 in first array, insert value #1 in second and third. This is kind of what it looks like:
$lines = explode(PHP_EOL, $_POST['links']);
$keywords = explode(PHP_EOL, $_POST['keywords']);
$violationtypes = explode(PHP_EOL, $_POST['keywords']);
Those inputs are regular, but they may be 1 or 500, I honestly don't know. I used to handle this while there was one input like this:
foreach($lines as $line)
{
if (!empty($line))
{
if (false === strpos($line, '://'))
{
$line = 'http://' . $line;
}
mysql_query("INSERT INTO links (ClientEmail,Links) VALUES ('$Emailvalue', '$line')");
}
}
However, I can't pull this one with 3 arrays. Is there a better way?
PS The check for empty lines is so that I don't add empty values into the database, and the other one is checking if http:// is there, and adds it if its not. That check is only for $lines, other inputs don't need any check
Use the index in one array to fetch the corresponding elements of the other arrays:
foreach ($lines as $i => $line) {
if (!empty($line)) {
$keyword = $keywords[$i];
$violation = $violationtypes[$i];
// Now insert $line, $keyword, and $violation into DB
}
}
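The same pairing can also be sketched with array_map() and a null callback, which groups the elements found at each index into one row. This is only an illustration: the sample input is made up, and shorter arrays are padded with null, so missing entries would need a check before the insert step.

```php
<?php
// Made-up sample input standing in for the three exploded POST fields
$lines = array('example.com/a', '', 'http://example.com/b');
$keywords = array('first', 'unused', 'second');
$violationtypes = array('spam', 'unused', 'copy');

$rows = array();
// array_map with a null callback "zips" the arrays index by index
foreach (array_map(null, $lines, $keywords, $violationtypes) as $row) {
    list($line, $keyword, $violation) = $row;
    $line = trim((string) $line);
    if ($line === '') {
        continue; // skip empty links, as in the question
    }
    if (strpos($line, '://') === false) {
        $line = 'http://' . $line;
    }
    // one complete record per link, ready for the insert step
    $rows[] = array($line, $keyword, $violation);
}
print_r($rows);
```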

Be sure to have unique array entry

I have a file which contains something like :
toto;145
titi;7
tata;28
I explode this file to have an array.
I am able to display the data with that code :
foreach ($lines as $line_num => $line) {
$tab = explode(";",$line);
//erase return line
$tab[1]=preg_replace('/[\r\n]+/', "", $tab[1]);
echo $tab[0]; //toto //titi //tata
echo $tab[1]; //145 //7 //28
}
I want to be sure that data contained in each $tab[0] and $tab[1] is unique.
For example, I want a "throw new Exception" if file is like :
toto;145
titi;7
tutu;7
tata;28
or like :
toto;145
tata;7
tata;28
How can I do that ?
Convert your file to array with file(), and convert to associative array with additional duplication checking.
$lines = file('file.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$tab = array();
foreach ($lines as $line) {
list($key, $val) = explode(';', $line);
if (array_key_exists($key, $tab) || in_array($val, $tab)) {
throw new Exception("Duplicate key or value in line: $line");
} else {
$tab[$key] = $val;
}
}
Store them as key => value pairs in an array, and check whether each key or value already exists in your array as you are looping through the file. You can check for an existing key with array_key_exists and an existing value with in_array.
One simple way is to use array_unique. After you explode, save the parts (tab[0] and tab[1]) into two separate arrays, named for example $col1 and $col2, and then do this simple test:
<?php
if (count(array_unique($col1)) != count($col1) || count(array_unique($col2)) != count($col2))
echo "arrays are different; not unique";
?>
array_unique returns a copy of the array with duplicate entries removed, so if the size of the new array differs from the original, the original was not unique.
//contrived file contents
$file_contents = "
toto;145
titi;7
tutu;7
tata;28";
//split into lines and set up some left/right value trackers
$lines = preg_split('/\n/', trim($file_contents));
$left = $right = array();
//split each line into two parts and log left and right part
foreach($lines as $line) {
$splitter = explode(';', trim($line)); // trim removes a leftover "\r" from Windows line endings
array_push($left, $splitter[0]);
array_push($right, $splitter[1]);
}
//sanitise left and right parts into just unique entries
$left = array_unique($left);
$right = array_unique($right);
//if we end up with fewer left or right entries than the number of lines, error...
if (count($left) < count($lines) || count($right) < count($lines))
die('error');
Use associative arrays with keys "toto", "tata" etc.
To check whether a key exists you can use array_key_exists or isset.
BTW. Instead of preg_replace('/[\r\n]+/', "", $tab[1]), try trim (or even rtrim).
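One way to sketch this suggestion is to keep two lookup arrays, one keyed by the left column and one by the right column, and test them with isset(). The function name assertUniqueColumns and the sample lines are assumptions for illustration.

```php
<?php
// Hypothetical helper: throws if any left or right column value repeats.
function assertUniqueColumns(array $lines)
{
    $seenKeys = array();
    $seenValues = array();
    foreach ($lines as $line) {
        $line = trim($line); // also removes the line break
        $parts = explode(';', $line, 2);
        if (count($parts) < 2) {
            continue; // skip blank or malformed lines
        }
        list($key, $val) = $parts;
        // isset() on an array key is a constant-time lookup
        if (isset($seenKeys[$key]) || isset($seenValues[$val])) {
            throw new Exception("Duplicate entry in line: $line");
        }
        $seenKeys[$key] = true;
        $seenValues[$val] = true;
    }
}

try {
    assertUniqueColumns(array("toto;145", "titi;7", "tutu;7", "tata;28"));
    $message = 'all unique';
} catch (Exception $e) {
    $message = $e->getMessage();
}
echo $message, "\n";
```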
While you're traversing the array, add the values to a separate array, i.e. a placeholder, which is then used to check via in_array() whether a value already exists.
<?php
// one array element per line of the file
$lines = array('toto;145', 'titi;7', 'tutu;7', 'tata;28');
$results = array();
foreach ($lines as $line_num => $line) {
$tab = explode(";",$line);
//erase return line
$tab[1]=preg_replace('/[\r\n]+/', "", $tab[1]);
// in_array() needs the array to search as its second argument
if(!in_array($tab[0], $results) && !in_array($tab[1], $results)){
array_push($results, $tab[0], $tab[1]);
}else{
echo "value exists!";
die(); // Remove/modify for different exception handling
}
}
?>

how to find out if csv file fields are tab delimited or comma delimited

How can I find out whether the fields of a CSV file are tab delimited or comma delimited? I need PHP validation for this. Can anyone please help? Thanks in advance.
It's too late to answer this question but hope it will help someone.
Here's a simple function that will return a delimiter of a file.
function getFileDelimiter($file, $checkLines = 2){
$file = new SplFileObject($file);
// double quotes so that "\t" is a real tab character
$delimiters = array(
',',
"\t",
';',
'|',
':'
);
$results = array();
$i = 0;
while($file->valid() && $i < $checkLines){
$line = $file->fgets();
foreach ($delimiters as $delimiter){
$regExp = '/['.$delimiter.']/';
$fields = preg_split($regExp, $line);
if(count($fields) > 1){
if(!empty($results[$delimiter])){
$results[$delimiter]++;
} else {
$results[$delimiter] = 1;
}
}
}
$i++;
}
// no candidate delimiter was found in the checked lines
if(empty($results)){
return null;
}
$results = array_keys($results, max($results));
return $results[0];
}
Use this function as shown below:
$delimiter = getFileDelimiter('abc.csv'); //Check 2 lines to determine the delimiter
$delimiter = getFileDelimiter('abc.csv', 5); //Check 5 lines to determine the delimiter
P.S. I have used preg_split() instead of explode() because explode('\t', $value) won't give proper results: inside single quotes, '\t' is a backslash followed by t, not a tab character.
UPDATE: Thanks to @RichardEB for pointing out a bug in the code. I have updated it now.
Here's what I do.
Parse the first 5 lines of a CSV file
Count the number of delimiters [commas, tabs, semicolons and colons] in each line
Compare the number of delimiters in each line. If you have a properly formatted CSV, then one of the delimiter counts will match in each row.
This will not work 100% of the time, but it is a decent starting point. At minimum, it will reduce the number of possible delimiters (making it easier for your users to select the correct delimiter).
/* Rearrange this array to change the search priority of delimiters */
$delimiters = array('tab' => "\t",
'comma' => ",",
'semicolon' => ";"
);
$handle = file( $file ); # Grabs the CSV file, loads into array
$line = array(); # Stores the count of delimiters in each row
$valid_delimiter = array(); # Stores Valid Delimiters
# Count the number of Delimiters in Each Row (indexes 1 to 5; stops early if the file is shorter)
for ( $i = 1; $i < 6 && isset( $handle[$i] ); $i++ ){
foreach ( $delimiters as $key => $value ){
$line[$key][$i] = count( explode( $value, $handle[$i] ) ) - 1;
}
}
# Compare the Count of Delimiters in Each line
foreach ( $line as $delimiter => $count ){
# Check that the first two values exist and are not 0
if ( isset( $count[1], $count[2] ) and $count[1] > 0 and $count[2] > 0 ){
$match = true;
$prev_value = '';
foreach ( $count as $value ){
if ( $prev_value != '' )
$match = ( $prev_value == $value and $match == true ) ? true : false;
$prev_value = $value;
}
} else {
$match = false;
}
if ( $match == true ) $valid_delimiter[] = $delimiter;
}//foreach
# Set Default delimiter to comma
$delimiter = ( !empty( $valid_delimiter ) ) ? $valid_delimiter[0] : "comma";
/* !!!! This is good enough for my needs since I have the priority set to "tab"
!!!! but you will want to have the user select from the delimiters in $valid_delimiter
!!!! if multiple delimiter counts match
*/
# The Delimiter for the CSV
echo $delimiters[$delimiter];
There is no 100% reliable way to determine this. What you can do is:
If you have a method to validate the fields you read, try to read a few fields using either separator and validate against your method. If it breaks, use another one.
Count the occurrences of tabs and commas in the file. Usually one is significantly higher than the other.
Last but not least: Ask the user, and allow him to override your guesses.
I'm just counting the occurrences of the different delimiters in the CSV file; the one with the most occurrences is probably the correct delimiter:
//The delimiters array to look through
$delimiters = array(
'semicolon' => ";",
'tab' => "\t",
'comma' => ",",
);
//Load the csv file into a string
$csv = file_get_contents($file);
$res = array();
foreach ($delimiters as $key => $delim) {
$res[$key] = substr_count($csv, $delim);
}
//reverse sort the values, so the first element is the delimiter that occurs most often
arsort($res);
reset($res);
$first_key = key($res);
return $delimiters[$first_key];
In my situation users supply CSV files which are then entered into an SQL database. They may save an Excel spreadsheet as a comma- or tab-delimited file. A program converting the spreadsheet to SQL needs to automatically identify whether fields are tab separated or comma separated.
Many Excel CSV exports have field headings as the first line. The heading text is unlikely to contain commas except as a delimiter. For my situation I count the commas and tabs in the first line and use whichever number is greater to determine whether the file is comma or tab delimited.
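The heading-line test described above could look roughly like this. It is a minimal sketch: the function name guessDelimiterFromHeader is hypothetical, and falling back to a comma on a tie is an assumption, not something the answer specifies.

```php
<?php
// Hypothetical helper: guesses comma vs. tab from the heading line only.
function guessDelimiterFromHeader($headerLine)
{
    $commas = substr_count($headerLine, ",");
    $tabs = substr_count($headerLine, "\t");
    // ties (including no delimiter at all) fall back to comma; that default is an assumption
    return ($tabs > $commas) ? "\t" : ",";
}

// Made-up heading lines for illustration
$tabHeader = "id\tname\temail";
$commaHeader = "id,name,email";
var_dump(guessDelimiterFromHeader($tabHeader) === "\t");
var_dump(guessDelimiterFromHeader($commaHeader) === ",");
```

For a real file, the heading line could be read with fgets() on an opened handle before passing it to the helper.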
Thanks for all your inputs, I made mine using your tricks : preg_split, fgetcsv, loop, etc.
But I implemented something that was surprisingly not here, the use of fgets instead of reading the whole file, way better if the file is heavy!
Here's the code :
ini_set("auto_detect_line_endings", true);
function guessCsvDelimiter($filePath, $limitLines = 5) {
if (!is_readable($filePath) || !is_file($filePath)) {
return false;
}
$delimiters = array(
'tab' => "\t",
'comma' => ",",
'semicolon' => ";"
);
$fp = fopen($filePath, 'r', false);
$lineResults = array(
'tab' => array(),
'comma' => array(),
'semicolon' => array()
);
$lineIndex = 0;
while (($line = fgets($fp)) !== false) {
foreach ($delimiters as $key=>$delimiter) {
// parse the line that was just read; calling fgetcsv() here would consume further lines
$lineResults[$key][$lineIndex] = count(str_getcsv($line, $delimiter)) - 1;
}
$lineIndex++;
if ($lineIndex >= $limitLines) break;
}
fclose($fp);
if ($lineIndex === 0) {
return false; // empty file
}
// Calculating average
foreach ($lineResults as $key=>$entry) {
$lineResults[$key] = array_sum($entry)/count($entry);
}
arsort($lineResults);
reset($lineResults);
// fall back to comma when the two best candidates tie
$averages = array_values($lineResults);
return ($averages[0] != $averages[1]) ? $delimiters[key($lineResults)] : $delimiters['comma'];
}
I used @Jay Bhatt's solution for finding out a CSV file's delimiter, but it didn't work for me, so I applied a few fixes and added comments to make the process more understandable.
See my version of @Jay Bhatt's function:
function decide_csv_delimiter($file, $checkLines = 10) {
// use php's built in file parser class for validating the csv or txt file
$file = new SplFileObject($file);
// array of predefined delimiters. Add any more delimiters if you wish
$delimiters = array(',', '\t', ';', '|', ':');
// store the total number of occurrences of each delimiter in an associative array
$number_of_delimiter_occurences = array_fill_keys($delimiters, 0);
$results = array();
$i = 0; // using 'i' for counting the number of actual rows parsed
while ($file->valid() && $i < $checkLines) {
$line = $file->fgets();
foreach ($delimiters as $idx => $delimiter){
$regExp = '/['.$delimiter.']/';
$fields = preg_split($regExp, $line);
// the keys of the array are the delimiters and the values are the
// occurrences, summed over all checked lines instead of overwritten per line
$number_of_delimiter_occurences[$delimiter] += count($fields) - 1;
}
$i++;
}
// get key of the largest value from the array (comparing only the array values)
// in our case, the array keys are the delimiters
$results = array_keys($number_of_delimiter_occurences, max($number_of_delimiter_occurences));
// in case the delimiter happens to be a 'tab' character ('\t'), return it in double quotes
// otherwise when using as delimiter it will give an error,
// because it is not recognised as a special character for 'tab' key,
// it shows up like a simple string composed of '\' and 't' characters, which is not accepted when parsing csv files
return $results[0] == '\t' ? "\t" : $results[0];
}
I personally use this function to help automatically parse a file with PHPExcel, and it works beautifully and fast.
I recommend parsing at least 10 lines for the results to be more accurate. I personally use it with 100 lines, and it works fast, with no delays or lags. The more lines you parse, the more accurate the result gets.
NOTE: This is just a modified version of @Jay Bhatt's solution to the question. All credit goes to @Jay Bhatt.
When I output a TSV file, I write the tabs using \t, the same way one would write a line break as \n. With that in mind, I guess a method could be as follows:
<?php
// $file is a placeholder for your source; use file_get_contents() or however you wish to get the source
$mysource = file_get_contents($file);
// strpos() returns 0 for a tab at the very start, so compare against false
if(strpos($mysource, "\t") !== false){
//We have a tab separator
}else{
// it might be CSV
}
?>
I guess this may not be the right approach, because you could have tabs and commas in the actual content as well. It's just an idea. Using regular expressions may be better, although I am not too clued up on that.
You can simply use the native PHP function fgetcsv() in this way:
function getCsvDelimeter($file)
{
$delimiter = null;
if (($handle = fopen($file, "r")) !== FALSE) {
$delimiters = array(',', ';', '|', ':'); //Put all that need check
foreach ($delimiters AS $item) {
//go back to the first line, otherwise each delimiter is tested against a different line
rewind($handle);
//fgetcsv() returns a single-element array if the delimiter is not found
$row = fgetcsv($handle, 0, $item, '"');
if (is_array($row) && count($row) > 1) {
$delimiter = $item;
break;
}
}
fclose($handle);
}
return $delimiter;
}
Aside from the trivial answer that CSV files are always comma-separated (it's in the name), I don't think you can come up with any hard rules. Both TSV and CSV files are sufficiently loosely specified that you can come up with files that would be acceptable as either.
A\tB,C
1,2\t3
(Assuming \t == TAB)
How would you decide whether this is TSV or CSV?
You can also use fgetcsv (http://php.net/manual/en/function.fgetcsv.php), passing it a delimiter parameter. If the function returns a row with only a single field, the $delimiter parameter was probably not the right one (false is returned only on errors or at the end of the file).
Sample to check if the delimiter is ';':
if (($data = fgetcsv($your_csv_handler, 1000, ';')) !== false && count($data) > 1) { $csv_delimiter = ';'; }
How about something simple?
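function findDelimiter($filePath, $limitLines = 5){
$file = new SplFileObject($filePath);
// note: getCsvControl() returns the delimiter currently set on the object
// (a comma unless setCsvControl() was called); it does not inspect the file contents
$delims = $file->getCsvControl();
return $delims[0];
}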
This is my solution.
It works if you know how many columns you expect.
At the end, the separator character is in $actual_separation_character.
$separator_1=",";
$separator_2=";";
$separator_3="\t";
$separator_4=":";
$separator_5="|";
$separator_1_number=0;
$separator_2_number=0;
$separator_3_number=0;
$separator_4_number=0;
$separator_5_number=0;
/* YOU NEED TO CHANGE THIS VARIABLE */
// Expected number of separation characters per row ( 3 columns ==> 2 separation characters / row )
$expected_separation_character_number=2;
$file = fopen("upload/filename.csv","r");
while(! feof($file)) //read file rows
{
$row= fgets($file);
$row_1_replace=str_replace($separator_1,"",$row);
$row_1_length=strlen($row)-strlen($row_1_replace);
if(($row_1_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
$separator_1_number=$separator_1_number+$row_1_length;
}
$row_2_replace=str_replace($separator_2,"",$row);
$row_2_length=strlen($row)-strlen($row_2_replace);
if(($row_2_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
$separator_2_number=$separator_2_number+$row_2_length;
}
$row_3_replace=str_replace($separator_3,"",$row);
$row_3_length=strlen($row)-strlen($row_3_replace);
if(($row_3_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
$separator_3_number=$separator_3_number+$row_3_length;
}
$row_4_replace=str_replace($separator_4,"",$row);
$row_4_length=strlen($row)-strlen($row_4_replace);
if(($row_4_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
$separator_4_number=$separator_4_number+$row_4_length;
}
$row_5_replace=str_replace($separator_5,"",$row);
$row_5_length=strlen($row)-strlen($row_5_replace);
if(($row_5_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
$separator_5_number=$separator_5_number+$row_5_length;
}
} // while(! feof($file)) END
fclose($file);
/* THE FILE ACTUAL SEPARATOR (delimiter) CHARACTER */
/* $actual_separation_character */
if ($separator_1_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_1;}
else if ($separator_2_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_2;}
else if ($separator_3_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_3;}
else if ($separator_4_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_4;}
else if ($separator_5_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_5;}
else {$actual_separation_character=";";}
/*
if the number of columns more than what you expect, do something ...
*/
if ($expected_separation_character_number>0){
if ($separator_1_number==0 and $separator_2_number==0 and $separator_3_number==0 and $separator_4_number==0 and $separator_5_number==0){/* do something ! more columns than expected ! */}
}
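The same idea can be written more compactly with a loop over the candidate separators. This is a sketch rather than a drop-in replacement: the function name pickSeparator and the in-memory sample rows are assumptions, and unlike the code above it scores one point per row that contains exactly the expected number of separators.

```php
<?php
// Hypothetical helper: picks the separator that most often appears exactly
// $expectedPerRow times in a row, or null if no row ever matches.
function pickSeparator(array $rows, $expectedPerRow, array $candidates = array(",", ";", "\t", ":", "|"))
{
    $scores = array_fill_keys($candidates, 0);
    foreach ($rows as $row) {
        foreach ($candidates as $sep) {
            if (substr_count($row, $sep) === $expectedPerRow) {
                $scores[$sep]++;
            }
        }
    }
    // highest score first; reset() makes sure key() reads the first element
    arsort($scores);
    reset($scores);
    $best = key($scores);
    return ($scores[$best] > 0) ? $best : null;
}

// Made-up rows with 3 columns each, so 2 separators per row are expected
$rows = array("a;b;c", "1;2;3", "x,y;z;w");
$separator = pickSeparator($rows, 2);
var_dump($separator);
```

Adding another candidate only requires extending the $candidates array, instead of adding a new set of numbered variables.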
If you have a very large file, for example several GB, use head to put the first few lines into a temporary file, then open the temporary file in vi:
head test.txt > te1
vi te1
The easiest way to answer this is to open the file in a plain text editor, or in TextMate.
