Detecting File Format in PHP


I am building a tool that accepts an uploaded CSV or tab-delimited file, parses it, and stores the data in a database.
I came up with a workable solution (below) for detecting which format the file is in. Is there a better way to solve this, and how have others solved the same problem?
Thanks
<?php
// Sample inputs: comma-separated, semicolon-separated and tab-delimited
$csv_comma='Fruit,Color
Apple,"Red,Green"
Tomato,"Red,Green"
Banana,Yellow
Tangerine,Orange
';
$csv_semi_colon='Fruit;Color
Apple;"Red,Green"
Tomato;"Red,Green"
Banana;Yellow
Tangerine;Orange
';
// Double quotes so that \t and \n become real tab and newline characters
$tab_delimited="Fruit\tColor\nApple\tRed,Green\nTomato\tRed,Green\nBanana\tYellow\nTangerine\tOrange";
$fileArr = array($csv_comma,$csv_semi_colon,$tab_delimited);
foreach($fileArr as $file){
    // Each pattern only inspects the first line, because . does not match a newline
    if(preg_match('/^(.+),(.+)/',trim($file))){
        echo "CSV with comma separator";
    }
    if(preg_match('/^(.+);(.+)/',trim($file))){
        echo "CSV with semi colon separator";
    }
    if(preg_match('/^(.+)\t(.+)/',trim($file))){
        echo "Tab delimited";
    }
}

CSV pretty much has this built in already.
The default separator for CSV is a comma, but with a sep= line you can specify another separator (Excel, for one, recognizes such a line; it is a convention rather than part of the CSV format itself).
You could implement the same thing: the default is a comma, but if sep is defined you use that instead.
Your file could look like:
apple, orange, tomato
or
sep=;
apple; orange; tomato
So if the first line starts with sep, it is an "option" line; otherwise it contains values. For tabs you would use sep=\t
Now users can define their own separator and there is no more guessing.
After some comments from CBroe about ease of use for the user, a few changes: CSV only accepts a single character as separator, so the system above can be used as it is. A CSV editor (like Excel) will handle that for the user.
If the user uses tabs, the file won't be a .csv but a .txt (for example), so you could change the default according to the type of file given.
I also want to add, as already pointed out in the comments, that if you guess, you will at some point guess wrong.
I don't know how the files are set up, but as far as I remember every line of a CSV file needs to have the same number of fields. So you could read the first x lines and split them with every candidate separator.
Then check which separator gives every line the same number of fields; most likely that is your separator (again, guessing).
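The "same number of fields on every line" heuristic could be sketched roughly like this. It is only a guess-based helper, not a definitive detector: the function name guess_delimiter, the candidate list and the sample size are all assumptions made for illustration, and the empty escape argument to str_getcsv() assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: guess the separator by checking which candidate
// splits every sampled line into the same number of fields (more than one).
function guess_delimiter(string $text, array $candidates = [',', ';', "\t"], int $sampleLines = 5): ?string
{
    // Look at the first few lines only.
    $lines = array_slice(preg_split('/\R/', trim($text)), 0, $sampleLines);
    $best = null;
    $bestCount = 1;
    foreach ($candidates as $sep) {
        $counts = [];
        foreach ($lines as $line) {
            // str_getcsv() respects quoted fields such as "Red,Green".
            $counts[] = count(str_getcsv($line, $sep, '"', ''));
        }
        // Accept a candidate only if all lines agree and it yields the most fields so far.
        if (count(array_unique($counts)) === 1 && $counts[0] > $bestCount) {
            $best = $sep;
            $bestCount = $counts[0];
        }
    }
    return $best; // null means no candidate produced consistent columns
}

$sample = "Fruit;Color\nApple;\"Red,Green\"\nBanana;Yellow\n";
var_dump(guess_delimiter($sample)); // string(1) ";"
```

As the answer says, this is still guessing: a file with a single column, or with the same number of commas and semicolons on every line, can fool it.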

You can use this kind of pattern to check the CSV structure and determine the separator (as written, it expects exactly two fields per line):
if (preg_match('~^(?:("[^"]++"|[^,;\t\n]++)(?<sep>[,\t;])(?1)(?:\n|$))++$~', $csv_comma, $match))
    print_r($match['sep']);

Related

How to add new column to a flat file using PHP?

Quick update: The reason I need this solution is that this one PHP file is used to expand the flat file for about a hundred users (they all use the same PHP file, but each has their own flat file).
SOLUTION:
I worked on this for one more day, rephrased the question and got a really great answer. I am adding it here to help others in the future:
$content = file_get_contents("newstest.db");
$content = preg_replace('/(^ID:.*\S)/im', '$1value4:::', $content);
$content = preg_replace('/(^\d+.*\S)/im', '$1:::', $content);
file_put_contents("newstest.db", $content);
The original content of the flat file used when testing the code was:
ID:::value1:::value2:::value3:::
1:::My:::first:::line:::
2:::My:::second:::line:::
3:::Your:::third:::line:::
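The two preg_replace calls above append the column unconditionally, so running them twice adds it twice. A minimal sketch of the "only if it does not already exist" check from the question could look like the following; the function name add_column_if_missing is hypothetical, and it assumes every line ends with the ::: divider as in the sample file.

```php
<?php
// Hypothetical helper: append a column to a ":::"-delimited flat file,
// but only when the header line does not already contain it.
function add_column_if_missing(string $content, string $column, string $sep = ':::'): string
{
    // Split into lines without keeping the line endings.
    $lines = preg_split('/\R/', rtrim($content, "\r\n"));
    // Header fields; the trailing separator yields an empty last element.
    $header = explode($sep, $lines[0]);
    if (in_array($column, $header, true)) {
        return $content; // column already present, nothing to do
    }
    foreach ($lines as $i => $line) {
        // The header gets the new name, every data row gets an empty field.
        $lines[$i] = $line . ($i === 0 ? $column : '') . $sep;
    }
    return implode("\n", $lines) . "\n";
}

$original = "ID:::value1:::value2:::value3:::\n1:::My:::first:::line:::\n";
$updated = add_column_if_missing($original, 'value4');
echo $updated;
```

To apply it to a real file, the content would be read with file_get_contents() and written back with file_put_contents(), as in the solution above.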
ORIGINAL QUESTION:
I have a PHP script I am trying to modify. I am a PHP newbie, and having searched both here and on Google without finding a solution, I am asking here.
I need to add more values (columns) in the flat file, automatically if the "column" does not exist from before.
Because this one PHP file is shared with many users (each with their own flat file), I need a way to automatically add new "columns" in their flat files if the column does not exist. Doing it manually is very time consuming, and I bet there is an easy way.
INFO:
The flat file is named "newstest.db"
The flat file has this layout:
id:::title:::short:::long:::author:::email:::value1:::value2:::value3:::
So the divider is :::
I understand the basics: I need to add, for instance, "value4:::" after "value3:::" in the first line of newstest.db, then add ::: to the other existing lines so that all lines are updated and prepared for the new "value4".
Today the PHP script uses this to connect to the flat file:
($filesource is the path to the flat file, including its name. It is unique for each user.)
$connect_to_file = connect_pb ($filesource);
And to write to the file I use:
insert_pb($filesource,"$new_id:::$title:::$short:::$long:::$author:::$email:::$value1:::::::::");
(As you can see, value 2 and 3 are not used in this case, but they are in others.)
QUESTION:
Is there quick or existing PHP code I can use to add a new column if it doesn't already exist? Or do I need to write the PHP code for this specific task myself?
I understand that the code must do something along:
If "value4" does not exist in line 0 in $filesource
then add "value4:::" at the end of line 0,
and for each of the other lines add ":::" at the end.
I don't know where to start, but I have tried for some hours.
I understand this:
update_pb(pathtofiletosaveto,"id","x == (ID of news)","value in first line","value to add");
But I don't know how to write the if statement described above, nor how to update line 0 in the flat file to add "value4:::" at the end, etc.
MY CODE (does not work as intended):
Or maybe I need to read only the first line of the file (newstest.db), and then replace it with a new line if "value4" is not in it?
Here is an attempt, but I don't know how to do all of it:
(It's probably full of errors, as I have tried to read up, find examples and combine code.)
<?php
// specify the file
$file_source="newstest.db";
// get the content of the file
$newscontent = file($file_source, true);
$lines ='';
// handle the content, add "value4:::" and ":::" to the lines
foreach ($newscontent as $line_num => $linehere) {
// add "value4:::" at the end of first line only, and put it in memory
if($line_num[0]) {$lines .= $linehere.'value4:::';}
else {
// then add ":::" to the other lines and add them to memory
$lines .= $linehere.':::';
}
// echo results just to see what is going on
//echo 'Line nr'.$line_num.':<br />'.$lines.'<br /><br />';
}
// add
// to show the result
echo "Here is the result:<br /><br />".$lines."<br /><br />";
//Write new content to $file_source
$f = fopen($file_source, 'w');
fwrite($f,$lines);
fclose($f);
echo "done updating database flat file";
?>
This ALMOST works...
But it does NOT add "value4:::" to the end of the first line,
and it does not add ":::" to the end of the next lines, but to the beginning...
So a couple of questions remain:
1) How can I search line 0 for "value4", and then write "value4:::" at the end of the line?
2) How can I add ":::" at the end of each line, and not at the beginning?
I kindly ask for your help with this.

Do you absolutely have to use PHP for this task? It seems like something you only need to do once, and is much easier to do in a different way.
For example, if you have a *nix shell with sed, sed -i 's/$/:::/' <file> will do that task for you.

Concatenate RTF files with PHP without header

I have some RTF files generated by users with Microsoft Word. I need to be able to concatenate these files, and the resulting file must still be readable by LibreOffice. I'm using LibreOffice to convert the resulting file into a PDF file.
To concatenate two files, my application removes the last character of the first file and the first character of the other file. The file headers are not removed (I'm not talking about page headers).
For some reason, LibreOffice does not like the headers inserted by Microsoft Word. But it works fine if I open these files with WordPad and save them.
Another way to remove these headers is to have LibreOffice convert the files to RTF before I concatenate them. That way I can convert to PDF, but LibreOffice makes a serious mess of my tables when it converts my files to RTF.
So how can I remove the headers through PHP without messing up the tables? Or do you have another way to get the same result?
Edit:
In a nutshell, I must be able to concatenate these files so that LibreOffice can open the result, and my tables must still display nicely in Microsoft Word.
As you can guess, users don't want to use WordPad, and my customer's IT department has to comply with that wish (office politics).
UPDATE:
I have to do the merging first, because of business rules. The files are merged, then my users can modify the result using Word (no problems here). Then they ask their boss to validate it. If the boss agrees to validate it, the RTF file becomes a PDF file.
UPDATE 2:
I have the beginning of a solution. If the RTF file starts with plain text or a picture, you have to remove everything up to \pard. But this does not work if your file starts with a table.
UPDATE 3:
If you want to support tables too, you have to remove everything up to \pard or \trowd, whichever comes first. I'm going to post the complete solution once I have working code. This will work fine as long as you don't need colours and all your files use the same font (because we don't remove the RTF headers of the first file).

If the limitations with the 'pure RTF' approach come back to bite you, you could use LibreOffice to convert your RTF files to docx, then use a tool to merge the docx files.
There are such tools for .NET and Java (such as our MergeDocx product); I'm not sure what you'll find for PHP.

I managed to build reliable code that makes it possible to manipulate RTF files created with Microsoft Word. It works as long as you only need text, pictures and tables, and don't need fancy things such as colour. Colour works for text, but beyond that ...
function strip_rtf_header($RTFstring)
{
    // stristr() returns all of the haystack starting from and including the
    // first occurrence of the needle, or false if the needle is not found.
    // Single quotes keep the backslash literal; in double quotes "\t" is a
    // tab character, which is why stristr() failed to detect "\trowd".
    $tmp_pard = stristr($RTFstring, '\pard');
    $tmp_tab = stristr($RTFstring, '\trowd');
    if ($tmp_pard === false && $tmp_tab === false) {
        // No known marker: leave the string untouched.
        return $RTFstring;
    }
    // We pick the longer remainder, because we want the first occurrence of \pard or \trowd.
    $rest = strlen((string) $tmp_pard) > strlen((string) $tmp_tab) ? $tmp_pard : $tmp_tab;
    // { is added so the concatenation code still works. We just remove the headers.
    return '{' . $rest;
}

Time is shown in 24-hour format when exporting to a CSV file

I have a PHP project in which I need to export some data from the database to a CSV file. My data contains a "Time" field whose value is "7.00 PM". But when I export it to CSV it shows "19.00", while I want it to show "7.00 PM".
Here is a portion of the code:
$out .= '"'.$val['first_name'].'",';
$out .= '"'.$val['second_name'].'",';
$out .= '"'.$val['pgm_time'].'",';
$f = fopen ('registered_users.csv','w');
// Put all values from $out to export.csv.
fputs($f, $out);
fclose($f);
header('Content-type: application/csv');
header('Content-Disposition: attachment; filename="registered_users.csv"');
readfile('registered_users.csv');
When I simply echo these values, the correct time is shown (7.00 PM), but after exporting it shows 19.00. My database also contains the correct value (7.00 PM).
I have checked this on different systems. On some it is fine, but on others it shows 19.00. Maybe it is the version of the Microsoft software, or maybe some internal setting of the editor. I don't know why this happens.
I need a solution that shows the time in 12-hour format, i.e. 7.00 PM, on all systems (independent of the editor).
Does anyone know how to do this, or have any other good code? Please help me.
Thanks

Look in the right place
Maybe it is the version of the Microsoft software, or maybe some internal setting of the editor
If you open your file in Notepad, or some other text editor, you're going to see the actual stored text, which will (probably) be "7.00 PM". If this is not the case, please amend the question with a reproducible bit of code that generates a CSV file with "the problem".
If you open the file in Excel however, you're not just opening the file - it's being run through an import process. Excel will attempt to assign the right type to each column, in this case saying it's a date, and converting it appropriately. What you are describing is the user's own Excel settings for how to represent dates. This means your question has very little to do with php - and is all about how excel reads csv files.
Forcing "show as text" for csv files in excel
It's unclear why you want to do this (knowing why, may lead to a different suggestion), but if you prefix the field value with a single quote ' this will force excel to represent the cell value as text, rather than automatically determining the cell/column type:
$out .= '"\''.$val['pgm_time'].'",';
i.e. obtaining:
"foo","bar","'7.00 PM"
This will likely make the csv file problematic or useless to any other process that needs to read it
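For reference, the same apostrophe-prefixed row could be written with fputcsv() instead of concatenating quotes by hand. This is only a sketch: the $rows array is a hypothetical stand-in for the database result, php://memory is used so the example runs without touching disk, and the empty escape argument assumes PHP 7.4 or later.

```php
<?php
// Hypothetical rows standing in for the database result set.
$rows = [
    ['first_name' => 'foo', 'second_name' => 'bar', 'pgm_time' => '7.00 PM'],
];

// Write to an in-memory stream; a real export would use a file or php://output.
$fh = fopen('php://memory', 'w+');
foreach ($rows as $val) {
    // The leading apostrophe is the text marker described above.
    $fields = [$val['first_name'], $val['second_name'], "'" . $val['pgm_time']];
    // fputcsv() takes care of quoting and escaping each field.
    fputcsv($fh, $fields, ',', '"', '');
}
rewind($fh);
$csv = stream_get_contents($fh);
fclose($fh);
echo $csv;
```

The caveat above still applies: any other program reading the file will see the apostrophe as part of the value.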

Why would you want to force that? That's one of those things that are supposed to be left to whatever is displaying the data. In fact, even if you were to output data in 12h format, opening the CSV in Excel or the like might just reformat it to 24h format anyway. It's best to leave time data in whatever format it comes in.

Not understanding the .aba file format

Do you know about the Australian Bankers' Association (.aba) file format? It is used for batch transactions and is quite similar to CSV files. However, what I don't understand is how the columns are separated from each other. For example, in CSV files we use characters like , or ; as separators. I also can't find any sample files. Here is a link that may help you help me faster if you don't know the format already.
http://www.cemtexaba.com/aba-format/cemtex-aba-file-format-details.html

However, what I don't understand is how the columns are separated from each other. For example, in CSV files we use characters like , or ; as separators
It is similar to CSV, but it is a plain text file consisting of fixed-length strings and lines...
Here are some ready solutions
Symfony 2 bundle -> https://github.com/latysh/aba-bundle
php library -> https://github.com/simonblee/aba-file-generator

It looks like a simple file format. Instead of thinking of it as a CSV file, which is delimited using some symbol, think of it as a string of characters.
So, if you have an ABA file then you can parse it using fopen() and fread().
<?php
$fh = fopen('example.aba', 'rb');
$block1 = fread($fh, 1);
$block2 = fread($fh, 17);
$block3 = fread($fh, 2);
$block4 = fread($fh, 3);
// And so on...
Of course it would make sense to also have some mechanism that validates the data, and makes sure that the file is not corrupted, but this is just a simple example.
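The same fixed-position idea can also be applied to a line that is already in memory. The sketch below is an illustration only: the widths 1, 17, 2 and 3 are taken from the fread() calls above, while the field names and the sample line are made-up placeholders and should be checked against the actual specification.

```php
<?php
// Hypothetical helper: slice a fixed-width line according to a layout of
// field name => width. Names are placeholder labels, not official ones.
function parse_fixed_width(string $line, array $layout): array
{
    $fields = [];
    $offset = 0;
    foreach ($layout as $name => $width) {
        // Each field is located by position alone; there is no separator character.
        $fields[$name] = substr($line, $offset, $width);
        $offset += $width;
    }
    return $fields;
}

// Widths follow the fread() example above.
$layout = ['record_type' => 1, 'blank' => 17, 'sequence' => 2, 'institution' => 3];
// Made-up sample line for demonstration.
$line = '0' . str_repeat(' ', 17) . '01' . 'CBA';
$record = parse_fixed_width($line, $layout);
print_r($record);
```

Keeping the layout in an array makes it easy to extend to the remaining fields and to validate the total line length.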

I know this is a late reply, but for other people coding for ABA file format,
http://www.cemtexaba.com/aba-format/ has a sample file format.
And this site has a great explanation on each of the fields
https://github.com/mjec/aba/blob/master/sample-with-comments.aba but don't refer to it as your single source of truth.
As Sverri M. Olsen has mentioned, the columns are not specifically separated by a separator, instead they just stick to the length specified by the specification.

Getting the file name from a text file after string matching - PHP

I have a log file (log.txt) in the form:
=========================================
March 01 2050 13:05:00 log v.2.6
General Option: [default] log_options.xml
=========================================
Loaded options from xml file: '/the/path/of/log_options.xml'
printPDF started
PDF export
PDF file created:'/path/of/file.1.pdf'
postProcessingDocument started
INDD file removed:'/path/of/file.1.indd'
Error opening document: '/path/of/some/filesomething.indd':Error: file doesnt exist or no permissions
=========================================
March 01 2050 14:15:00 log v.2.6
General Option: [default] log_options.xml
=========================================
Loaded options from xml file: '/the/path/of/log_options.xml'
extendedprintPDF started
extendedprintPDF: Error: Unsaved documents have no full name: line xyz
Note: Each file name is of the format: 3lettersdatesomename_LO.pdf/indd. Example: MNM011112ThisFile_LO.pdf. Also, on a given day and time, the entry could either have just errors, just the message about the file created or both, like I have shown here.
The file continues this way. And, I have a db in the form:
id itemName status
1 file NULL
And so on...
Now, I am expected to go through the log file and, for each file that is created or for which there is an error, update the last column of the DB with the appropriate message: File created or Error. I thought of searching for the string "PDF file created/Error" and then grabbing the file name.
I have tried various things like pathinfo() and strpos. But, I can't seem to understand how I am going to get it done.
Can someone please provide me some inputs on how I can solve this? The txt file and db are pretty huge.
NOTE: I provided the 2nd entry of the log file to be clear that the format in which errors appear IS NOT consistent. I would like to know if I can still achieve what I am supposed to with an inconsistent format for errors.
Can somebody please help after reading the whole question again? There have been plenty of changes from the first time I posted this.

You can use PHP's explode function to break each line of your file into words.
If the fields in your text file are tab-separated, you can split on tabs with explode("\t", $line) (note the double quotes; '\t' in single quotes is a literal backslash followed by t); if they are space-separated, explode on a space.
Then a simple substr($word, $start_index, $length) on each word can give you the name of the file (here $start_index should be 0).
To talk to the MySQL database, use PDO (PHP Data Objects) or mysqli; the old mysql_connect family was removed in PHP 7, and PDO makes your code more reliable and flexible.
Another option is to use preg_match with a regular expression that matches your error message and captures the file name.
You can refer to the php.net manual for help at any time.

Are all of the files PDFs? If so you can do a regex search on files with the .pdf extension. However, if the filename is also contained in the error string, you will need to exclude that somehow.
// Assume filenames contain only upper/lowercase letters, 0-9, underscores, periods, dashes, and forward slashes
preg_match_all('~([a-zA-Z0-9_./-]+\.pdf)~', $log_file_contents, $matches);
// $matches should be an array containing each filename.
// You can do array_unique() to exclude duplicates.
Edit: Keep in mind that $matches will be a multi-dimensional array, as described at http://php.net/manual/en/function.preg-match-all.php and http://php.net/manual/en/function.preg-match.php
To test a regular expression, you can use http://regexpal.com/

Okay, so the main issue here is that you either don't have a consistent delimiter for "entries"..or else you are not providing enough info. So based on what you have provided, here is my suggestion. The main caveat here is that without a solid delimiter for "entries," there's no way to know for sure if the error matches up with the file name. The only way to fix this is to format your file better. Also you have to fill in some blanks, like your db info and how you actually perform the query.
$files = array();
$errors = array();
$handle = fopen("log.txt", "rb");
// fgets() returns one line at a time, so ^ and $ anchor to that line
while (($row = fgets($handle)) !== false) {
    // get the current row without its line ending
    $row = rtrim($row, "\r\n");
    // get file names (the path sits between single quotes)
    if (preg_match("~^PDF file created:'(.*?)'$~", $row, $match)) {
        $files[] = $match[1];
    }
    // get errors
    if (preg_match('~^Error[ :](.*)$~', $row, $match)) {
        $errors[] = $match[1];
    }
}
fclose($handle);
// connect to db
foreach ($files as $k => $file) {
    // assumes your table just has basename of file
    $file = basename($file);
    $error = ( isset($errors[$k]) ) ? $errors[$k] : null;
    // better: use a prepared statement instead of interpolating values
    $sql = "update tablename set status='$error' where itemName='$file'";
    // execute query
}
EDIT: Actually going back to your post, it looks like you want to update a table not insert, so you will want to change the query to be an update. And you may need to further work with $file in that foreach for your where clause, depending on how you store your filenames in your db (for example, if you just store the basename, you will likely want to do $file = basename($file); in the foreach). Code updated to reflect this.
So hopefully this will point you in the right direction.
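Since the error lines are not consistent, one possible approach is to key each status by the file name found between single quotes on the same line, rather than pairing errors and files by position. The following is a rough sketch under that assumption; the function name statuses_from_log is hypothetical, and error lines that contain no quoted path (such as the "Unsaved documents" one) are simply skipped.

```php
<?php
// Hypothetical helper: map each quoted file name in the log to a status.
function statuses_from_log(string $log): array
{
    $statuses = [];
    foreach (preg_split('/\R/', $log) as $line) {
        if (preg_match("~^PDF file created:\s*'([^']+)'~", $line, $m)) {
            // A created PDF: remember it by its base name.
            $statuses[basename($m[1])] = 'File created';
        } elseif (preg_match("~Error[^']*'([^']+)'~", $line, $m)) {
            // An error line that mentions a quoted path.
            $statuses[basename($m[1])] = 'Error';
        }
    }
    return $statuses;
}

$log = "PDF export\n"
     . "PDF file created:'/path/of/file.1.pdf'\n"
     . "INDD file removed:'/path/of/file.1.indd'\n"
     . "Error opening document: '/path/of/some/filesomething.indd':Error: file doesnt exist or no permissions\n"
     . "extendedprintPDF: Error: Unsaved documents have no full name: line xyz\n";
$statuses = statuses_from_log($log);
print_r($statuses);
```

Each entry could then drive one UPDATE through a prepared statement; how the base name maps to itemName depends on how the names are stored in the table. For a very large log, the same matching can be done line by line with fgets() instead of loading the whole text.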
