Do you know about the Australian Bankers' Association (.aba) file format? It is used for batch transactions and is quite similar to CSV files. However, what I don't understand is how the columns are separated from each other. In CSV files, for example, we use delimiters such as , or ;. I also can't find any sample files. Here is a link that might help you help me faster if you don't already know the format.
http://www.cemtexaba.com/aba-format/cemtex-aba-file-format-details.html
However, what I don't understand is how the columns are separated from each other. In CSV files, for example, we use delimiters such as , or ;.
It is similar to CSV, but it is a plain text file consisting of strings and lines...
Here are some ready-made solutions:
Symfony 2 bundle -> https://github.com/latysh/aba-bundle
php library -> https://github.com/simonblee/aba-file-generator
It looks like a simple file format. Instead of thinking of it as a CSV file, which is delimited using some symbol, think of it as a string of characters.
So, if you have an ABA file then you can parse it using fopen() and fread().
<?php
// Fields are fixed-width, so read each one by its length in bytes
$fh = fopen('example.aba', 'rb');
$block1 = fread($fh, 1);
$block2 = fread($fh, 17);
$block3 = fread($fh, 2);
$block4 = fread($fh, 3);
// And so on...
Of course it would make sense to also have some mechanism that validates the data, and makes sure that the file is not corrupted, but this is just a simple example.
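To make the fixed-width idea concrete, here is a minimal sketch that slices one line with substr() instead of reading it block by block. The helper name parseFixedWidth and the sample line are made up for illustration, and the widths (1, 17, 2, 3) are simply the ones used in the fread() example above, so check them against the specification before relying on them.

```php
<?php
// Hypothetical helper: split one fixed-width line into named fields.
// $layout maps a field name to its width in characters.
function parseFixedWidth(string $line, array $layout): array
{
    $fields = [];
    $offset = 0;
    foreach ($layout as $name => $width) {
        // Cut out the field and strip the space padding on the right
        $fields[$name] = rtrim(substr($line, $offset, $width));
        $offset += $width;
    }
    return $fields;
}

// Widths taken from the fread() example above (assumption: verify against the spec)
$layout = ['block1' => 1, 'block2' => 17, 'block3' => 2, 'block4' => 3];

// Made-up sample line: '0', 17 spaces, '01', 'CBA'
$line = '0' . str_repeat(' ', 17) . '01' . 'CBA';
$fields = parseFixedWidth($line, $layout);
print_r($fields);
```

Keeping the layout in an array means the rest of the record can be described by adding entries rather than more fread() calls.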
I know this is a late reply, but for other people coding against the ABA file format:
http://www.cemtexaba.com/aba-format/ has a sample of the file format.
And this file has a great explanation of each of the fields:
https://github.com/mjec/aba/blob/master/sample-with-comments.aba but don't treat it as your single source of truth.
As Sverri M. Olsen has mentioned, the columns are not separated by a delimiter; each field simply occupies the fixed length given in the specification.
I have tried to extract user email addresses from files on my server. The problem is that most of the files are plain .txt files, but some are CSV files with a .txt extension. When I try to read and extract from them, I am not able to read the CSV files that have the .txt extension. Here is my code:
<?php
$handle = fopen('2.txt', "r");
while(!feof($handle)) {
    $string = fgets($handle);
    $pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}/i';
    preg_match_all($pattern, $string, $matches);
    foreach($matches[0] as $match)
    {
        echo $match;
        echo '<br><br>';
    }
}
?>
I have tried to use this code for that. The program reads the CSV files in one go, and the text files line by line. There are thousands of files, so it is difficult to identify by hand which is which.
Kindly suggest what I should do to resolve my problem. If there is a solution that can read any format, that would be awesome.
Well, your files are different. Because of that, you will have to take a different approach for each kind. In more general terms this is usually called adapting, and it is mostly provided using the Adapter design pattern.
Should you use the adapter design pattern, you would have code that inspects the extension of the file to be opened, with a switch on either txt or csv. Based on the value, you would retrieve a TxtParser or a CsvParser respectively.
However, before diving deep into this territory, you might want to have a look at the files first. I cannot say this for sure without seeing the structures, but you can. If the contents of the text and CSV files have the same structure, then a very simple approach is to change the extension of all files to either txt or csv and then process them using the same logic, knowing that files with the same extension will now be processed in the same manner.
But from what I understood, the file structures actually differ. So, to keep your code concise, use the adapter pattern: have two separate classes/functions for parsing, and another one on top of them for choosing the right parsing function (this top function would actually be a form of strategy) and running it.
Either way, I very much doubt there is a ready-made solution for the problem you are facing, as the file structure is yours and yours alone.
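As a rough sketch of that idea (not a full Adapter implementation), the selection step could look like the following. The function names parseTxt, parseCsv and chooseParser are hypothetical, and so is the sample data. Note that choosing by extension only helps once the extensions actually reflect the contents.

```php
<?php
// Hypothetical parser for plain text: one record per line
function parseTxt(string $contents): array
{
    return preg_split('/\R/', trim($contents));
}

// Hypothetical parser for CSV: one array of fields per line.
// Simplification: quoted fields that span several lines are not handled here.
function parseCsv(string $contents): array
{
    $rows = [];
    foreach (preg_split('/\R/', trim($contents)) as $line) {
        $rows[] = str_getcsv($line, ',', '"', '\\');
    }
    return $rows;
}

// The "top" function: pick the parser based on the file extension
function chooseParser(string $filename): callable
{
    switch (strtolower(pathinfo($filename, PATHINFO_EXTENSION))) {
        case 'csv':
            return 'parseCsv';
        case 'txt':
            return 'parseTxt';
        default:
            throw new InvalidArgumentException("Unsupported file type: $filename");
    }
}

// Made-up example input
$parser = chooseParser('contacts.csv');
$rows = $parser("a@example.com,Alice\nb@example.com,Bob");
print_r($rows);
```

The calling code never needs to know which parser it got back, which is what keeps the per-format logic separate.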
OK, so the problem occurs when a CSV file contains a very long line. Given this restriction, I suggest you use the example from php.net. Here it is:
$handle = @fopen("/tmp/inputfile.txt", "r");
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        echo $buffer;
        // do your operation for searching here
    }
    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    fclose($handle);
}
This must be relatively easy, but I'm struggling to find a solution. I receive data over a proprietary, encrypted network protocol, and in the end the entire received content ends up in a variable. The content is actually that of a CSV file, and I need to parse this data.
If this were a regular file on disk, I could use fgetcsv; if I could somehow break the content into individual records, I could use str_getcsv - but how can I break this file into records? Simple reading until a newline will not work, because CSV can contain values with line breaks in them. Below is an example set of data:
ID,SLN,Name,Address,Contract no
123,102,Market 1a,"Main street, Watertown, MA, 02471",16
125,97,Sinthetics,"Another address,
Line 2
City, NY 10001",16
167,105,"Progress, ahead",,18
All of this data is held inside one variable - and I need to parse it.
Of course, I can always write this data into a temporary file on disk and then read/parse it using fgetcsv, but it seems extremely inefficient to me.
If fgetcsv works for you, consider this:
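// Each fopen() of php://temp creates a separate stream,
// so write and read through the same handle
$stream = fopen("php://temp","r+");
fwrite($stream, $your_data_here);
rewind($stream);
// $result = fgetcsv($stream); ...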
For more on php://temp, see the php:// wrapper
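Here is a small self-contained sketch of this approach, run against a shortened version of the sample data from the question. The function name parseCsvString is made up for illustration. The delimiter, enclosure and escape arguments are passed explicitly only to avoid relying on defaults.

```php
<?php
// Hypothetical helper: parse CSV text that is held in a string
function parseCsvString(string $csv): array
{
    // Write and read through the same php://temp handle
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $csv);
    rewind($stream);

    $rows = [];
    // fgetcsv() follows quoted fields across line breaks; length 0 means "no limit"
    while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($stream);
    return $rows;
}

$csv = "ID,SLN,Name,Address,Contract no\n"
     . "125,97,Sinthetics,\"Another address,\nLine 2\nCity, NY 10001\",16\n"
     . "167,105,\"Progress, ahead\",,18\n";

$rows = parseCsvString($csv);
print_r($rows);
```

php://temp keeps the data in memory until it grows past a size limit, so small inputs never touch the disk.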
Hey guys, I've seen a lot of options based on fread (which requires a file, or writing to memory),
but I am trying to invalidate an input based on a string that has already been accepted (unknown format). I have something like this:
if (FALSE !== str_getcsv($this->_contents, "\n"))
{
    foreach (preg_split("/\n/", $this->_contents) AS $line)
    {
        $data[] = explode(',', $line);
    }
    print_r($data); die;
    $this->_format = 'csv';
    $this->_contents = $this->trimContents($data);
    return true;
}
This works fine on a real CSV or a CSV-filled variable, but when I try to pass it garbage to invalidate, something like:
https://www.gravatar.com/avatar/625a713bbbbdac8bea64bb8c2a9be0a4 which is garbage (since it's a PNG), it believes it's CSV
anyway and keeps on chugging along until the program chokes. How can I fix this? I have not seen any CSV validators that
are not at least several classes deep. Is there a simple three- or four-liner to (in)validate?
Is there a simple three- or four-liner to (in)validate?
Nope. CSV is so loosely defined - it has no telltale signs like header bytes, and there isn't even a standard for what character is used for separating columns! - that there technically is no way to tell whether a file is CSV or not - even your PNG could technically be a gigantic one-column CSV with some esoteric field and line separator.
For validation, look at what purpose you are using the CSV files for and what input you are expecting. Are the files going to contain address data, separated into, say, 10 columns? Then look at the first line of the file, and see whether enough columns exist, and whether they contain alphanumeric data. Are you looking for a CSV file full of numbers? Then parse the first line, and look for the kinds of values you need. And so on...
If you have an idea of the kinds of CSVs likely to make it to your system, you could apply some heuristics -- at the risk of not accepting valid CSVs. For instance, you could look at line length, consistency of line length, special characters, etc...
If all you are doing is checking for the presence of commas and newlines, then any sufficiently large, random file will likely have those and thus pass such a CSV test.
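As an illustration of such a heuristic, the sketch below rejects input unless the first few lines all split into the number of columns you expect. The function name looksLikeCsv, the NUL-byte check and the sample size are assumptions made for this example. It deliberately errs on the side of rejecting, so valid CSVs with multi-line quoted fields would fail it.

```php
<?php
// Hypothetical heuristic: does $contents look like comma-separated data
// with exactly $expectedColumns columns on each of the first few lines?
function looksLikeCsv(string $contents, int $expectedColumns, int $sampleLines = 5): bool
{
    // Empty input or NUL bytes (typical for binary data) are rejected outright
    if ($contents === '' || strpos($contents, "\0") !== false) {
        return false;
    }
    $lines = array_slice(preg_split('/\R/', trim($contents)), 0, $sampleLines);
    foreach ($lines as $line) {
        if (count(str_getcsv($line, ',', '"', '\\')) !== $expectedColumns) {
            return false;
        }
    }
    return true;
}

var_dump(looksLikeCsv("a,b,c\n1,2,3", 3));
var_dump(looksLikeCsv('https://www.gravatar.com/avatar/625a713bbbbdac8bea64bb8c2a9be0a4', 3));
```

The more you know about the expected content (column count, value types), the stricter this check can be.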
I am building a tool that will accept a CSV or tab-delimited file, which will then be parsed and the data stored in a database.
The uploaded file can be CSV or tab-delimited.
I came up with a workable solution (below) for detecting what format the file might be in and would like to know if there is a better way to solve this and/or how any of you out there have solved the same problem.
Thanks
<?php
$csv_comma='Fruit,Color
Apple,"Red,Green"
Tomato,"Red,Green"
Banana,Yellow
Tangerine,Orange
';
$csv_semi_colon='Fruit;Color
Apple;"Red,Green"
Tomato;"Red,Green"
Banana;Yellow
Tangerine;Orange
';
$tab_delimited="Fruit\tColor
Apple\tRed,Green
Tomato\tRed,Green
Banana\tYellow
Tangerine\tOrange";
$fileArr = array($csv_comma,$csv_semi_colon,$tab_delimited);
foreach($fileArr as $file){
if(preg_match('/^(.+),(.+)/',trim($file))){
echo "CSV with comma separator";
}
if(preg_match('/^(.+);(.+)/',trim($file))){
echo "CSV with semi colon separator";
}
if(preg_match('/^(.+)\t(.+)/',trim($file))){
echo "Tab delimited";
}
}
Well, CSV pretty much has this built in.
The default separator for CSV is a comma, but with sep= you can specify another separator.
You could just implement it the same way: the default is a comma, but if sep is defined you use that instead.
Your file could look like:
apple, orange, tomato
or
sep=;
apple; orange; tomato
So if the first line starts with sep, it is an "option" line; otherwise it contains values. For a tab you use sep=\t
Now users can define their own separator, and there is no more guessing.
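A minimal sketch of reading such an option line might look like this. The function name parseWithSepLine is made up, and the sketch assumes the separator is a single literal character right after sep= (so a tab would have to be an actual tab character, not the two characters \t).

```php
<?php
// Hypothetical helper: honour an optional "sep=" first line, else use a comma
function parseWithSepLine(string $contents): array
{
    $lines = preg_split('/\R/', trim($contents));
    $delimiter = ',';

    // An option line is "sep=" followed by exactly one character
    if (preg_match('/^sep=(.)$/', $lines[0], $m)) {
        $delimiter = $m[1];
        array_shift($lines); // the option line is not data
    }

    $rows = [];
    foreach ($lines as $line) {
        if ($line === '') {
            continue; // skip blank lines
        }
        // Trim each value because the examples put a space after the separator
        $rows[] = array_map('trim', str_getcsv($line, $delimiter, '"', '\\'));
    }
    return $rows;
}

print_r(parseWithSepLine("sep=;\napple; orange; tomato"));
print_r(parseWithSepLine("apple, orange, tomato"));
```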
After some comments from CBroe about ease of use for the user, there could be some changes. CSV only accepts a single character as a separator, so the system could be used as described above. A CSV editor (like Excel) will handle that for the user.
If the user uses tabs, it won't be a CSV file but a .txt file (for example). So you could change the default according to the file given.
I also want to add, as already pointed out in the comments, that if you guess, you will eventually hit a case where the guess is wrong.
I don't know the setup of the files, but CSV lines need to have the same number of fields (as far as I remember). So what you could do is read the first x lines and split them with every candidate separator.
After that, you check which separator gives the same number of fields on every line; most likely that is your separator (again, guessing).
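That guessing strategy could be sketched roughly as follows. The function name guessDelimiter, the candidate list and the sample size are assumptions. The first candidate that yields the same column count (greater than one) on every sampled line wins, so it is still only a guess.

```php
<?php
// Hypothetical helper: guess the delimiter from the first few lines.
// Returns null when no candidate gives a consistent column count.
function guessDelimiter(string $contents, array $candidates = [',', ';', "\t"], int $sampleLines = 5): ?string
{
    $lines = array_slice(preg_split('/\R/', trim($contents)), 0, $sampleLines);
    foreach ($candidates as $delimiter) {
        $counts = [];
        foreach ($lines as $line) {
            $counts[] = count(str_getcsv($line, $delimiter, '"', '\\'));
        }
        // Accept only if every line has the same count and there is more than one column
        if (count(array_unique($counts)) === 1 && $counts[0] > 1) {
            return $delimiter;
        }
    }
    return null;
}

var_dump(guessDelimiter("Fruit,Color\nApple,\"Red,Green\"\nBanana,Yellow"));
var_dump(guessDelimiter("Fruit;Color\nApple;\"Red,Green\"\nBanana;Yellow"));
var_dump(guessDelimiter("Fruit\tColor\nApple\tRed,Green\nBanana\tYellow"));
```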
You can use this kind of pattern to check the csv structure and determine the separator:
if (preg_match('~^(?:("[^"]++"|[^,;\t\n]++)(?<sep>[,\t;])(?1)(?:\n|$))++$~', $csv_comma, $match))
    print_r($match['sep']);
"AMZN","Amazon.com, Inc.",211.22,"11/9/2011","4:00pm","-6.77 - -3.11%",4673052
Amazon.com, Inc. is being treated as 2 values instead of one.
I tried this $data = explode( ',', $s);
How do I modify this to avoid the comma in the value issue?
You should probably look into str_getcsv() (or fgetcsv() if you're loading the CSV from a file)
This will read the CSV contents into an array without the need for exploding etc.
Edit: to expand upon the point made by Pekka, if you're using PHP < 5.3, str_getcsv() won't work, but there's an interesting approach here which reproduces the functionality for earlier versions. And another approach here which uses fgetcsv() after creating a temporary file.
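For the line from the question, a minimal example looks like this. The extra delimiter, enclosure and escape arguments just spell out the usual defaults.

```php
<?php
$s = '"AMZN","Amazon.com, Inc.",211.22,"11/9/2011","4:00pm","-6.77 - -3.11%",4673052';

// str_getcsv() respects the quotes, so the comma inside "Amazon.com, Inc." is kept
$data = str_getcsv($s, ',', '"', '\\');
print_r($data);
```

All values come back as strings, so cast the numeric ones yourself if you need numbers.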
Use a dedicated CSV library. It's been explained over and over that parsing file formats like CSV manually is asking for trouble, because you don't know all the variations of CSV and all the rules to do it right.