This must be relatively easy, but I'm struggling to find a solution. I receive data using a proprietary network protocol with encryption, and at the end the entire received content ends up in a variable. The content is actually that of a CSV file - and I need to parse this data.
If this were a regular file on disk, I could use fgetcsv; if I could somehow break the content into individual records, I could use str_getcsv - but how can I break this file into records? Simple reading until a newline will not work, because CSV can contain values with line breaks in them. Below is an example set of data:
ID,SLN,Name,Address,Contract no
123,102,Market 1a,"Main street, Watertown, MA, 02471",16
125,97,Sinthetics,"Another address,
Line 2
City, NY 10001",16
167,105,"Progress, ahead",,18
All of this data is held inside one variable - and I need to parse it.
Of course, I can always write this data into a temporary file on disk and then read/parse it using fgetcsv, but that seems extremely inefficient to me.
If fgetcsv works for you, consider this:
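// Each open of php://temp creates a new, empty stream, so write and read through the same handle.
$stream = fopen("php://temp", "r+");
fwrite($stream, $your_data_here);
rewind($stream); // move back to the start before reading
// while (($row = fgetcsv($stream)) !== false) { ... }
fclose($stream);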
For more on php://temp, see the php:// wrapper
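As a minimal sketch of the stream approach applied to the sample data from the question: the helper name csvStringToRows() is hypothetical, and php://memory is used here on the assumption that the data fits comfortably in RAM (php://temp would spill to disk past a size threshold). Because fgetcsv() does the record splitting, the quoted field with embedded line breaks stays in one record.

```php
<?php
// Parse CSV text held in a string by routing it through an in-memory stream.
// csvStringToRows() is a hypothetical helper name, not a PHP built-in.
function csvStringToRows(string $csv): array
{
    $stream = fopen('php://memory', 'r+');
    fwrite($stream, $csv);
    rewind($stream);

    $rows = [];
    // Delimiter, enclosure and escape are passed explicitly rather than relying on defaults.
    while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() reports a blank line as [null]
        }
        $rows[] = $row;
    }
    fclose($stream);
    return $rows;
}

// The example data from the question, including the multi-line address.
$csv = "ID,SLN,Name,Address,Contract no\n"
     . "123,102,Market 1a,\"Main street, Watertown, MA, 02471\",16\n"
     . "125,97,Sinthetics,\"Another address,\nLine 2\nCity, NY 10001\",16\n"
     . "167,105,\"Progress, ahead\",,18\n";

$rows = csvStringToRows($csv);
```

Each element of $rows is one record as an array of fields, with the header row first.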
I am trying to extract user email addresses from files on my server. Most of the files are plain .txt files, but some are CSV files with a .txt extension. When I try to read and extract, I am not able to read the CSV files that have the .txt extension. Here is my code:
<?php
$handle = fopen('2.txt', "r");
while(!feof($handle)) {
$string = fgets($handle);
$pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}/i';
preg_match_all($pattern, $string, $matches);
foreach($matches[0] as $match)
{
echo $match;
echo '<br><br>';
}
}
?>
I have tried to use this code for that. The program reads the CSV files as one complete block, and the text files line by line. There are thousands of files, so it is difficult to identify which is which.
Kindly suggest what I should do to resolve my problem. If there is any solution that can read any format, that would be awesome.
Well, your files are different, so you will have to take a different approach for each kind. In more general terms this is usually called adapting, and it is mostly provided using the Adapter design pattern.
If you use the adapter design pattern, you would have code that inspects the extension of the file to be opened and a switch on either txt or csv. Based on the value you would retrieve a TxtParser or a CsvParser respectively.
However, before diving deep into this territory you might want to have a look at the files first. I cannot say this for sure without seeing the structures, but you can. If the contents of the text and CSV files have the same structure, then a very simple approach is to change the extension to either txt or csv for all files and then process them using the same logic, knowing that files with the same extension will now be processed in the same manner.
But from what I understood, the file structures actually differ. So to keep your code concise, use the adapter pattern: two separate classes/functions for parsing, and another one on top of that for choosing the right parsing function (this top function would actually be a form of a strategy) and running it.
Either way, I very much doubt there is a ready-made solution for the problem you are facing, as the file structure is yours and yours alone.
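The dispatch described in this answer could be sketched roughly as follows. All function names are hypothetical and the two parsers are placeholders for whatever per-format logic you need. Note that choosing by extension only helps once the mislabelled CSV files have been renamed, since a CSV file named .txt would still be routed to the text parser.

```php
<?php
// Hypothetical per-format parsers: names and behaviour are illustrative only.
function parseTxtLine(string $line): array
{
    return [trim($line)];
}

function parseCsvLine(string $line): array
{
    return str_getcsv($line, ',', '"', '\\');
}

// The "top" function: picks the parser based on the file extension.
function chooseParser(string $path): callable
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    switch ($ext) {
        case 'csv':
            return 'parseCsvLine';
        case 'txt':
            return 'parseTxtLine';
        default:
            throw new InvalidArgumentException("Unsupported extension: $ext");
    }
}

$csvFields = chooseParser('users.csv')('x,"y,z"');
$txtFields = chooseParser('notes.TXT')("  hello \n");
```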
OK, so the problem occurs when a CSV file has a very long line. Given that restriction, I suggest using the example from php.net:
$handle = @fopen("/tmp/inputfile.txt", "r");
if ($handle) {
while (($buffer = fgets($handle, 4096)) !== false) {
echo $buffer;
// do your operation for searching here
}
if (!feof($handle)) {
echo "Error: unexpected fgets() fail\n";
}
fclose($handle);
}
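One way the "do your operation for searching here" step might look is sketched below. extractEmails() is a hypothetical helper, and the pattern is the one from the question (a loose match, not a full address validator). It calls fgets() without a length argument, on the assumption that a single line fits in memory. With a fixed chunk size such as 4096, an address could straddle two chunks and be missed. The in-memory stream only stands in for a real file handle.

```php
<?php
// extractEmails() is a hypothetical helper that scans an open stream line by line.
function extractEmails($handle): array
{
    $pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}/';
    $found = [];
    while (($line = fgets($handle)) !== false) {
        if (preg_match_all($pattern, $line, $matches)) {
            foreach ($matches[0] as $email) {
                $found[$email] = true; // array keys de-duplicate repeated addresses
            }
        }
    }
    return array_keys($found);
}

// Stand-in for a real file: one plain-text line and one CSV-style line.
$handle = fopen('php://memory', 'r+');
fwrite($handle, "plain line with alice@example.com\n");
fwrite($handle, "1,\"Bob\",bob@example.org,alice@example.com\n");
rewind($handle);

$emails = extractEmails($handle);
fclose($handle);
```

Since the regular expression does not care about column boundaries, the same loop handles both the text and the CSV-style lines.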
I have an XML data source URL from where I am reading the data using fread. It contains student information from which I am extracting the Grades and compiling them in an array.
The problem is that when I run this script locally, it works fine and all the grades are correctly collected in the array. However, when I run this script on a shared server, I get some incorrectly read grades in addition to the normal grade names, for example, "ergarten". The complete grade name "Kindergarten" is also recorded in the array, which means that there is a problem in reading only specific elements.
The first suspect I have in mind is fread byte length. I have changed it to 8192 but without luck.
Here is the relevant code chunk from the php file:
if (!($xml_parser = xml_parser_create())) die("Couldn't create parser.");
xml_set_element_handler( $xml_parser, "startElementHandler", "endElementHandler");
xml_set_character_data_handler( $xml_parser, "characterDataHandler");
while ($data = fread($fp, 8192)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        break;
    }
}
xml_parser_free($xml_parser);
Any thoughts?
I found the problem and fixed it myself.
The problem was that, in the loop where the data was being read in chunks using fread, I was simultaneously feeding that data to the XML parser, and that was causing the problem since a chunk of data does not always contain complete tags. I removed the parser from that point so that it runs only once all the data has been read by the script.
That solved the problem.
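A rough sketch of that arrangement, parsing only once the full document is available, is shown below. The element names and the sample XML are assumptions made for illustration; with a real source, something like stream_get_contents($fp) would supply the string. The sketch also accumulates character data in a buffer until the closing tag, because the character data handler may be invoked more than once for a single text node, which would be consistent with a fragment such as "ergarten" showing up.

```php
<?php
// Sample document standing in for the fully downloaded XML (assumed structure).
$xml = '<students><student><grade>Kindergarten</grade></student>'
     . '<student><grade>Grade 1</grade></student></students>';

$grades = [];
$buffer = '';

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    // Start tag: begin collecting text for the new element.
    function ($p, $name, $attrs) use (&$buffer) {
        $buffer = '';
    },
    // End tag: element names are upper-cased because case folding is on by default.
    function ($p, $name) use (&$grades, &$buffer) {
        if ($name === 'GRADE') {
            $grades[] = trim($buffer);
        }
        $buffer = '';
    }
);
xml_set_character_data_handler(
    $parser,
    // May be called several times per text node, so append rather than assign.
    function ($p, $data) use (&$buffer) {
        $buffer .= $data;
    }
);

// Single call with the complete document; the final argument marks it as the last piece.
$ok = xml_parse($parser, $xml, true);
```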
I'm looking to read contents of a file between two tags in a large text file (so can't read the whole file at once due to memory restrictions on my server provider). This file has around 500000 lines of text.
This ( PHP: Read Specific Line From File ) isn't an option (I don't think), as the text I need to read varies in length and will take up multiple lines (varies from 20-5000 lines).
I am planning to use fopen, fread (read only) and fclose to read the file contents. I have experience of using these functions already.
I am looking to read all the contents in a selected part of the file. i.e.
File contents example
<<TAGNAME-1>>AAAA AAAA AAAA<<//TAGNAME-1>>
<<TAGNAME-2>>TEXT TEXT TEXT<<//TAGNAME-2>>
To select the text "AAAA AAAA AAAA" between the <<TAGNAME-1>> and <<//TAGNAME-1>> when TAGNAME-1 is called as a variable in my script.
How could I go about selecting all the text between the two tags that I require? (and ignore the remainder of the file) I have the ability to create the two tags where required in my php script - my issue is implementing this within the fread function.
You could grep the text file which would only return the text with a matching tag.
$tagnum = 2; //variable
$pattern = "<<TAGNAME-";
$searchstr = $pattern.$tagnum; //concat the prefix with the tag number
$fpath ="testtext.txt"; //define path to text file
$result = exec('grep -in "'.$searchstr.'" '.$fpath);
echo $result;
Here $tagnum defines which tag to search for. I've tested it in my sandbox and it works as expected. Note that this returns the whole matching line, up to the end tag or the newline.
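If shelling out to grep is not an option, or the section spans many lines as the question describes, a pure-PHP alternative is to read the file line by line and keep only what lies between the two tags. The sketch below assumes the <<NAME>> ... <<//NAME>> syntax from the question and that neither tag is split across a line break. readBetweenTags() is a hypothetical helper, and the in-memory stream stands in for fopen() on the real file.

```php
<?php
// readBetweenTags() is a hypothetical helper: it returns the text between
// <<NAME>> and <<//NAME>>, or null if the section is missing or never closed.
function readBetweenTags($handle, string $tagName): ?string
{
    $open  = "<<{$tagName}>>";
    $close = "<<//{$tagName}>>";
    $collected = null; // stays null until the opening tag has been seen

    while (($line = fgets($handle)) !== false) {
        if ($collected === null) {
            $start = strpos($line, $open);
            if ($start === false) {
                continue; // still before the wanted section
            }
            $collected = '';
            $line = substr($line, $start + strlen($open));
        }
        $end = strpos($line, $close);
        if ($end !== false) {
            return $collected . substr($line, 0, $end); // stop reading at the closing tag
        }
        $collected .= $line;
    }
    return null;
}

$handle = fopen('php://memory', 'r+');
fwrite($handle, "<<TAGNAME-1>>AAAA AAAA AAAA<<//TAGNAME-1>>\n");
fwrite($handle, "<<TAGNAME-2>>TEXT\nTEXT TEXT<<//TAGNAME-2>>\n");
rewind($handle);

$second = readBetweenTags($handle, 'TAGNAME-2');
rewind($handle);
$first = readBetweenTags($handle, 'TAGNAME-1');
fclose($handle);
```

Only the requested section is held in memory, and reading stops as soon as the closing tag is found.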
Hey guys, I've seen a lot of options based on fread (which requires a file, or writing to memory),
but I am trying to invalidate an input based on a string that has already been accepted (unknown format). I have something like this:
if (FALSE !== str_getcsv($this->_contents, "\n"))
{
foreach (preg_split("/\n/", $this->_contents) AS $line)
{
$data[] = explode(',', $line);
}
print_r($data); die;
$this->_format = 'csv';
$this->_contents = $this->trimContents($data);
return true;
}
This works fine on a real CSV file or a variable filled with CSV data, but when I try to pass it garbage to invalidate, something like:
https://www.gravatar.com/avatar/625a713bbbbdac8bea64bb8c2a9be0a4 which is garbage (since it's a PNG), it believes it's CSV
anyway and keeps on chugging along until the program chokes. How can I fix this? I have not seen any CSV validators that
are not at least several classes deep. Is there a simple three- or four-line way to (in)validate?
is there a simple three or four line to (in)validate?
Nope. CSV is so loosely defined - it has no telltale signs like header bytes, and there isn't even a standard for what character is used for separating columns! - that there technically is no way to tell whether a file is CSV or not - even your PNG could technically be a gigantic one-column CSV with some esoteric field and line separator.
For validation, look at what purpose you are using the CSV files for and what input you are expecting. Are the files going to contain address data, separated into, say, 10 columns? Then look at the first line of the file, and see whether enough columns exist, and whether they contain alphanumeric data. Are you looking for a CSV file full of numbers? Then parse the first line, and look for the kinds of values you need. And so on...
If you have an idea of the kinds of CSVs likely to make it to your system, you could apply some heuristics -- at the risk of not accepting valid CSVs. For instance, you could look at line length, consistency of line length, special characters, etc...
If all you are doing is checking for the presence of commas and newlines, then any sufficiently large, random file will likely have those and thus pass such a CSV test.
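As an illustration of the kind of heuristic described in these answers, the sketch below accepts input only if its first few records all have the column count you expect. looksLikeCsv() and its parameters are hypothetical, the comma delimiter is an assumption about your data, and as noted above a check like this can still reject valid files or accept unlucky garbage.

```php
<?php
// looksLikeCsv() is a hypothetical heuristic, not a real validator: it only
// checks that the first few records share the expected column count.
function looksLikeCsv(string $contents, int $expectedColumns, int $sampleRows = 5): bool
{
    $stream = fopen('php://memory', 'r+');
    fwrite($stream, $contents);
    rewind($stream);

    $checked = 0;
    $ok = true;
    while ($checked < $sampleRows
        && ($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // skip blank lines
        }
        if (count($row) !== $expectedColumns) {
            $ok = false; // wrong number of fields: reject
            break;
        }
        $checked++;
    }
    fclose($stream);

    return $ok && $checked > 0; // empty input is rejected as well
}

$good  = looksLikeCsv("a,b,c\n1,2,3\n", 3);
$bad   = looksLikeCsv("https://www.gravatar.com/avatar/625a713bbbbdac8bea64bb8c2a9be0a4", 3);
$empty = looksLikeCsv("", 3);
```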
Do you know about the Australian Bankers' Association (.aba) file format? It is used for batch transactions and is quite similar to CSV files. However, what I don't understand is how the columns are separated from each other. For example, in CSV files we use characters like , or ; as separators. I also can't find any sample files. Here is one link that could help you help me faster if you don't know the format already.
http://www.cemtexaba.com/aba-format/cemtex-aba-file-format-details.html
However, what I don't understand is, how is the columns separated from each other. For example, in csv files, we use like (,;) etc
It is similar to CSV, but it is a plain text file consisting of strings and lines...
Here are some ready solutions
Symfony 2 bundle -> https://github.com/latysh/aba-bundle
php library -> https://github.com/simonblee/aba-file-generator
It looks like a simple file format. Instead of thinking of it as a CSV file, which is delimited using some symbol, think of it as a string of characters.
So, if you have an ABA file then you can parse it using fopen() and fread().
<?php
$fh = fopen('example.aba', 'rb');
$block1 = fread($fh, 1);
$block2 = fread($fh, 17);
$block3 = fread($fh, 2);
$block4 = fread($fh, 3);
// And so on...
Of course it would make sense to also have some mechanism that validates the data, and makes sure that the file is not corrupted, but this is just a simple example.
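The same idea can be sketched with substr() on a line that has already been read, driven by a table of field widths. splitFixedWidth() is a hypothetical helper, the widths 1, 17, 2 and 3 simply mirror the fread() calls in this answer, and the field names and sample record are placeholders rather than anything taken from the ABA specification.

```php
<?php
// splitFixedWidth() is a hypothetical helper: it cuts a record into named
// fields according to a [name => width] layout, left to right.
function splitFixedWidth(string $record, array $layout): array
{
    $fields = [];
    $offset = 0;
    foreach ($layout as $name => $width) {
        $fields[$name] = substr($record, $offset, $width);
        $offset += $width;
    }
    return $fields;
}

// Placeholder layout: widths mirror the fread() example, names are made up.
$layout = ['field1' => 1, 'field2' => 17, 'field3' => 2, 'field4' => 3];

// Made-up record of the matching total length (23 characters).
$record = '0' . str_repeat(' ', 17) . '01' . 'XYZ';

$fields = splitFixedWidth($record, $layout);
```

Keeping the widths in one array makes it easier to compare the layout against the specification and to add a length check before splitting.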
I know this is a late reply, but for other people coding for ABA file format,
http://www.cemtexaba.com/aba-format/ has a sample file format.
And this site has a great explanation on each of the fields
https://github.com/mjec/aba/blob/master/sample-with-comments.aba but don't refer to it as your single source of truth.
As Sverri M. Olsen has mentioned, the columns are not delimited by a separator character. Instead, each field simply has the fixed length given in the specification.