binary safe write on file with php to create a DBF file - php

I need to split a big DBF file using PHP functions. For example, if I have 1000 records, I have to create 2 files with 500 records each.
I do not have any dBase extension available, nor can I install one, so I have to work with basic PHP functions. Using the basic fread function I'm able to read and parse the file correctly, but when I try to write a new DBF I run into problems.
As I have understood, the DBF file is structured as a 2-line file: the first line contains file info and header info, and it's binary. The second line contains the data and it's plain text. So I thought I could simply write a new binary file replicating the first line, then manually add the first records to the first file and the other records to the other file.
This is the code I use to parse the file, and it works nicely:
$fdbf = fopen($_FILES['userfile']['tmp_name'],'r');
$fields = array();
$buf = fread($fdbf,32);
$header=unpack( "VRecordCount/vFirstRecord/vRecordLength", substr($buf,4,8));
$goon = true;
$unpackString='';
while ($goon && !feof($fdbf)) { // read fields:
    $buf = fread($fdbf,32);
    if (substr($buf,0,1)==chr(13)) {$goon=false;} // end of field list
    else {
        $field=unpack( "a11fieldname/A1fieldtype/Voffset/Cfieldlen/Cfielddec", substr($buf,0,18));
        $unpackString.="A$field[fieldlen]$field[fieldname]/";
        array_push($fields, $field);
    }
}
fseek($fdbf, 0);
$first_line = fread($fdbf, $header['FirstRecord']+1);
fseek($fdbf, $header['FirstRecord']+1); // move back to the start of the first record (after the field definitions)
$first_line is the variable that contains the header data, but when I try to write it to a new file something goes wrong and the row isn't written exactly as it was read. This is the code I use for writing:
$handle_log = fopen($new_filename, "wb");
fwrite($handle_log, $first_line, strlen($first_line) );
fwrite($handle_log, $string );
fclose($handle_log);
I've tried adding the b flag to the fopen mode parameter, as suggested, to open the file in binary mode. I've also followed a suggestion to pass the exact length of the string to fwrite to avoid some characters being stripped, but without success: none of the files written are valid DBF files. What can I do to achieve my goal?

As i have understood, the DBF file is structured in a 2 line file: the
first line contains file info, header info and it's in binary. The
second line contains the data and it's plain text.
Well, it's a bit more complicated than that.
A full description of the DBF file format is worth reading before you start.
So it would be best if you could use a library to read and write the dbf files.
If you really need to do this yourself, here are the most important parts:
Dbf is a binary file format, so you have to read and write it as binary. For example the number of records is stored in a 32 bit integer, which can contain zero bytes.
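Be careful with text-oriented handling of that binary data. PHP's strlen(), substr() and fwrite() are binary-safe and count null bytes like any other byte (unless the legacy mbstring.func_overload setting is enabled), but anything that treats the data as text, such as trimming, character-set conversion or line-based reading, can corrupt or misalign it.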
If you split the file (the records), you'll have to adjust the record count in the header.
When splitting the records, keep in mind that each record is preceded by an extra byte: a space (0x20) if the record is not deleted, an asterisk (0x2A) if it is deleted. For example, with 4 fields of 10 bytes each, the length of each record is 41. That value is also available in the header: bytes 10-11 hold the number of bytes per record as a 16-bit number, least significant byte first.
The file could end with the end-of-file marker 0x1A, so you'll have to check for that as well.
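The points above can be put together in a short sketch. It is not a full DBF library: the function name splitDbfBytes is made up for this example, it assumes the classic dBASE III header layout that the question's unpack() call already relies on, and it works on the whole file as one string for brevity.

```php
<?php
// Hypothetical helper: splits the raw bytes of a DBF file into several
// complete DBF files holding at most $recordsPerFile records each.
function splitDbfBytes(string $dbf, int $recordsPerFile): array
{
    if ($recordsPerFile < 1) {
        throw new InvalidArgumentException('recordsPerFile must be at least 1');
    }
    // Bytes 4-7: record count, 8-9: header length, 10-11: record length.
    $h = unpack('VRecordCount/vFirstRecord/vRecordLength', substr($dbf, 4, 8));
    // Exactly the header, without any bytes of the first record.
    $header = substr($dbf, 0, $h['FirstRecord']);
    $parts = [];
    for ($start = 0; $start < $h['RecordCount']; $start += $recordsPerFile) {
        $count = min($recordsPerFile, $h['RecordCount'] - $start);
        // Each record already includes its leading deletion-flag byte.
        $offset = $h['FirstRecord'] + $start * $h['RecordLength'];
        $records = substr($dbf, $offset, $count * $h['RecordLength']);
        // Overwrite the 32-bit little-endian record count at offset 4,
        // then append the records and the 0x1A end-of-file marker.
        $parts[] = substr_replace($header, pack('V', $count), 4, 4)
            . $records . "\x1A";
    }
    return $parts;
}
```

Each returned string can be written with file_put_contents() or with fopen() in "wb" mode. For a file too large to hold in memory, the same offsets should work with fseek() and fread() on the source handle instead of substr().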

Related

Force string data type when writing to CSV file

In PHP, is there a way to force the value "00123" to be inserted into a CSV file as a string?
This way, when you open the CSV file the value will remain 00123 rather than removing the leading zeros and showing 123.
The primary reason I'd like to achieve this is a list of zipcodes: several of them have leading zeros, and I'd like the values to reflect that.
<?php
if( $fh = fopen('filename.csv','w') ){
    $line = ['00123'];
    fputcsv($fh,$line);
    fclose($fh);
}
CSV does not have types. Values written using the ,"..", syntax merely delimit the value to disambiguate the usage of , within the value itself; it does not mean that the value is "a string".
I suspect your values are mangled when imported into Excel or such. There's no solution to this that CSV can offer; you can only import the file using the import wizard and specify that the column should be used as is and not cast to a number. (This may or may not actually work depending on what effed-up version of Excel you're using.)
If you don't want to go through this every time, you should be producing an XLSX file, which does have types.
I guess there is no way to do it because "CSV" files are just "Comma-Separated Values"
You have to use the editor options for csv import.
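A quick way to confirm that the file itself is fine is to look at the raw bytes fputcsv produces. The sketch below writes the same row to an in-memory stream instead of filename.csv (that substitution is just for the demonstration) and passes the separator, enclosure and escape arguments explicitly:

```php
<?php
// Write one row to an in-memory stream and inspect the raw bytes.
$fh = fopen('php://memory', 'w+b');
fputcsv($fh, ['00123'], ',', '"', '\\');
rewind($fh);
$written = stream_get_contents($fh);
fclose($fh);

// json_encode makes the trailing newline visible.
echo json_encode($written), "\n"; // prints "00123\n"
```

The leading zeros are in the file; they are only dropped later, by whatever program interprets the column as a number on import.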

How to extract data from CSV with PHP

I'm using the Sebastian Bergmann PHPUnit selenium webdriver.
Currently I have:
$csv = file_get_contents('functions.csv', NULL,NULL,1);
var_dump($csv);
// select 1 random line here
This will load my CSV file and give me all possible data from the file.
It has multiple rows for example:
Xenoloog-FUNC/8_4980
Xylofonist-FUNC/8_4981
IJscoman-FUNC/8_4982
Question: How can I get that data randomly?
I just want to use 1 ( random) line of data.
Would it be easier to just grab 1 (random) line of the file instead of everything?
Split the string into an array, then grab a random index from that array:
$lines = preg_split('/\R/', trim($csv)); // split on any line ending; trim() avoids an empty last element
$item = $lines[array_rand($lines)];
You could use the offset and maxlen parameters to grab part of the file using file_get_contents. You could also use fseek after fopen to select part of a file. These functions both take numbers of bytes as arguments. This post has a little more information:
Get part of a file by byte number in php
It may require some hacking to translate a particular row index of a CSV file into a byte offset. You might need to generate and load a small metadata file that lists the number of bytes each row of CSV data occupies. That would probably help.
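If the goal is only to avoid holding the whole file in memory, a different technique from the byte-offset index also works: reservoir sampling, which reads the file once line by line and keeps a single candidate. This is a sketch; the function name pickRandomLine is made up for the example.

```php
<?php
// Hypothetical helper: returns one uniformly random non-empty line,
// reading the file sequentially and keeping only one line in memory.
function pickRandomLine(string $path): ?string
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return null;
    }
    $chosen = null;
    $seen = 0;
    while (($line = fgets($fh)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue; // skip blank lines
        }
        $seen++;
        // Replace the current choice with probability 1/$seen.
        if (random_int(1, $seen) === 1) {
            $chosen = $line;
        }
    }
    fclose($fh);
    return $chosen;
}
```

This still reads every byte once, so it trades the index file for a full pass; for a file that is sampled many times, the offset list described above is likely the better fit.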

Read SELECTED contents from a large text file (varying length text)

I'm looking to read contents of a file between two tags in a large text file (so can't read the whole file at once due to memory restrictions on my server provider). This file has around 500000 lines of text.
This ( PHP: Read Specific Line From File ) isn't an option (I don't think), as the text I need to read varies in length and will take up multiple lines (varies from 20-5000 lines).
I am planning to use fopen, fread (read only) and fclose to read the file contents. I have experience of using these functions already.
I am looking to read all the contents in a selected part of the file. i.e.
File contents example
<<TAGNAME-1>>AAAA AAAA AAAA<<//TAGNAME-1>>
<<TAGNAME-2>>TEXT TEXT TEXT<<//TAGNAME-2>>
To select the text "AAAA AAAA AAAA" between the <<TAGNAME-1>> and <<//TAGNAME-1>> when TAGNAME-1 is called as a variable in my script.
How could I go about selecting all the text between the two tags that I require? (and ignore the remainder of the file) I have the ability to create the two tags where required in my php script - my issue is implementing this within the fread function.
You could grep the text file which would only return the text with a matching tag.
$tagnum = 2; //variable
$pattern = "<<TAGNAME-";
$searchstr = $pattern.$tagnum; //concat the prefix with the tag number
$fpath ="testtext.txt"; //define path to text file
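exec('grep -in ' . escapeshellarg($searchstr) . ' ' . escapeshellarg($fpath), $output); // escape both arguments; $output collects every matching line
echo implode("\n", $output);
Where $tagnum defines which tag to search for. I've tested it in my sandbox and it works as expected. Note that grep returns whole matching lines, so this reads up to the end tag only if the tag's content sits on a single line.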
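For content that spans many lines, a pure PHP alternative is to stream the file with fgets and collect everything between the two tags. This is a sketch under two assumptions: the function name readBetweenTags is made up, and neither tag is ever split across a line break.

```php
<?php
// Hypothetical helper: returns the text between <<TAG>> and <<//TAG>>,
// reading the file one line at a time. Returns null if not found.
function readBetweenTags(string $path, string $tag): ?string
{
    $open = "<<$tag>>";
    $close = "<<//$tag>>";
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return null;
    }
    $buffer = '';
    $inside = false;
    $result = null;
    while (($line = fgets($fh)) !== false) {
        if (!$inside) {
            $pos = strpos($line, $open);
            if ($pos === false) {
                continue; // still before the opening tag
            }
            $inside = true;
            $line = substr($line, $pos + strlen($open));
        }
        $end = strpos($line, $close);
        if ($end !== false) {
            $result = $buffer . substr($line, 0, $end);
            break; // stop reading as soon as the closing tag is seen
        }
        $buffer .= $line;
    }
    fclose($fh);
    return $result;
}
```

Only the selected section is held in memory, and reading stops at the closing tag, so the remainder of the file is never touched.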

PHP Overwriting at a specific location in a file

I am trying to write a text at a specific position in a file that already has some content. After writing I find the file truncated to the size of the text plus fseek position and the first characters with value 0. Is this the normal behaviour or am I missing something? I want to mention that I'm trying to avoid loading the file into memory and writing it back.
$file = fopen("text.txt","w");
fseek($file,3);
fwrite($file,"Hello");
fclose($file);
You need to open the file in c mode, else it's truncated on fopen:
$file = fopen("text.txt","c");
See http://php.net/manual/de/function.fopen.php for documentation of all file open modes and what exactly they do. Also see the fseek manual page at http://www.php.net/manual/en/function.fseek.php
Yes this is normal behaviour :
fopen($file, "w"):
place the file pointer at the beginning of the file and truncate the file to zero length.
fseek():
In general, it is allowed to seek past the end-of-file; if data is then written, reads in any unwritten region between the end-of-file and the sought position will yield bytes with value 0. [..]
If you have opened the file in append (a or a+) mode, any data you write to the file will always be appended, regardless of the file position, and the result of calling fseek() will be undefined.
You probably want to open the file in a non truncating write mode (e.g. "c" but not "a").
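A small self-contained check of the "c" mode behaviour, using a temporary file in place of text.txt (the temporary path and the sample contents are assumptions of this example):

```php
<?php
// Create a file with known contents.
$path = tempnam(sys_get_temp_dir(), 'ovr');
file_put_contents($path, '0123456789');

// "c" opens for writing without truncating; the pointer starts at 0.
$file = fopen($path, 'cb');
fseek($file, 3);
fwrite($file, 'Hello');
fclose($file);

$after = file_get_contents($path);
unlink($path);
echo $after, "\n"; // prints 012Hello89
```

The five bytes at offsets 3 to 7 are replaced and everything else is left untouched. Note that this overwrites in place; it does not insert, so the file length stays the same unless the write runs past the end.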

PHP invalidating a CSV file

Hey guys, I've seen a lot of options using fread (which requires a file, or writing to memory),
but I am trying to invalidate an input based on a string that has already been accepted (unknown format). I have something like this
if (FALSE !== str_getcsv($this->_contents, "\n"))
{
    foreach (preg_split("/\n/", $this->_contents) AS $line)
    {
        $data[] = explode(',', $line);
    }
    print_r($data); die;
    $this->_format = 'csv';
    $this->_contents = $this->trimContents($data);
    return true;
}
This works fine on a real CSV or a CSV-filled variable, but when I try to pass it garbage to invalidate, something like
https://www.gravatar.com/avatar/625a713bbbbdac8bea64bb8c2a9be0a4 which is garbage (since it's a PNG), it believes it's CSV
anyway and keeps on chugging along until the program chokes. How can I fix this? I have not seen any CSV validators that
are not at least several classes deep, is there a simple three or four line to (in)validate?
is there a simple three or four line to (in)validate?
Nope. CSV is so loosely defined - it has no telltale signs like header bytes, and there isn't even a standard for what character is used for separating columns! - that there technically is no way to tell whether a file is CSV or not - even your PNG could technically be a gigantic one-column CSV with some esoteric field and line separator.
For validation, look at what purpose you are using the CSV files for and what input you are expecting. Are the files going to contain address data, separated into, say, 10 columns? Then look at the first line of the file, and see whether enough columns exist, and whether they contain alphanumeric data. Are you looking for a CSV file full of numbers? Then parse the first line, and look for the kinds of values you need. And so on...
If you have an idea of the kinds of CSVs likely to make it to your system, you could apply some heuristics -- at the risk of not accepting valid CSVs. For instance, you could look at line length, consistency of line length, special characters, etc...
If all you are doing is checking for the presence of commas and newlines, then any sufficiently large, random file will likely have those and thus pass such a CSV test.
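As an illustration of such a heuristic, the sketch below rejects input that contains NUL bytes and requires every line to parse into the same expected number of columns. The function name looksLikeCsv and the choice of these two checks are assumptions for the example, and splitting on line breaks means quoted fields containing newlines are not supported.

```php
<?php
// Hypothetical heuristic: true only if every line parses into exactly
// $expectedColumns comma-separated fields and no NUL byte is present.
function looksLikeCsv(string $contents, int $expectedColumns): bool
{
    // Binary input such as a PNG normally contains NUL bytes;
    // plain-text CSV should not.
    if ($contents === '' || strpos($contents, "\0") !== false) {
        return false;
    }
    $lines = preg_split('/\R/', trim($contents));
    foreach ($lines as $line) {
        $fields = str_getcsv($line, ',', '"', '\\');
        if (count($fields) !== $expectedColumns) {
            return false;
        }
    }
    return true;
}
```

This will reject some valid CSVs (a different delimiter, ragged rows, multi-line fields), which is the trade-off described above: the check is only as good as your knowledge of the input you expect.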
