I'm writing some PHP code to import a CSV file into a Postgres DB, and I'm getting the error below. Can you help me?
Warning: pg_end_copy(): Query failed: ERROR: literal newline found in data HINT: Use "\n" to represent newline. CONTEXT: COPY t_translation, line 2 in C:\xampp\htdocs\importing_csv\importcsv.php on line 21
<?php
$connString = 'host = localhost dbname= importdb user=postgres password=pgsql';
$db = pg_connect($connString);
$file = file('translation.csv');
//pg_exec($db, "CREATE TABLE t_translation (id numeric, identifier char(100), device char(10), page char(40), english char(100), date_created char(30), date_modified char(30), created_by char(30), modified_by char(30) )");
pg_exec($db, "COPY t_translation FROM stdin");
foreach ($file as $line) {
$tmp = explode(",", $line);
pg_put_line($db, sprintf("%d\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n", $tmp[0], $tmp[1], $tmp[2], $tmp[3], $tmp[4], $tmp[5], $tmp[6], $tmp[7], $tmp[8]));
}
pg_put_line($db, "\\.\n");
pg_end_copy($db);
?>
You need to pass the FILE_IGNORE_NEW_LINES flag to file() as its second parameter; otherwise, by default, each array item keeps the newline character at its end, and that is most likely what's causing the issue here.
So just add the FILE_IGNORE_NEW_LINES flag so that the lines extracted from the CSV file no longer have a newline character at the end of each line:
$file = file('translation.csv', FILE_IGNORE_NEW_LINES);
Also, I would recommend using fgetcsv() instead to read the CSV file.
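For reference, here is a rough, untested sketch of that fgetcsv() variant; it assumes your fields contain no tabs, backslashes, or embedded newlines (those would need escaping for COPY's text format):
$fp = fopen('translation.csv', 'r');
pg_exec($db, "COPY t_translation FROM stdin");
while (($row = fgetcsv($fp)) !== false) {
    // fgetcsv() strips the trailing newline and handles quoted fields
    pg_put_line($db, implode("\t", $row) . "\n");
}
pg_put_line($db, "\\.\n");
pg_end_copy($db);
fclose($fp);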
If you're willing to use PDO (necessitates a separate connection call), there's an elegant solution that does not require as much processing of the data by PHP, and that will also work with any combination of fields so long as their names in the CSV header match the names in the database. I'll assume you already have initialized PDO and have the object as $pdo, and the filename is $filename. Then:
$file = fopen($filename, 'r');
$lines = explode("\n", fread($file, filesize($filename)));
if (end($lines) == '') array_pop($lines); // Remove the last line if it is empty, as often happens, so it doesn't generate an error with Postgres
$fields = array_shift($lines); // Retrieve & remove the field list
$null_as = "\\\\N"; // Or whatever your notation for NULL is, if needed
$result = $pdo->pgsqlCopyFromArray('t_translation', $lines, ',', $null_as, $fields);
This is pretty minimal; there is no error handling other than $result indicating success or failure, but it can be a starting point.
I like this solution better than the approach you are taking, though, because you don't need to specify the fields at all; it's all handled automatically.
If you don't want to use PDO, there's a similar solution using your setup and syntax; just replace the last line with:
pg_copy_from($db, 't_translation', $lines, ',', $null_as);
This solution, however, does not dynamically adjust to the field names: the columns of the CSV need to be in exactly the same order as in the table, although the names themselves don't need to line up, since the header line was already removed from $lines. I haven't tested this last line, though, because I don't use this type of connection, so there could be an error in it.
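For completeness, an untested sketch of that pg_* variant, reusing your $db connection and assuming translation.csv has a header row on its first line:
$lines = file('translation.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
array_shift($lines);   // drop the header row
$null_as = "\\\\N";    // or whatever your notation for NULL is
$result = pg_copy_from($db, 't_translation', $lines, ',', $null_as);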
I need to split a big DBF file using PHP functions; this means that if I have, for example, 1000 records, I have to create 2 files with 500 records each.
I do not have any dbase extension available, nor can I install one, so I have to work with basic PHP functions. Using the basic fread() function I'm able to correctly read and parse the file, but when I try to write a new DBF I have some problems.
As I have understood it, the DBF file is structured as a 2-line file: the first line contains file info and header info and is binary. The second line contains the data and is plain text. So I thought to simply write a new binary file replicating the first line and manually adding the first records to the first file and the other records to the other file.
This is the code I use to parse the file, and it works nicely:
$fdbf = fopen($_FILES['userfile']['tmp_name'],'r');
$fields = array();
$buf = fread($fdbf,32);
$header=unpack( "VRecordCount/vFirstRecord/vRecordLength", substr($buf,4,8));
$goon = true;
$unpackString='';
while ($goon && !feof($fdbf)) { // read fields:
    $buf = fread($fdbf, 32);
    if (substr($buf, 0, 1) == chr(13)) { // end of field list
        $goon = false;
    } else {
        $field = unpack("a11fieldname/A1fieldtype/Voffset/Cfieldlen/Cfielddec", substr($buf, 0, 18));
        $unpackString .= "A$field[fieldlen]$field[fieldname]/";
        array_push($fields, $field);
    }
}
fseek($fdbf, 0);
$first_line = fread($fdbf, $header['FirstRecord']+1);
fseek($fdbf, $header['FirstRecord']+1); // move back to the start of the first record (after the field definitions)
$first_line is the variable that contains the header data, but when I try to write it to a new file something wrong happens and the row isn't written exactly as it was read. This is the code I use for writing:
$handle_log = fopen($new_filename, "wb");
fwrite($handle_log, $first_line, strlen($first_line) );
fwrite($handle_log, $string );
fclose($handle_log);
I've tried to add the b value to the fopen() mode parameter, as suggested, to open the file in binary mode; I've also taken a suggestion to pass exactly the length of the string to avoid stripping some characters, but without success, since none of the files written come out in correct DBF format. What can I do to achieve my goal?
As I have understood it, the DBF file is structured as a 2-line file: the first line contains file info and header info and is binary. The second line contains the data and is plain text.
Well, it's a bit more complicated than that.
See here for a full description of the dbf file format.
So it would be best if you could use a library to read and write the dbf files.
If you really need to do this yourself, here are the most important parts:
Dbf is a binary file format, so you have to read and write it as binary. For example the number of records is stored in a 32 bit integer, which can contain zero bytes.
You can't use string functions on that binary data. For example strlen() will scan the data up to the first null byte, which is present in that 32 bit integer, and will return the wrong value.
If you split the file (the records), you'll have to adjust the record count in the header.
When splitting the records, keep in mind that each record is preceded by an extra byte: a space (0x20) if the record is not deleted, an asterisk (0x2A) if it is deleted (for example, if you have 4 fields of 10 bytes, the length of each record will be 41). That value is also available in the header at bytes 10-11, a 16-bit number (least significant byte first) giving the number of bytes in the record.
The file could end with the end-of-file marker 0x1A, so you'll have to check for that as well.
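Putting those points together, here is a rough, untested sketch of writing one chunk of records to a new file; $src, $dst, $offset and $count are hypothetical parameters, and the unpack format is the same one used in your parsing code:
function write_dbf_chunk($src, $dst, $offset, $count)
{
    $in = fopen($src, 'rb');

    // main header: record count at bytes 4-7 (32-bit LE), header size
    // at bytes 8-9 (16-bit LE), record length at bytes 10-11 (16-bit LE)
    $buf = fread($in, 32);
    $h = unpack('VRecordCount/vFirstRecord/vRecordLength', substr($buf, 4, 8));

    // copy the complete original header (field descriptors + 0x0D terminator)
    fseek($in, 0);
    $header = fread($in, $h['FirstRecord']);

    // patch the record count for this chunk (32-bit LE at offset 4)
    $header = substr_replace($header, pack('V', $count), 4, 4);

    $out = fopen($dst, 'wb');
    fwrite($out, $header);

    // copy the requested records; each record is RecordLength bytes,
    // which already includes the leading deleted-flag byte (0x20 / 0x2A)
    fseek($in, $h['FirstRecord'] + $offset * $h['RecordLength']);
    for ($i = 0; $i < $count; $i++) {
        fwrite($out, fread($in, $h['RecordLength']));
    }

    // terminate with the end-of-file marker
    fwrite($out, chr(0x1A));

    fclose($out);
    fclose($in);
}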
I have exported my SQL dump in CSV format. Suppose my schema was like name, email, country; I want to remove the email column and all its data from the CSV. What would be the most optimized way to do that, either using a tool or any technique? I tried to load the dump in Excel, but that didn't look right.
Thanks
You could copy the table inside the MySQL database, delete the email column using some MySQL client, and export back to CSV.
Importing to Excel should work with ordered data - you might need to consider alternative delimiters if your data contains commas (such as addresses). If possible, use an alternative delimiter, add quote marks around troublesome fields, or shift to fixed-width output.
Any tool you write or use will need to be able to parse your data and that will always be an issue if the delimiter is scattered through the data.
Alternatively rewrite the view / select / procedure that is generating the data set initially.
This command should do it (assuming a unix* OS):
$ cut -d ',' -f 1,3- dump.csv > newdump.csv
UPDATE: DevZer0 is right, this is unfit for the general case. So you could do this instead (it's tested):
#!/usr/bin/env perl
use Text::ParseWords;

my $file = 'dump.csv';
my $indexOfFieldToBeRemoved = 1; # 0, 1, ...

open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
while (my $line = <$fh>) {
    chomp $line;
    my @fields = Text::ParseWords::parse_line(',', 0, $line);
    splice(@fields, $indexOfFieldToBeRemoved, 1);
    print join(',', map { "\"$_\"" } @fields), "\n";
}
close $fh;
Sorry, nothing simpler (if you can't re-generate csv dump, as suggested)...
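If you'd rather stay in PHP, here is a minimal, untested sketch with fgetcsv()/fputcsv(), which handle quoted fields and embedded commas for you; the file names and the column index 1 (email) are assumptions:
$in  = fopen('dump.csv', 'r');
$out = fopen('newdump.csv', 'w');
while (($row = fgetcsv($in)) !== false) {
    unset($row[1]);                    // drop the email column
    fputcsv($out, array_values($row));
}
fclose($in);
fclose($out);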
I am trying to write a file in PHP. So far it works "kind of".
I have an array of names in the form {Rob, Kevin, Michael}. I use the following code:
foreach ($Team as $user)
{
    print_r($user);
    // create a file for each user
    $file = fopen("./employee_lists/" . $user, 'w');
    // I have also tried: fopen("employee_lists/$user", 'w');
    // ... ... ...
    // write some data to each file.
}
This works as expected: the print_r() output shows "Rob Kevin Michael"; however, the filenames are saved as follows: ROB~1, KEVIN~1, MICHAE~1
When I go on to use these files later in my code and want to relate the username "Rob" to ROB~1, I'll have to take some extra step to do this. I feel like I'm using fopen() incorrectly, but it does exactly what I want, minus this little naming-scheme issue.
It seems like your $user variable contains a character that is invalid in file system paths (my best guess would be a newline).
Try:
$file = fopen("./employee_lists/".trim($user), 'w');
You should sanitize $user before using it as a file name.
$pattern = '/(;|\||`|>|<|&|\^|"|'."\n|\r|'".'|{|}|\[|\]|\)|\()/i';
// no piping, passing possible environment variables ($),
// separate commands, nested execution, file redirection,
// background processing, special commands (backspace, etc.), quotes,
// newlines, or some other special characters
$user = preg_replace($pattern, '', $user);
$user = '"' . preg_replace('/\$/', '\\\$', $user) . '"'; // make sure this is only interpreted as ONE argument
By the way, it's a bad idea to use a user name as a file name; it's better to use a numeric id.
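If you do stick with names, a simpler whitelist approach (just a sketch) is to trim the value first and then keep only characters that are safe in file names:
// strip surrounding whitespace/newlines, then replace anything outside
// a conservative whitelist with an underscore
$safe = preg_replace('/[^A-Za-z0-9._-]/', '_', trim($user));
$file = fopen('./employee_lists/' . $safe, 'w');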
I have a log file (log.txt) in the form:
=========================================
March 01 2050 13:05:00 log v.2.6
General Option: [default] log_options.xml
=========================================
Loaded options from xml file: '/the/path/of/log_options.xml'
printPDF started
PDF export
PDF file created:'/path/of/file.1.pdf'
postProcessingDocument started
INDD file removed:'/path/of/file.1.indd'
Error opening document: '/path/of/some/filesomething.indd':Error: file doesnt exist or no permissions
=========================================
March 01 2050 14:15:00 log v.2.6
General Option: [default] log_options.xml
=========================================
Loaded options from xml file: '/the/path/of/log_options.xml'
extendedprintPDF started
extendedprintPDF: Error: Unsaved documents have no full name: line xyz
Note: Each file name is of the format 3lettersdatesomename_LO.pdf/indd. Example: MNM011112ThisFile_LO.pdf. Also, on a given day and time, the entry could have just errors, just the message about the file created, or both, as I have shown here.
The file continues this way. And, I have a db in the form:
id itemName status
1 file NULL
And so on...
Now, I am expected to go through the log file and, for each file that is created or has an error, update the last column of the DB with the appropriate message: File created or Error. I thought of searching for the strings "PDF file created" / "Error" and then grabbing the file name.
I have tried various things like pathinfo() and strpos(), but I can't seem to figure out how to get it done.
Can someone please give me some input on how I can solve this? The txt file and DB are pretty huge.
NOTE: I provided the 2nd entry of the log file to make clear that the format in which errors appear IS NOT consistent. I would like to know if I can still achieve what I am supposed to with an inconsistent format for errors.
Can somebody please help after reading the whole question again? There have been plenty of changes from the first time I posted this.
You can use PHP's explode() function to break your file into individual words.
If the fields in your text file are tab-separated, you can explode on the tab character, explode("\t", $string); if they are space-separated, explode on a space.
Then a simple substr($word, $start_index, $length) on each word can give you the file name (here $start_index should be 0).
Using mysql_connect() will help you connect to the MySQL database, or a more efficient way would be to use PDO (PHP Data Objects), which makes your code more reliable and flexible.
Another way would be to use preg_match() with a regular expression matching your error message and parse out the file name.
You can refer to the php.net manual for help at any time.
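For the preg_match() route, here is a minimal untested sketch; the pattern assumes the log line looks exactly like PDF file created:'/path/of/file.1.pdf':
// $line is one line of the log file
if (preg_match("/PDF file created:'([^']+)'/", $line, $m)) {
    $filename = basename($m[1]);   // e.g. file.1.pdf
}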
Are all of the files PDFs? If so you can do a regex search on files with the .pdf extension. However, if the filename is also contained in the error string, you will need to exclude that somehow.
// Assume filenames contain only upper/lowercase letters, 0-9, underscores, periods, dashes, and forward slashes
preg_match_all('~([a-zA-Z0-9_./-]+\.pdf)~', $log_file_contents, $matches);
// $matches should be an array containing each filename.
// You can do array_unique() to exclude duplicates.
Edit: Keep in mind, $matches will be a multi-dimensional array, as described at http://php.net/manual/en/function.preg-match-all.php and http://php.net/manual/en/function.preg-match.php
To test a regex expression, you can use http://regexpal.com/
Okay, so the main issue here is that you either don't have a consistent delimiter for "entries", or else you are not providing enough info. So based on what you have provided, here is my suggestion. The main caveat is that without a solid delimiter for "entries," there's no way to know for sure that an error matches up with a file name. The only way to fix this is to format your file better. Also, you have to fill in some blanks, like your db info and how you actually perform the query.
$handle = fopen("log.txt", "rb");
while (!feof($handle)) {
// get the current row
$row = fread($handle, 8192);
// get file names
preg_match('~^PDF file created:(.*?)$~',$row,$match);
if ( isset($match[1]) ) {
$files[] = $match[1];
}
// get errors
preg_match('~^Error:(.*?)$~',$row,$match);
if ( isset($match[1]) ) {
$errors[] = $match[1];
}
}
fclose($handle);
// connect to db
foreach ($files as $k => $file) {
// assumes your table just has basename of file
$file = basename($file);
$error = ( isset($errors[$k]) ) ? $errors[$k] : null;
$sql = "update tablename set status='$error' where itemName='$file'";
// execute query
}
EDIT: Actually going back to your post, it looks like you want to update a table not insert, so you will want to change the query to be an update. And you may need to further work with $file in that foreach for your where clause, depending on how you store your filenames in your db (for example, if you just store the basename, you will likely want to do $file = basename($file); in the foreach). Code updated to reflect this.
So hopefully this will point you in the right direction.
I am using PHP to import data from a CSV file using fgetcsv(), which yields an array for each row. Initially, I had the character limit set at 1024, like so:
while ($data = fgetcsv($fp, 1024)) {
// do stuff with the row
}
However, a CSV with 200+ columns surpassed the 1024 limit on many rows. This caused the line read to stop in the middle of a row, and then the next call to fgetcsv() would start where the previous one left off and so on until an EOL was reached.
I have since upped this limit to 4096, which should take care of the majority of cases, but I would like to put a check in place to be sure that the entire line was read after each fetch. How do I go about this?
I was thinking of checking the end of the last element of the array for end-of-line characters (\n, \r, \r\n), but wouldn't these be parsed out by the fgetcsv() call?
Just omit the length parameter. It's optional in PHP5.
while ($data = fgetcsv($fp)) {
// do stuff with the row
}
Just don't specify a limit, and fgetcsv() will slurp in as much as is necessary to capture a full line. If you do specify a limit, then it's entirely up to YOU to scan the file stream and ensure you're not slicing something down the middle.
However, note that not specifying a limit can be risky if you don't have control over the generation of this .csv in the first place. It'd be easy to swamp your server with a malicious CSV that has many terabytes of data on a single line.
Thank you for the suggestions, but these solutions really didn't solve the issue of making sure the longest line is accounted for while still providing a limit. I was able to accomplish this by using the wc -L UNIX command via shell_exec() to determine the longest line in the file prior to beginning the line fetching. The code is below:
// open the CSV file to read lines
$fp = fopen($sListFullPath, 'r');
// use wc to figure out the longest line in the file
$longestArray = explode(" ", shell_exec('wc -L ' . $sListFullPath));
$longest_line = (int)$longestArray[0] + 4; // add a little padding for EOL chars
// check against a user-defined maximum length
if ($longest_line > $line_length_max) {
// alert user that the length of at least one line in the CSV is too long
}
// read in the data
while ($data = fgetcsv($fp, $longest_line)) {
// do stuff with the row
}
This approach ensures that every line is read in its entirety and still provides a safety net for really long lines without stepping through the entire file with PHP line by line.
I would be careful with your final solution. I was able to upload a file named /.;ls -a;.csv to perform command injection. Make sure you validate the file path if you use this approach. Also, it might be a good idea to provide a default_length in case wc fails for any reason.
// use wc to find max line length
// uses a hardcoded default if wc fails
// this is relatively safe from command
// injection since the file path is a tmp file
$wc = explode(" ", shell_exec('wc -L ' . $validated_file_path));
$longest_line = (int)$wc[0];
$length = ($longest_line) ? $longest_line + 4 : $default_length;
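If the path ever comes from user input rather than a temp file, escapeshellarg() is a cheap extra guard (just a sketch, reusing $sListFullPath and $default_length from above):
$wc = explode(" ", shell_exec('wc -L ' . escapeshellarg($sListFullPath)));
$longest_line = (int)$wc[0];
$length = ($longest_line) ? $longest_line + 4 : $default_length;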
fgetcsv() is by default used to read a CSV file line by line, but when it is not working that way, you have to check how line endings are handled on your machine.
You simply have to open:
C:\xampp\php\php.ini
and search for:
;auto_detect_line_endings = Off
Uncomment it and set it to:
auto_detect_line_endings = On
Then restart Apache and check again; it should work.
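If you can't edit php.ini, the same setting can also be toggled at runtime before opening the file (note that auto_detect_line_endings is deprecated in newer PHP versions); a small sketch with a hypothetical file name:
ini_set('auto_detect_line_endings', '1');  // affects only this script
$fp = fopen('data.csv', 'r');
while (($data = fgetcsv($fp)) !== false) {
    // do stuff with the row
}
fclose($fp);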