prevent fgetcsv() from breaking on \n inside enclosure - php

My file looks like this:
row 1: 1,2,3,4
row 2: bob,larry,jill, "sue
she is a great girl"
row 3: tom, fred, jack, billy
When I use fgetcsv($handle,0,',','"') it breaks row 2 into two separate rows because column 4 has a \n in it. The people giving me these CSV files create them in Excel, and Excel only encloses a field in double quotes when it contains a special character.
How can I call fgetcsv() so that it does not break on \n characters inside double quotes? I am writing very heavy PHP and sometimes parse millions of rows, so my solution has to be memory- and time-efficient. It seems to me that I must be using fgetcsv() wrong, because it obviously should not break on an enclosed \n.

I tried the following code:
if (($handle = fopen("test.csv", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",", '"')) !== FALSE) {
        print_r($data);
    }
    fclose($handle);
}
with this test.csv:
1,2,3,4
bob,larry,jill, "sue
she is a great girl"
tom, fred, jack, billy
The result is as expected:
Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
)
Array
(
[0] => bob
[1] => larry
[2] => jill
[3] => sue
she is a great girl
)
Array
(
[0] => tom
[1] => fred
[2] => jack
[3] => billy
)

Your file must look like this:
1,2,3,4
bob,larry,jill, "sue
she is a great girl"
tom, fred, jack, billy
That is, it needs a real line break inside the quoted field, not a literal \n sequence.
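Here is a minimal self-contained check of the point above, assuming PHP 7.4 or later. The sample rows are written to an in-memory php://temp stream, so no test.csv is needed, and then read back with fgetcsv(). I dropped the leading space before "sue to keep the example simple. The escape argument is passed explicitly only because newer PHP versions (8.4+) emit a deprecation notice when it is left at its default.

```php
<?php
// Sample data from the question, with a real line break inside the
// quoted fourth field of the second row.
$csv = "1,2,3,4\n"
    . "bob,larry,jill,\"sue\nshe is a great girl\"\n"
    . "tom,fred,jack,billy\n";

// php://temp behaves like a file, so fgetcsv() sees the same bytes.
$handle = fopen('php://temp', 'r+');
fwrite($handle, $csv);
rewind($handle);

$rows = [];
// 0 = no line-length limit; the escape character is passed explicitly.
while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    $rows[] = $data;
}
fclose($handle);

print_r($rows); // three rows; the fourth field of the second row spans two lines
```

If this prints three rows but your real file still splits, the difference is in the file itself (for example its line endings or a literal backslash-n), not in how fgetcsv() is called.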

Related

How to read columns from text file and insert in database php

I have a text file with multiple rows and columns inside them.
I want to read each and every row and column, store them in an array and save all the data to the database using CakePHP.
Below is my code, in which I have described the row and column parsing logic I want to implement.
Please help me do this.
public function importfile(){
    $fp = 'C:/wamp64/www/jhraut/webroot/uploads/120518SU';
    $handle = fopen($fp, "r");
    if ($handle) {
        // fgets() reads one whole line per call; fgetc() would read a single character
        while (($line = fgets($handle)) !== false) {
            $data['GunNo'] = "first 5 characters of $line is GunNo then 1 character is space";
            $data['FatGun'] = "Second 3 characters of $line is FatGun then 1 character is space";
            $data['LoinGun'] = "Third 3 characters of $line is LoinGun then 1 character is space";
            $data['ScaleWt'] = "fourth 5 characters of $line is ScaleWt then 1 character is space";
            $data['Partial'] = "if P or M then 1 character is space";
            $data['TimeofReading'] = "last 8 characters of $line is TimeofReading";
            echo $line;
        }
        $this->Event_program->saveAll($data);
        // close the file handle, not the path string
        fclose($handle);
    }
    exit;
}
My file data
parti 011 058 145.6 P 06:37:01
00002 016 049 175.8 06:37:08
00003 009 072 150.8 06:37:15
00004 009 053 146.8 06:37:22
00005 011 054 169 06:37:29
00006 009 052 152.4 06:37:37
00007 018 059 194.8 06:37:44
00008 009 060 139.4 06:37:51
parti 008 069 134.8 P 06:37:58
00010 023 054 194.2 06:38:05
miss 197.2 06:38:13
00011 023 052 150 06:38:20
00012 008 059 146.6 06:38:27
00013 010 067 156 06:38:34
00014 013 049 190.8 06:38:41
Try something like this:
// set path to file
$file = WWW_ROOT . 'uploads/120518SU';
// check if file exists
if (file_exists($file)) {
    // Read the entire file into an array with file(), one element per line;
    // FILE_IGNORE_NEW_LINES drops the trailing line break from each element
    foreach (file($file, FILE_IGNORE_NEW_LINES) as $line) {
        // convert line value to new array
        $arr = explode(' ', $line);
        // do something with your data..
        $data = [
            'GunNo' => $arr[0],
            // ...
        ];
        $entity = $this->EventProgram->newEntity($data);
        $this->EventProgram->save($entity);
    }
}
Update with a test:
$line = '00003 009 072 150.8 06:37:15';
$arr = explode(' ', $line);
print_r($arr);
// output
Array
(
[0] => 00003
[1] => 009
[2] => 072
[3] => 150.8
[4] =>
[5] =>
[6] =>
[7] => 06:37:15
)
$line = 'parti 008 069 134.8 P 06:37:58';
// ..
Array
(
[0] => parti
[1] => 008
[2] => 069
[3] => 134.8
[4] => P
[5] =>
[6] => 06:37:58
)
Then:
// do something with your data..
$data = [
// ...
'TimeofReading' => end($arr),
];
Update: reading it as a CSV file
Use fgetcsv()
The fgetcsv() function parses a line from an open file, checking for
CSV fields.
The fgetcsv() function stops returning on a new line, at the specified
length, or at EOF, whichever comes first.
This function returns the CSV fields in an array on success, or FALSE
on failure and EOF.
fgetcsv(file,length,separator,enclosure);
Use a regular expression. Since some rows have empty fields, splitting on spaces may cause problems.
preg_match('/(.{5}) (.{3}) (.{3}) (.{5}) (.{1}) (.{8})/', $line, $op);
$data['GunNo'] = $op[1];
$data['FatGun'] = $op[2];
$data['LoinGun'] = $op[3];
$data['ScaleWt'] = $op[4];
$data['Partial'] = $op[5];
$data['TimeofReading'] = $op[6];
echo $op[0];
You can test the regular expression at
https://www.phpliveregex.com/#tab-preg-match
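The column widths described in the question's pseudocode (5, 3, 3 and 5 characters, then a 1-character flag, each followed by a space, then an 8-character time) can also be read without a regex by slicing each line with substr(). This is a sketch that assumes every line in the real file is space-padded to those widths; the sample pasted above seems to have had its repeated spaces collapsed, so check against the actual file. parseFixedWidthLine() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: slice one fixed-width line using the widths from the
// question (5, 3, 3, 5 and 1 characters, each followed by one space, then an
// 8-character time). Assumes lines are space-padded to these widths.
function parseFixedWidthLine(string $line): array
{
    $line = rtrim($line, "\r\n");
    return [
        'GunNo'         => trim(substr($line, 0, 5)),
        'FatGun'        => trim(substr($line, 6, 3)),
        'LoinGun'       => trim(substr($line, 10, 3)),
        'ScaleWt'       => trim(substr($line, 14, 5)),
        'Partial'       => trim(substr($line, 20, 1)),
        'TimeofReading' => substr($line, -8),
    ];
}

$full = parseFixedWidthLine("parti 011 058 145.6 P 06:37:01\n");
// Same layout with the Partial column left blank (padded with a space).
$blank = parseFixedWidthLine("00002 016 049 175.8   06:37:08");
print_r($full);
print_r($blank);
```

Slicing by position keeps empty columns (such as a missing P/M flag) in their place, which is exactly where splitting on spaces goes wrong.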

PHP reading CSV files in an odd order

I'm using the League CSV library, but exactly the same thing happens when I use the built-in PHP functions. Here is my spreadsheet:
ID column A column B column C
123 apple orange pear
And here is my code:
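use League\Csv\Reader;
use League\Csv\Statement;
$csv = Reader::createFromPath('file.csv', 'r'); // hypothetical path; the question doesn't show it
$stmt = (new Statement())
    ->offset(0)
    ->limit(2)
;
$records = $stmt->process($csv);
foreach ($records as $record) {
    print_r($record);
}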
Finally, here is the output. Notice that the ID value (123) is hanging off to the left. I'm not even sure what that is supposed to mean.
Array
(
[0] => ID
[1] => column A
[2] => column B
123 [3] => column C
[4] => apple
[5] => orange
[6] => pear
)
Edit: Here is the raw CSV file. Could the newline character be a carriage return?
ID,column A,column B,column C
123,apple,orange,pear
This was really lame. Since I'm on a Mac, I needed to add this line to the top of the script:
ini_set('auto_detect_line_endings',TRUE);
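The auto_detect_line_endings setting was deprecated in PHP 8.1, so on current PHP versions another approach is worth having. One possibility is to normalise the line endings yourself before parsing. The sketch below assumes the whole file fits in memory, and parseCsvString() is a hypothetical helper name. Note that it also rewrites any carriage return that appears inside a quoted field.

```php
<?php
// Sketch: convert old Mac "\r" and Windows "\r\n" line endings to "\n",
// then parse with fgetcsv(). Caveat: this also rewrites any "\r" that
// appears inside a quoted field.
function parseCsvString(string $contents): array
{
    // Replace "\r\n" first so it does not turn into two line breaks.
    $normalised = str_replace(["\r\n", "\r"], "\n", $contents);

    $handle = fopen('php://temp', 'r+');
    fwrite($handle, $normalised);
    rewind($handle);

    $rows = [];
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($handle);
    return $rows;
}

// The raw file from the question, saved with carriage-return line endings.
$rows = parseCsvString("ID,column A,column B,column C\r123,apple,orange,pear\r");
print_r($rows);
```

For a real file you would pass file_get_contents($path) in place of the literal string.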

PHP: how to remove the last line break from the text file?

For a school assignment, I built a site for a fictional space museum in my city using PHP. It has Data Inclusion and Data Consultation systems, but there is a problem with the consultation that I want to know how to solve: how do I delete the last line break from the file?
Data Inclusion
In the restricted area of the site, I have an HTML5 form with 5 fields (name, address, telephone number, sex and visited exhibitions) that sends the data via POST to a PHP function, which writes it to a given .txt file using fwrite():
fwrite ($pointer, "$name | $addres | $telephone | $sex | $exhibitions " .PHP_EOL);
As you can see, it writes the data entered in the form to a .txt file, followed by a line break. $pointer is the file handle returned by fopen() for the file I need to work with. Example of output:
Márcio Aguiar | Belmiro Braga Street | 1234-5678 | M | Planets of Solar System
Joana Tobias | Santos Dummont Avenue | 8765-4321 | F | Black Holes, Satellites
Data Consultation
Then there is a consultation system. It has a loop that runs until the end of the file. Inside this loop, a variable named $buffer gets one line of the .txt file at a time. That line is then exploded into an array, $lines[$counter]. To print it nicely, I use array_combine() to join the field names from another array ($keys) with the values in $lines[$counter], and assign the result to $combined[$counter]. After the loop ends, I use print_r inside <pre></pre> to view the data in $combined, preserving the spaces and line breaks that HTML would otherwise ignore. Here is the code:
$keys = array ("Name", "Address", "Telephone", "Sex", "Visited exhibition");
for ($counter=0;!feof($reader);$counter++){
    $buffer = fgets($reader);
    $lines[$counter] = explode(" | ", $buffer);
    $combined[$counter] = array_combine($keys, $lines[$counter]);
}
echo "<pre>";
print_r($combined);
echo "</pre>";
Example of output:
Array
(
[0] => Array
(
[Name] => Márcio Aguiar
[Address] => Belmiro Braga Street
[Telephone] => 1234-5678
[Sex] => M
[Visited exhibition] => Planets of Solar System
)
[1] => Array
(
[Name] => Joana Tobias
[Address] => Santos Dummont Avenue
[Telephone] => 8765-4321
[Sex] => F
[Visited exhibition] => Black Holes, Satellites
)
[2] =>
)
Here you can see that element [2] was created empty. It's caused by the last line, which contains only the line break inserted by the form above. I need to remove this last line break, and only that one, but I don't know how. Because of it, an error is shown when execution reaches array_combine(), since both arrays must have the same number of elements and element 2 is empty. Here is the error:
Warning: array_combine(): Both parameters should have an equal number of elements in E:\Aluno\Documents\Wamp\www\trab_1\area_restrita\consulta.php on line 60
Original answer:
To remove a trailing line break from any text, you can use trim(); however, in your case you just need to use fopen in append mode:
$handle = fopen("/path/to/file", "a");
Then get rid of the PHP_EOL:
fwrite ($pointer, "$name | $addres | $telephone | $sex | $exhibitions");
Edit: You're right that appending doesn't append to a new line. I was mistaken. So you could use trim() like I mentioned earlier. I created a quick example using file_get_contents() and file_put_contents() and it appears to do what you want:
<?php
$file = 'test.txt';
// Set existing contents (for testing sake)
$orig_contents = "bob | 123 fake street | 1234567890 | yes, please | no\n";
$orig_contents .= "bob | 123 fake street | 1234567890 | yes, please | no\n";
$orig_contents .= "bob | 123 fake street | 1234567890 | yes, please | no\n";
$orig_contents .= "bob | 123 fake street | 1234567890 | yes, please | no\n";
file_put_contents($file, $orig_contents);
// Here is how you could add a single new line with append mode
// Notice the trailing \n
file_put_contents($file, "billy | 456 fake street | 2345678901 | no | yes\n", FILE_APPEND);
// Get contents from the file and remove any trailing line breaks
$contents = trim(file_get_contents($file));
$keys = array ("Name", "Address", "Telephone", "Sex", "Visited exhibition");
// Explode based on the new line character
$lines = explode("\n", $contents);
foreach ($lines as $line) {
    $values = explode(" | ", $line);
    $combined[] = array_combine($keys, $values);
}
print_r($combined);
This prints:
Array
(
[0] => Array
(
[Name] => bob
[Address] => 123 fake street
[Telephone] => 1234567890
[Sex] => yes, please
[Visited exhibition] => no
)
[1] => Array
(
[Name] => bob
[Address] => 123 fake street
[Telephone] => 1234567890
[Sex] => yes, please
[Visited exhibition] => no
)
[2] => Array
(
[Name] => bob
[Address] => 123 fake street
[Telephone] => 1234567890
[Sex] => yes, please
[Visited exhibition] => no
)
[3] => Array
(
[Name] => bob
[Address] => 123 fake street
[Telephone] => 1234567890
[Sex] => yes, please
[Visited exhibition] => no
)
[4] => Array
(
[Name] => billy
[Address] => 456 fake street
[Telephone] => 2345678901
[Sex] => no
[Visited exhibition] => yes
)
)
The problem is with the way you're reading the file. You're testing for EOF before reading from the file. But feof() won't be true until you try to read while you're at the end of the file.
Instead, you should test whether fgets() returns a line.
for ($counter = 0; ($buffer = fgets($reader)) !== false; $counter++) {
    // strip the trailing line break so it doesn't end up in the last field
    $lines[$counter] = explode(" | ", rtrim($buffer, "\r\n"));
    $combined[$counter] = array_combine($keys, $lines[$counter]);
}
To explain further, suppose you have a file with one line in it. When $counter is 0, you call feof(), and it returns false. So you then read the first line, and add it to $lines and $combined. Then you increment $counter and go back to the beginning of the loop.
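When $counter is 1, you call feof(). It's still not true, because you haven't tried to read at the end of the file yet. Then you try to read the next line, but there is no line there, so fgets() returns false and you assign this to $buffer. explode() treats false as an empty string and returns an array containing one empty string, which you add to $lines. array_combine() then fails because that array has fewer elements than $keys (hence the warning; in PHP 8 and later it throws a ValueError instead), and its false return value ends up in $combined. Then you increment $counter and go back to the beginning of the loop.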
Then you call feof(), and this time it returns true because you tried to read at the end of the file on the previous iteration. So the loop ends.
As you can see from the above, even though the file only has 1 line, you end up with 2 entries in your arrays, because you didn't test for EOF until after you read too far.
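To see the difference concretely, here is a small self-contained comparison of the two loops on an in-memory "file" with two line-terminated records. The record contents and the sampleHandle() helper are made up for the demo. The feof()-first loop runs one extra time, while the fgets()-driven loop produces exactly one combined array per record.

```php
<?php
// Hypothetical helper: an in-memory "file" holding two records, each
// terminated by a line break, like the output of the fwrite() call above.
function sampleHandle()
{
    $handle = fopen('php://temp', 'r+');
    fwrite($handle, "A | B | C | D | E\nF | G | H | I | J\n");
    rewind($handle);
    return $handle;
}

// Pattern from the question: feof() is tested before each read.
$handle = sampleHandle();
$feofIterations = 0;
while (!feof($handle)) {
    fgets($handle); // on the third pass this returns false
    $feofIterations++;
}
fclose($handle);

// Pattern from the answer: the loop ends as soon as fgets() returns false.
$keys = ["Name", "Address", "Telephone", "Sex", "Visited exhibition"];
$handle = sampleHandle();
$combined = [];
while (($buffer = fgets($handle)) !== false) {
    $values = explode(" | ", rtrim($buffer, "\r\n"));
    $combined[] = array_combine($keys, $values);
}
fclose($handle);

echo $feofIterations, " loop passes vs ", count($combined), " records\n";
```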
In case you want to get rid of the last line break in data you pulled from the db ($res), you can use the following snippet:
for ($i = 0; $i < count($res); $i++) {
    // If this is not the last item, add it followed by a line break
    if ($i + 1 < count($res)) {
        file_put_contents("file.txt", $res[$i] . PHP_EOL, FILE_APPEND);
    }
    // If this is the last item, don't append the final line break
    else {
        file_put_contents("file.txt", $res[$i], FILE_APPEND);
    }
}
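A more compact variant of the same idea is to build the whole text with implode(). It only places PHP_EOL between items, so no trailing line break is ever written, and it needs one write instead of one per row. This is a sketch: $res is filled with made-up strings, and the output path is a temporary file chosen for the demo. Unlike the loop above, it overwrites the file instead of appending with FILE_APPEND.

```php
<?php
// Sketch: implode() puts PHP_EOL only between items, so no trailing
// line break is written. $res stands in for the rows pulled from the DB.
$res = ['first row', 'second row', 'third row'];
$path = tempnam(sys_get_temp_dir(), 'rows'); // hypothetical output file

// One write instead of one file_put_contents() call per row.
// This overwrites the file; the loop above appends with FILE_APPEND.
file_put_contents($path, implode(PHP_EOL, $res));

echo file_get_contents($path);
```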

Extracting string with regexp

In a text file, I have the following strings:
ID | LABEL | A | B | C
--------------------------------------
9999 | Oxygen Isotopes | | 0.15 | 1
8733 | Enriched Uranium | | 1 | 1
I would like to extract the fields ID and LABEL of each line using regular expression.
How can I achieve it?
I am not certain why you insisted on regex.
As the columns appear to be separated by the | symbol, PHP's explode() function seems like an easier solution.
You would be able to loop through the lines and refer to each column using typical array index notation, for example $line[0] and $line[1] for ID and LABEL respectively.
I doubt regex is the best solution here.
Try this to separate the text file into an array of lines (this might or might not work, depending on the OS of the machine you created the txt file on)
$lines = explode("\n", $text);
$final_lines = array();
foreach ($lines as $line) {
    $parts = explode(" | ", $line);
    $final_lines[] = $parts;
}
Now you can access all of the data through the line number then the column, like
$final_lines[3][0]
will contain 8733 (index 0 is the header row and index 1 is the dashed line).
You could use preg_split on every line:
$array = preg_split('/\s*\|\s*/', $inputLine, 3);
Then, as in djdy's answer, the ID will be in $array[0] and the label in $array[1] (a limit of 3 keeps the label separate from the remaining columns).
No regex needed:
<?php
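$file = file('file.txt');
$ret = array();
foreach ($file as $k => $line) {
    // skip the header and the dashed separator line
    if ($k < 2) { continue; }
    // trim() removes the padding spaces and the trailing line break
    list($ret['ID'][],
         $ret['LABEL'][],
         $ret['A'][],
         $ret['B'][],
         $ret['C'][]) = array_map('trim', explode('|', $line));
}
print_r($ret);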
//Label: Oxygen Isotopes ID:9999
echo 'Label: '.$ret['LABEL'][0].' ID:'.$ret['ID'][0];
/*
Array
(
[C] => Array
(
[0] => 1
[1] => 1
)
[B] => Array
(
[0] => 0.15
[1] => 1
)
[A] => Array
(
[0] =>
[1] =>
)
[LABEL] => Array
(
[0] => Oxygen Isotopes
[1] => Enriched Uranium
)
[ID] => Array
(
[0] => 9999
[1] => 8733
)
)
*/
?>
Regular expressions might not be the best approach here. I'd read in each line as a string and use explode("|", $input) to make an array of strings. Index 0 is your ID, index 1 is your label, and so on for A, B, and C if you want. That's a more robust solution than using regex.
A regular expression that gets the ID might be something like
\d{4} \|
You could do something similar for the label field, but again, this isn't as robust as just using explode.
Though a regular expression is not the best approach here, one could look like this:
preg_match_all("/(\d{4}.?)\|(.*?)\|/s", $data, $matches);
$matches[1] and $matches[2] (the 2nd and 3rd elements of $matches) will hold the required data.
Try
$str = file_get_contents($filename);
preg_match_all('/^\s*(\d*)\s*\|\s*(.*?)\s*\|/m', $str, $matches);
// $matches[1] will have ids
// $matches[2] will have labels
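To show what that last pattern captures, here is a self-contained run on the question's sample text. The text is held in a string instead of read from $filename. Pairing the two capture arrays with array_combine() is an extra step added for illustration; it is not part of the answer.

```php
<?php
// The sample from the question as a string; in practice this would be
// $str = file_get_contents($filename);
$str = "ID | LABEL | A | B | C\n"
    . "--------------------------------------\n"
    . "9999 | Oxygen Isotopes | | 0.15 | 1\n"
    . "8733 | Enriched Uranium | | 1 | 1\n";

// /m makes ^ match at the start of every line. The header and the dashed
// line do not start with "digits |", so they produce no match.
preg_match_all('/^\s*(\d*)\s*\|\s*(.*?)\s*\|/m', $str, $matches);

// Pair each ID with its label (string keys like "9999" become int keys).
$labelsById = array_combine($matches[1], $matches[2]);
print_r($labelsById);
```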

How can I get useful information from a large tab-delimited file with PHP?

I have the following problem. I have a tab-delimited file with more than 100,000 records. Every row has 6 or more elements, but I only want 2 elements from each row.
Sample structure of the tab-delimited file:
a1 1 b1 c1 11 111
a2 2 b2 c2 12 112
a3 3 b3 c3 13 113
a4 4 b4 c4 14 114
...........................................................................
The following code returns all elements from this file as an array:
$f4 = fopen("FILE.TXT", 'r');
while (($line = fgetcsv($f4, 0, "\t")) !== FALSE) {
    if ($line) {
        $arr4[] = $line;
    }
}
fclose($f4);
With more than 100,000 rows this code is very slow. How can I extract only the elements I need and make the algorithm fast?
The result I want:
Array
(
[0] => Array
(
[0] => a1 //first column
[1] => b1 //third column
)
[1] => Array
(
[0] => a2
[1] => b2
)
[2] => Array
(
[0] => a3
[1] => b3
)
[3] => Array
(
[0] => a4
[1] => b4
)
)
Thanks in advance.
If I understand your question correctly, you want to retrieve 2 of some arbitrary number of columns from each row in a CSV file. To do this:
$f4 = fopen('FILE.TXT', 'r');
while (($line = fgetcsv($f4, 0, "\t")) !== FALSE) {
    $arr4[] = array(
        $line[2], // Use whatever indexes you need for the columns
        $line[3]  // here.
    );
}
fclose($f4);
Optionally, you can specify a maximum length as argument #2 to fgetcsv() to speed things up a bit.
Edit: Also, if your column indexes are sequential (e.g. 2, 3 or 4, 5), using array_slice() might be faster, but you'd have to benchmark it to know for sure.
2 columns from every row
If your goal is to get 2 columns from every row, you're going to have to iterate all the rows. It would be best to do whatever operations are needed immediately, rather than dropping rows into an array (as it's going to eat up memory fast if multiple users can hit this script at the same time).
The only way to make this much faster will be to cache the results ahead of time. You could load the CSV into a database table and index the columns, for example.
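"Do whatever operations are needed immediately" could look like the sketch below. A generator reads the file one row at a time and yields only the requested columns, so the full file is never held in an array. selectColumns() is a hypothetical helper name. The demo writes two rows of the question's sample to a temporary file and picks the first and third columns, as in the desired output.

```php
<?php
// Sketch (hypothetical helper name): stream a tab-delimited file and yield
// only the requested column indexes, one row at a time.
function selectColumns(string $path, array $indexes): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // Length 0 = unlimited; escape passed explicitly for newer PHP versions.
        while (($line = fgetcsv($handle, 0, "\t", '"', '\\')) !== false) {
            if ($line === [null]) {
                continue; // fgetcsv() returns [null] for a blank line
            }
            $picked = [];
            foreach ($indexes as $i) {
                $picked[] = $line[$i] ?? null; // null if the row is too short
            }
            yield $picked;
        }
    } finally {
        fclose($handle);
    }
}

// Demo on a small temporary file built from the question's sample rows.
$path = tempnam(sys_get_temp_dir(), 'tsv');
file_put_contents($path, "a1\t1\tb1\tc1\t11\t111\na2\t2\tb2\tc2\t12\t112\n");
$result = iterator_to_array(selectColumns($path, [0, 2]), false);
unlink($path);
print_r($result);
```

In real use you would put the per-row work inside a foreach over selectColumns() instead of collecting everything with iterator_to_array().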
2 columns from a row matching an ID
You can make this fairly fast by doing a regex search, rather than parsing the entire file. For instance, if you put the ID into the first column, you could do something like this:
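// note that because we use file_get_contents, the file must fit in memory!
// if multiple users are hitting this at the same time, it could be a valid concern
// the /m modifier makes ^ and $ match at every line; preg_quote() escapes the ID
$pattern = '/^' . preg_quote($sanitized_id, '/') . '\t(.*)$/m';
if (preg_match($pattern, file_get_contents('filename.csv'), $matches)) {
    // $matches[1] holds the rest of the matching row, after the ID column
    $row_values = explode("\t", $matches[1]);
    var_dump($row_values);
}
else {
    print "No matches";
}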
Alternately, if you have access to the file ahead of time, you can do the same as above and index the results in a database table, making the search fast and easy.
