PHP CSV-Upload UTF-8 (with and without BOM)

Can someone explain the difference to me, and how to recognize or change the format?
I have a simple HTML upload form, and after uploading I parse the file contents with fgetcsv(). After parsing I have an array like this:
array(2) {
[0]=>
array(9) {
["OrderId"]=>
string(13) "FG-456887"
["Product"]=>
string(7) "B9876"
}
[1]=>
array(9) {
["OrderId"]=>
string(13) "FG-852562"
["Product"]=>
string(7) "B9877"
}
}
var_dump() shows me (apparently) exactly the same dump whether the file has a BOM or not, but when I loop over this array and check whether the OrderId (the first field in the CSV) is empty, the check always fails when the CSV is encoded without BOM. When I save the same file with BOM, everything works fine.
foreach ($data as $position) {
$orderid = $position["OrderId"];
if (empty($orderid)) die('No orderid found');
}
And it is only the first field - the other fields are ok.

Found it myself. I don't know if it's elegant, but it works:
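// Strip a leading UTF-8 BOM (bytes EF BB BF) from a string.
function remove_utf8_bom($text) {
    $bom = pack('H*', 'EFBBBF');
    return preg_replace("/^$bom/", '', $text);
}

// Read a CSV file into an array of rows keyed by the header line.
// The third parameter is the field enclosure character.
function csv_to_array($filename = '', $delimiter = ';', $enclosure = '"') {
    if (!file_exists($filename) || !is_readable($filename))
        return FALSE;
    // Drop line endings and blank lines so array_combine() never sees an empty row.
    $csvdata = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $header = NULL;
    $data = array();
    foreach ($csvdata as $i => $line) {
        // Only the very first line of the file can start with a BOM.
        if ($i === 0)
            $line = remove_utf8_bom($line);
        $row = str_getcsv($line, $delimiter, $enclosure);
        if (!$header)
            $header = $row;
        else
            $data[] = array_combine($header, $row);
    }
    return $data;
}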
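The same idea can also be applied to an open file handle, so that fgetcsv() can keep reading the file line by line. This is only a minimal sketch: open_csv_without_bom() is a hypothetical helper name, the sample file is made up for the demonstration, and the ';' delimiter is taken from the answer above.

```php
<?php
// Hypothetical helper: open a CSV for reading and leave the handle
// positioned just past a UTF-8 BOM if one is present.
function open_csv_without_bom(string $filename)
{
    $handle = fopen($filename, 'r');
    if ($handle === false) {
        return false;
    }
    // The UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        rewind($handle); // no BOM: start again from the first byte
    }
    return $handle;
}

// Usage: write a small sample file that starts with a BOM, then parse it.
$sample = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($sample, "\xEF\xBB\xBFOrderId;Product\nFG-456887;B9876\n");

$handle = open_csv_without_bom($sample);
$header = fgetcsv($handle, 0, ';', '"', '\\');
$rows = [];
while (($row = fgetcsv($handle, 0, ';', '"', '\\')) !== false) {
    $rows[] = array_combine($header, $row);
}
fclose($handle);
unlink($sample);

var_dump($rows[0]['OrderId']);
```

Because the BOM is skipped before the header row is parsed, the first key is plain "OrderId" and can be looked up normally.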

Background:
Unbeknownst to me I was in the same situation. I only realized it when I could not use the data that I imported from csv files.
Problem:
While importing two columns from a CSV file I could not access the data in the first column in the array:
array() => ['project_nr' => '0000000', 'project_name']
I tried:
array_keys($myArray);
That worked as expected, but only on closer inspection did I see that the first key, 'project_nr', was 13 bytes long rather than 10. I later realized that the extra 3 bytes were the UTF-8 BOM being read in as part of the key.
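One way to make the invisible bytes visible is to print each key's byte length and hex dump. This is just a sketch with a hand-built header row standing in for the data read from the file; the variable names are made up for illustration.

```php
<?php
// A header row as it would come back from a file that starts with a UTF-8 BOM.
$header = ["\xEF\xBB\xBFproject_nr", 'project_name'];

foreach ($header as $name) {
    // strlen() counts bytes, and bin2hex() shows bytes that var_dump() hides.
    printf("%d bytes: %s\n", strlen($name), bin2hex($name));
}

// True when the first key starts with the BOM bytes EF BB BF.
$hasBom = strncmp($header[0], "\xEF\xBB\xBF", 3) === 0;
var_dump($hasBom);
```

The first key reports 13 bytes and its hex dump starts with efbbbf, which is the BOM.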
Solution:
$str = file_get_contents('yourfile.utf8.csv');
$bom = pack("CCC", 0xef, 0xbb, 0xbf);
if (0 === strncmp($str, $bom, 3)) {
echo "BOM detected - file is UTF-8\n";
$str = substr($str, 3);
}
Reference:
Here is where I found the solution
Anecdote:
I posted this solution here hoping to connect Google searches about not being able to access specific array keys with the UTF-8 BOM in CSV uploads (which is what I needed and could not find). I hope it helps some desperately searching soul.

Related

Problem using parse_url() on extracted url from a csv through fgetcsv()

Something quite strange is happening and I can't figure out where the problem is.
I have a CSV file that I use to export data. It is filled with URLs and other fields.
I have extracted the URLs into the array $urlsOfCsv.
I read the CSV lines into an array this way:
$request->file('file')->move(public_path('uploads/temp/'),'tempcsv.csv');
$file = fopen(public_path('uploads/temp/').'tempcsv.csv',"r");
$lines = [];
fgetcsv($file, 10000, ",");
$o=0;
while (($data = fgetcsv($file, 0, "\t")) !== FALSE) {
$lines[$o]= $data;
$o++;
}
fclose($file);
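// Delete by path: $file is a closed handle at this point, not a filename.
File::delete(public_path('uploads/temp/').'tempcsv.csv');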
$urlsOfCsv = array_column($lines,0);
but I can't extract the domain with parse_url(), because I get this strange result:
foreach($urlsOfCsv as $url){
var_dump($url);
var_dump(parse_url($url));
}
which gives me results like this:
string(41) "https://www.h4d.com/" array(1) { ["path"]=> string(41) "_h_t_t_p_s_:_/_/_w_w_w_._h_4_d_._c_o_m_/_" }
string(73) "https://www.campussuddesmetiers.com/" array(1) { ["path"]=> string(73) "_h_t_t_p_s_:_/_/_w_w_w_._c_a_m_p_u_s_s_u_d_d_e_s_m_e_t_i_e_r_s_._c_o_m_/_" }
string(69) "http://altitoy-ternua.com/?lang=es" array(2) { ["path"]=> string(53) "_h_t_t_p_:_/_/_a_l_t_i_t_o_y_-_t_e_r_n_u_a_._c_o_m_/_" ["query"]=> string(15) "_l_a_n_g_=_e_s_" }
string(81) "https://www.opquast.com/communaute/jobs/" array(1) { ["path"]=> string(81) "_h_t_t_p_s_:_/_/_w_w_w_._o_p_q_u_a_s_t_._c_o_m_/_c_o_m_m_u_n_a_u_t_e_/_j_o_b_s_/_" }
I don't even get the 'host' key in the array.
Any idea why I get this result?
I tried a lot of things with regexes and other functions, but I get either empty results or garbage.
I suppose this has something to do with the CSV handling, but I can't find where.

Thanks to Cbroe I managed to find the solution, which was indeed pretty obvious.
My CSV had the wrong encoding. After a little research I found that the file was encoded in UTF-16.
I converted the encoding this way (which is probably not optimal given the double loop):
while (($data = fgetcsv($file, 0, "\t")) !== FALSE) {
for($i=0;$i<count($data);$i++){
$data[$i] = mb_convert_encoding( $data[$i],'UTF-8','UTF-16');
}
$lines[$o]= $data;
$o++;
}
And now it works just fine. parse_url() gives me the expected result (UrlParser::getDomain() also works for me).
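An alternative that avoids the inner loop is to convert the whole stream before fgetcsv() reads it. This is only a sketch: it assumes the iconv extension is available, uses a made-up sample file, and falls back to little-endian UTF-16 when no BOM is found, which is a guess rather than something the question states.

```php
<?php
// Build a small UTF-16LE sample with a BOM so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'csv');
$utf8 = "url\tname\nhttps://www.h4d.com/\tH4D\n";
file_put_contents($path, "\xFF\xFE" . mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8'));

$file = fopen($path, 'r');

// Use the byte order mark to pick the byte order.
$bom = fread($file, 2);
if ($bom === "\xFE\xFF") {
    $encoding = 'UTF-16BE';
} elseif ($bom === "\xFF\xFE") {
    $encoding = 'UTF-16LE';
} else {
    $encoding = 'UTF-16LE'; // assumption: little-endian when there is no BOM
    rewind($file);
}

// Everything read from here on reaches fgetcsv() as UTF-8.
stream_filter_append($file, "convert.iconv.$encoding/UTF-8", STREAM_FILTER_READ);

fgetcsv($file, 0, "\t", '"', '\\'); // skip the header row
$lines = [];
while (($data = fgetcsv($file, 0, "\t", '"', '\\')) !== false) {
    $lines[] = $data;
}
fclose($file);
unlink($path);

$host = parse_url($lines[0][0], PHP_URL_HOST);
var_dump($host);
```

Converting before parsing also means the tab delimiter is matched against real characters rather than against the individual bytes of UTF-16 code units.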

How to remove unwanted characters/text - php

When I retrieve a file, it outputs:
...22nlarray(3) { [0]=> string(62) "/public_html/wp/wp-content/plugins/AbonneerProgrammas/Albums/." [1]=> string(63) "/public_html/wp/wp-content/plugins/AbonneerProgrammas/Albums/.." [2]=> string(69) "/public_html/wp/wp-content/plugins/AbonneerProgrammas/Albums/22nl.mp3" }
I, however, only want the 22nl displayed. I do not want the rest over there. How can I do that? Is there a function that deletes the rest of the output except 22nl (which is a filename)?
My PHP code:
// get contents of the current directory
$contents = ftp_nlist($conn_id, $destination_folder);
foreach ($contents as $mp3_url) {
$filename = basename($mp3_url, ".mp3");
echo "<a href='$mp3_url'>$filename</a>";
}
var_dump($contents);
There are similar questions to mine, however, they did not provide a good answer for me.
Greetings,
Rezoo Aftib

Let's say that the value from your array is
$value='string(69) "/public_html/wp/wp-content/plugins/AbonneerProgrammas/Albums/22nl.mp3"';
Then you can extract the file name like this:
$value = explode('"',$value);
$exploded = explode('/', $value[1]);
$full_mp3_name = end($exploded);
$just_name = explode(".",$full_mp3_name);
$just_name = $just_name[0];
Printing $full_mp3_name should give 22nl.mp3.
Printing $just_name should give 22nl.
It would be better wrapped in a function, but this is an example in case it is an option for you.
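Another possibility, sketched below, works on the paths themselves rather than on the var_dump() text. In the question, the extra output appears to come from the var_dump($contents) call after the loop, so removing that call and skipping the "." and ".." entries may be enough. The sample array is copied from the question; everything else is illustrative.

```php
<?php
// Sample data shaped like the ftp_nlist() result shown in the question.
$contents = [
    '/public_html/wp/wp-content/plugins/AbonneerProgrammas/Albums/.',
    '/public_html/wp/wp-content/plugins/AbonneerProgrammas/Albums/..',
    '/public_html/wp/wp-content/plugins/AbonneerProgrammas/Albums/22nl.mp3',
];

$names = [];
foreach ($contents as $mp3_url) {
    // Keep only entries that really are .mp3 files; this drops "." and "..".
    if (strtolower(pathinfo($mp3_url, PATHINFO_EXTENSION)) !== 'mp3') {
        continue;
    }
    // PATHINFO_FILENAME is the base name without directory and extension.
    $names[] = pathinfo($mp3_url, PATHINFO_FILENAME);
}

echo implode("\n", $names), "\n";
```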

PHP Writing Complex Order Info to CSV

I am writing an order exporter in PHP and am having difficulty with php checking a csv header and then writing to a file.
I have my code which opens the file, then writes to the csv
$fh = fopen($file_compile, 'w');
I then write my header:
$header = '"OrderId", "Customer"';
foreach ($products as $product) {
$header .= ', "' . $product . '"';
}
fputs($fh, $header . "\n");
which gives me an output
I now generate my $line array
which outputs:
array(3) {
["OrderId"]=>
string(9) "100000033"
["Customer"]=>
string(14) "Graeme Houston"
["total"]=>
array(3) {
["Socks"]=>
int(12)
["Books"]=>
int(23)
["Wallets"]=>
int(12)
}
}
Now as you can see, my array doesn't exactly match the CSV header above, which is fine. What I would like to achieve is this: if a key in total matches a column in the header, put the integer value in that cell. (scratches head)
To illustrate:
I must add that I have been trying to figure this out for a few days and can't get my head round it, hence the post here. I would be extremely grateful if someone could point me in the right direction.

Not that complex...
$header = array('OrderId','Customer');
foreach ($products as $product) {
$header[] = $product;
}
$tmp = '"' .implode('","',$header) . '"';
fputs($fh, $tmp . "\n");
That takes care of your header construction. Then, for your line output, something similar to:
foreach($header as $key){
if(empty($line[$key])) $line[$key] = NULL;
if(empty($line[$key]) && !empty($line['total'][$key])){
$line[$key] = $line['total'][$key];
}
}
unset($line['total']);
Now you've filled the gaps for non-existent columns, and given values to the existent ones.
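As a variation on the same approach, the row can be built in header order and written with fputcsv(), which takes care of the quoting. This is a sketch only: the 'Hats' product is made up to show an empty cell, and php://memory stands in for the real output file.

```php
<?php
$products = ['Socks', 'Books', 'Wallets', 'Hats']; // 'Hats' is a made-up column with no value
$line = [
    'OrderId'  => '100000033',
    'Customer' => 'Graeme Houston',
    'total'    => ['Socks' => 12, 'Books' => 23, 'Wallets' => 12],
];

$header = array_merge(['OrderId', 'Customer'], $products);

// Build the row in header order so every value lands under its own column.
$row = [];
foreach ($header as $key) {
    // Top-level field first, then the per-product total, else an empty cell.
    $row[] = $line[$key] ?? $line['total'][$key] ?? '';
}

$fh = fopen('php://memory', 'w+');
fputcsv($fh, $header, ',', '"', '\\');
fputcsv($fh, $row, ',', '"', '\\');
rewind($fh);
$csv = stream_get_contents($fh);
fclose($fh);

echo $csv;
```

Building a separate $row keeps $line untouched and guarantees that the column order follows the header, whatever order the keys have in $line.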

Opening and reading a 2GB csv

I have been having problems opening and reading the contents of a 2 GB CSV file. Every time I run the script it exhausts the server's memory (10 GB VPS cloud server) and then gets killed. I have made a test script and was wondering if anyone could have a look and confirm that I am not doing anything silly (PHP-wise) that would cause what seems an unusually high amount of memory usage. I have spoken to my hosting company, but they seem to be of the opinion that it is a code problem. So I am just wondering if anyone can look over this and confirm there is nothing in the code that would cause this kind of problem.
Also, if you deal with 2 GB CSVs, have you encountered anything like this before?
Thanks
Tim
<?php
ini_set("memory_limit", "10240M");
$start = time();
echo date("Y-m-d H:i:s", $start)."\n";
$file = 'myfile.csv';
$lines = $keys = array();
$line_count = 0;
$csv = fopen($file, "r");
if(!empty($csv))
{
echo "file open \n";
while(($csv_line = fgetcsv($csv, null, ',', '"')) !== false)
{
if($line_count==0) {
foreach($csv_line as $item) {
$keys[] = preg_replace("/[^a-zA-Z0-9]/", "", $item);
}
} else {
$array = array();
for ($i = 0; $i <count($csv_line); $i++) {
$array[$keys[$i]] = $csv_line[$i];
}
$lines[] = (object) $array;
//print_r($array);
//echo "<br/><br/>";
}
$line_count++;
}
if ($line_count == 0) {
echo "invalid csv or wrong delimiter / enclosure ".$file;
}
} else {
echo "cannot open ".$file;
}
fclose ($csv);
echo $line_count . " rows \n";
$end = time();
echo date("Y-m-d H:i:s", $end)."\n";
$time = number_format((($end - $start)/60), 2);
echo $time."\n";
echo "peak memory usages ".memory_get_peak_usage(true)."\n";

It is not actually an "opening" problem but rather a processing problem.
I am sure you don't need to keep all the parsed lines in memory like you currently do.
Why not just put each parsed line wherever it belongs: a database, another file, or anything else?
That way your code keeps as little as one line in memory at a time.
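A minimal sketch of that idea uses a generator, so that only the current row is held in memory. read_csv_rows() is a hypothetical helper name and the sample file is made up; what to do with each row is left as a comment because the question does not say.

```php
<?php
// Hypothetical helper: yield one keyed row at a time instead of collecting all rows.
function read_csv_rows(string $file): Generator
{
    $csv = fopen($file, 'r');
    if ($csv === false) {
        throw new RuntimeException("cannot open $file");
    }
    try {
        $keys = fgetcsv($csv, 0, ',', '"', '\\');
        if ($keys === false) {
            return; // empty file: nothing to yield
        }
        while (($row = fgetcsv($csv, 0, ',', '"', '\\')) !== false) {
            // Skip blank lines and rows whose column count does not match the header.
            if ($row === [null] || count($row) !== count($keys)) {
                continue;
            }
            yield array_combine($keys, $row);
        }
    } finally {
        fclose($csv);
    }
}

// Usage with a small sample file.
$sample = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($sample, "Make,Model,Note\nChevy,1500,loaded\nChevy,2500,\n");

$line_count = 0;
foreach (read_csv_rows($sample) as $row) {
    // Handle the row here (insert into a database, write to another file, ...).
    $line_count++;
    $last = $row;
}
unlink($sample);

echo $line_count . " rows\n";
```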

As others have already pointed out, you're loading the whole 2 GB file into memory. You do this while creating an array with multiple strings out of each line, so in fact the memory needed is more than the plain file size.
You might want to process each row of the CSV file separately, ideally with an iterator, for example one that returns each line as a keyed array:
$csv = new CSVFile('../data/test.csv');
foreach ($csv as $line) {
var_dump($line);
}
Example output:
array(3) {
["Make"]=> string(5) "Chevy"
["Model"]=> string(4) "1500"
["Note"]=> string(6) "loaded"
}
array(3) {
["Make"]=> string(5) "Chevy"
["Model"]=> string(4) "2500"
["Note"]=> string(0) ""
}
array(3) {
["Make"]=> string(5) "Chevy"
["Model"]=> string(0) ""
["Note"]=> string(6) "loaded"
}
This iterator is inspired by one that is built into PHP, called SplFileObject. As this is an iterator, you decide what to do with each line's/row's data. See the related question: Process CSV Into Array With Column Headings For Key
class CSVFile extends SplFileObject
{
private $keys;
public function __construct($file)
{
parent::__construct($file);
$this->setFlags(SplFileObject::READ_CSV);
}
public function rewind()
{
parent::rewind();
$this->keys = parent::current();
parent::next();
}
public function current()
{
return array_combine($this->keys, parent::current());
}
public function getKeys()
{
return $this->keys;
}
}

PHP is really the wrong language for this. String manipulation usually results in copies of strings being allocated in memory. PHP frees a value as soon as nothing references it any more, but this script keeps every parsed row referenced in $lines, so nothing is released until the script ends. If you know how to do it, and it fits the execution environment, you'd be better off with perl or sed/awk.
Having said this, there are two memory hogs in the script. The first is the foreach, which, depending on the PHP version, may copy the array it iterates over. To avoid that, do a foreach on the array_keys and refer back to the entry in the array to get at the lines. The second is the one mentioned by @YourCommonSense: you should design your algorithm so that it works in streaming mode (i.e. without storing the full dataset in memory). At a cursory glance, that seems feasible.

Parsing file in PHP; how to detect newlines?

I've got a problem where I'm trying to read a text file like this:
Joe
Johnson
Linus
Tourvalds
and while parsing it in php, I need to be able to detect the newlines. I'm trying to correctly define $newline. I'm looping through the array of lines in the $file variable.
while($line = next($file))
if($line = $newline)
echo "new line";
The problem is that I can't seem to match the newline character. I know that it is actually showing up in the $file array, because this:
while($line = next($file))
echo $line;
outputs the file verbatim, with newlines and all. I've already tried "\n", " ", and I'm not sure what to try next. A little help?

$file = file("path/to/file.txt");
// In case you need to call it multiple times ...
function isNewLine($line) {
return !strlen(trim($line));
}
foreach ($file as $line) {
if (isNewLine($line)) {
echo "new line<br/>";
}
}

Maybe something like this would work for you?
while($line = next($file)) {
if(in_array($line, array("\r", "\n", "\r\n"))) {
echo "new line";
}
}
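Note also that the loop in the question uses a single = inside the if, which assigns $newline to $line instead of comparing the two. A minimal sketch with a strict comparison follows; it assumes that the file has a blank line between the two names, and the sample array stands in for what file() would return.

```php
<?php
// Sample lines as file() would return them (line endings are kept).
$file = ["Joe\n", "Johnson\n", "\n", "Linus\n", "Tourvalds\n"];

$blank = 0;
foreach ($file as $line) {
    // Compare with ===; a single = would assign instead of compare.
    // rtrim() removes "\n", "\r" and "\r\n" endings alike.
    if (rtrim($line, "\r\n") === '') {
        $blank++;
        echo "new line\n";
    }
}
```

Using foreach instead of while ($line = next($file)) also avoids skipping the first element and stopping early on a line such as "0".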

I think this solution may help. It works when parsing a CSV generated on Mac or Windows. Reading a multi-line CSV created on a Mac causes a problem: you cannot read each line in a loop, because all of the CSV data is read as a single line.
The following solution solves this problem:
//My CSV contains only one column
$fileHandle = fopen("test.csv",'r');
$codesArray = array();
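$count = 0;
// Stop when fgetcsv() returns false instead of polling feof(),
// so $line is always an array inside the loop.
while (($line = fgetcsv($fileHandle)) !== false) {
    if ($line[0] != "") {
        // nl2br() marks every line break, whatever its style, with '<br />'.
        $data = str_replace("'", "", nl2br($line[0]));
        $dataArray = explode('<br />', $data);
        foreach ($dataArray as $data) {
            $codesArray[] = trim($data);
        }
    }
}
fclose($fileHandle);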
echo "<pre>";
print_r($codesArray);
