I have a csv file with headers that sometimes have extra fields in a certain row. This is because there was a comma in the text field that was not escaped.
Is there a way to remove a row before converting into array?
Sample csv file:
CUST_NUMBER,PO_NUMBER,NAME,SERVICE,DATE,BOX_NUMBER,TRACK_NO,ORDER_NO,INV_NO,INV_AMOUNT
757626003,7383281,JACK SMITH,GND,20180306,1,1Z1370750453578430,2018168325,119348,70.70
757626003,7383282,GERALD SMITH, JR.,GND,20180306,1,1Z9R67670395033411,2018168326,119513,63.72
757626003,7383233,SCOTT R SMITH,GND,20180306,1,1Z1370750982624042,2018168329,119349,39.33
As you can see, row 3 has an extra field because GERALD SMITH, JR. contains an unescaped comma. This puts the JR. part of the name in the SERVICE column and pushes GND and every following field one column to the right, so the last value ends up in a column without a heading.
I want to remove the entire row when the row has more fields than there are headers.
After the row is removed, I will convert the remaining csv into an array with something like this.
<?php
$csv = array_map("str_getcsv", file("FILE.CSV",FILE_SKIP_EMPTY_LINES));
$keys = array_shift($csv);
foreach ($csv as $i => $row) {
if(count($keys) == count($row)){
$csv[$i] = array_combine($keys, $row);
}
}
?>
As suggested by #Scuzzy unset the bad row
<?php
$csv = array_map("str_getcsv", file("FILE.CSV",FILE_SKIP_EMPTY_LINES));
$keys = array_shift($csv);
foreach ($csv as $i => $row) {
if(count($keys) == count($row)){
$csv[$i] = array_combine($keys, $row);
}
else unset($csv[$i]);
}
?>
<?php
$data=<<<DATA
NUMBER,NAME,SERVICE
7375536,Ron,GND
7369530,RANDY,GND
7383287,Gilbert, JR.,GND
7383236,SCOTT,GND
DATA;
$data = array_map('str_getcsv', explode("\n", $data));
$keys = array_shift($data);
$data = array_filter($data, function($v) {
return count($v) == 3;
});
var_export($data);
Output:
array (
0 =>
array (
0 => '7375536',
1 => 'Ron',
2 => 'GND',
),
1 =>
array (
0 => '7369530',
1 => 'RANDY',
2 => 'GND',
),
3 =>
array (
0 => '7383236',
1 => 'SCOTT',
2 => 'GND',
),
)
To use the column headings as keys:
$data = array_map(function($v) use ($keys) {
return array_combine($keys, $v);
}, $data);
Using array_filter allows you to remove the items you don't want by a callback. This version uses the $keys array as the test (same as you use), passing this into the callback using use...
$csv = array_map("str_getcsv", file("books.csv",FILE_SKIP_EMPTY_LINES));
$keys = array_shift($csv);
$output = array_filter($csv, function($row) use ($keys) {
return count($row) == count($keys);
});
$output = array_values($output);
print_r($output);
So each row which doesn't have the same number of columns is removed.
I've just added the array_values() call to re-index the array.
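As a self-contained illustration of the filtering described above, here is a minimal sketch. It assumes a small inline sample instead of a real file (the `$lines` array is made up for the example) and spells out the default `str_getcsv()` arguments.

```php
<?php
// Hypothetical inline sample standing in for the CSV file.
$lines = [
    'NUMBER,NAME,SERVICE',
    '7375536,Ron,GND',
    '7383287,Gilbert, JR.,GND', // unescaped comma: parses to 4 fields
    '7383236,SCOTT,GND',
];

// Parse each line; delimiter, enclosure and escape are the defaults, written explicitly.
$csv = array_map(function ($line) {
    return str_getcsv($line, ',', '"', '\\');
}, $lines);
$keys = array_shift($csv);

// Keep only rows whose field count matches the header, then re-index from 0.
$valid = array_values(array_filter($csv, function ($row) use ($keys) {
    return count($row) === count($keys);
}));

// Turn each surviving row into an associative array keyed by the header names.
$records = array_map(function ($row) use ($keys) {
    return array_combine($keys, $row);
}, $valid);

print_r($records);
```

Filtering before calling `array_combine()` matters here, because `array_combine()` fails when the two arrays differ in length.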
If you could generate the file with surrounding quotes, this problem wouldn't be there...
NUMBER,NAME,SERVICE
7375536,Ron,GND
7369530,RANDY,GND
7383287,"Gilbert, JR.",GND
7383236,SCOTT,GND
You could surround any text field with quotes of your choice to make sure this isn't a problem in the future.
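If the file is produced by PHP, one way to get that quoting is `fputcsv()`, which encloses fields that contain the delimiter. The sketch below is only an assumption about how the export might be written; the rows and the in-memory stream are made up for the example.

```php
<?php
// Hypothetical rows; in practice these come from whatever produces the export.
$rows = [
    ['NUMBER', 'NAME', 'SERVICE'],
    ['7383287', 'Gilbert, JR.', 'GND'],
];

// Write to an in-memory stream so the sketch needs no file on disk.
$handle = fopen('php://memory', 'w+');
foreach ($rows as $fields) {
    // fputcsv() wraps any field containing the delimiter in the enclosure character.
    fputcsv($handle, $fields, ',', '"', '\\');
}
rewind($handle);
$written = stream_get_contents($handle);
fclose($handle);

echo $written;

// Reading the quoted line back yields three fields again.
$parsed = str_getcsv('7383287,"Gilbert, JR.",GND', ',', '"', '\\');
print_r($parsed);
```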
Alternative...
$csv = array_map("str_getcsv", file("FILE.CSV",FILE_SKIP_EMPTY_LINES));
$keys = array_shift($csv);
$out = array();
foreach ($csv as $row) {
if(count($keys) == count($row)){
$out[] = array_combine($keys, $row);
}
}
Last update:
Just while I'm waiting to go out, tried the following. This tries to fix the data, so you get all the rows out of the file...
$out = array();
foreach ($csv as $row) {
if(count($keys) != count($row)){
$row = array_merge(array_slice($row, 0, 2),
[implode(",", array_slice($row, 2, count($row)-9))],
array_slice($row, count($row)-7));
}
$out[] = array_combine($keys, $row);
}
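The slice offsets above are tied to this particular file layout. A more general sketch of the same idea follows, assuming that only one known column can contain stray commas; `repairRow()` and its parameters are hypothetical names introduced for illustration.

```php
<?php
/**
 * Hypothetical helper: merge surplus fields back into one free-text column.
 * Assumes only the column at $textIndex can contain unescaped commas.
 */
function repairRow(array $row, int $expected, int $textIndex): array
{
    $extra = count($row) - $expected;
    if ($extra <= 0) {
        return $row; // nothing to repair
    }
    $head = array_slice($row, 0, $textIndex);
    // The text column absorbed $extra additional fields, so re-join them.
    $text = implode(',', array_slice($row, $textIndex, $extra + 1));
    $tail = array_slice($row, $textIndex + $extra + 1);
    return array_merge($head, [$text], $tail);
}

$keys = ['CUST_NUMBER', 'PO_NUMBER', 'NAME', 'SERVICE'];
$row  = ['757626003', '7383282', 'GERALD SMITH', ' JR.', 'GND'];

$fixed = array_combine($keys, repairRow($row, count($keys), 2));
print_r($fixed);
```

This only works when the position of the problem column is known in advance; if several text columns can contain commas, the row cannot be repaired unambiguously.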
Related
I want to separate a PHP array when they have a common prefix.
$data = ['status.1', 'status.2', 'status.3',
'country.244', 'country.24', 'country.845',
'pm.4', 'pm.9', 'pm.6'];
I want each of them in separate variables like $status, $countries, $pms which will contain:
$status = [1,2,3];
$country = [244, 24, 845];
$pms = [4,9,6];
My Current code is taking 1.5 seconds to group them:
$statuses = [];
$countries = [];
$pms = [];
$start = microtime(true);
foreach($data as $item){
if(strpos($item, 'status.') !== false){
$statuses[]= substr($item,7);
}
if(strpos($item, 'country.') !== false){
$countries[]= substr($item,8);
}
if(strpos($item, 'pm.') !== false){
$pms[]= substr($item,3);
}
}
$time_elapsed_secs = microtime(true) - $start;
print_r($time_elapsed_secs);
I want to know if is there any faster way to do this
This will give you results for more dynamic prefixes: first explode each item on the delimiter, then append the value to the result array under its prefix key.
For separating the value you can use: extract
Consider the following code:
$data = array('status.1','status.2','status.3', 'country.244', 'country.24', 'country.845', 'pm.4','pm.9', 'pm.6');
$res = array();
foreach($data as $elem) {
list($key,$val) = explode(".", $elem, 2);
$res[$key][] = $val;
}
extract($res); // this will separate to var with the prefix name
echo "Status is: " . print_r($status, true); // print_r() only returns a string when its second argument is true; outputs the array ["1","2","3"]
This snippet took less than 0.001 seconds...
Thanks #mickmackusa for the simplification
Add continue to each of the if's, so if it's one of them, it won't then run the other ones... not really needed in the last one as obviously the loops starts again anyway. Should save a tiny bit of time, but doubt it'll be as much as you probably want to save.
foreach($data as $item){
if(strpos($item, 'status.') !== false){
$statuses[]= substr($item,7);
continue;
}
if(strpos($item, 'country.') !== false){
$countries[]= substr($item,8);
continue;
}
if(strpos($item, 'pm.') !== false){
$pms[]= substr($item,3);
continue;
}
}
I'd use explode to split them.
something like this:
$arr = array("status" => [],"country" => [],"pm" => []);
foreach($data as $item){
list($key,$val) = explode(".",$item);
$arr[$key][] = $val;
}
extract($arr); // taken from david's answer
and it's a much more readable code (in my opinion)
___ EDIT ____
as #DavidWinder commented, this is both not dynamic and will not result in different variables - look at his answer for the most complete solution for your question
Use explode(). It is also a good idea to pass the $limit parameter, both for performance and to avoid wrong results when a value contains another '.'.
$arr = [];
foreach($data as $item){
list($key,$val) = explode('.', $item, 2);
if (!$key || !$val) continue;
$arr[$key][] = $val;
}
var_dump($arr);
If it was me I would do it like so...
<?php
$data = array ('status.1', 'status.2', 'status.3',
'country.244', 'country.24', 'country.845',
'pm.4', 'pm.9', 'pm.6');
$out = array ();
foreach ( $data AS $value )
{
$value = explode ( '.', $value );
$out[$value[0]][] = $value[1];
}
print_r ( $out );
?>
I'm not sure if this'll boost the performance but you could re-arrange your array in a way that each row has a heading and the corresponding value and then use array_column() to group which data you want.
This is an example of how you could group your data in such a way. (PHP 7.1+)
$groupedData = array_map(function($arg) {
[$key, $val] = explode('.', $arg); # before PHP 7.1 use list($key, $val) = explode(...)
return array($key => $val);
}, $data);
Then, you can pull out all of the country Id's like so:
$countries = array_column($groupedData, 'country');
Here is a live demo.
You can push data into their respective groups while destructuring. The only iterated function call is explode().
Creating individual variables for each group is a design flaw / mismanagement of array data.
Code: (Demo)
$result = [];
foreach ($data as $value) {
[$prefix, $result[$prefix][]] = explode('.', $value, 2);
}
var_export($result);
Output:
array (
'status' =>
array (
0 => '1',
1 => '2',
2 => '3',
),
'country' =>
array (
0 => '244',
1 => '24',
2 => '845',
),
'pm' =>
array (
0 => '4',
1 => '9',
2 => '6',
),
)
Use sscanf() if you want to directly/explicitly cast the numeric values as integers. Demo
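A minimal sketch of the `sscanf()` variant mentioned above, assuming every item has the form `prefix.number`; the sample data is shortened and items that do not match are simply skipped.

```php
<?php
$data = ['status.1', 'status.2', 'country.244', 'pm.4'];

$result = [];
foreach ($data as $value) {
    // %[^.] captures everything before the first dot; %d parses the rest as an integer.
    if (sscanf($value, '%[^.].%d', $prefix, $id) === 2) {
        $result[$prefix][] = $id;
    }
}

var_export($result);
```

Unlike the `explode()` versions, the grouped values here are integers rather than numeric strings.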
I have a php code that takes the matched rows of a csv file and puts them in an array.
my csv file looks like this:
Company,Produkt,Sortiment name,31,32,33,34,35,36,37,38 //these are shoe sizes
Dockers,AD1234,Sort A,2,3,5,3,2,1,0,0 //and these numbers are how many pairs of shoes
Addidas,AB1234,Sort B,2,2,1,4,,0,0,4,3
Nike,AC1234,Sort C,0,2,0,1,4,0,4,3
Dockers,AE1234,Sort D,0,1,2,3,4,1,0,2
and my php code is
$csv = file_get_contents($_SERVER['DOCUMENT_ROOT'] . 'CsvTest/Sortiment.csv');
$input = 'Company'; // column
$value = 'Dockers'; // what value of that column
$csv = array_map("str_getcsv", explode(PHP_EOL, $csv));
$keys = array_shift($csv);
$key = array_search($input, $keys);
$sortiment_array = array();
while ($line = array_shift($csv)) {
if ($line[$key] == $value) {
$line = implode(',', $line) . PHP_EOL;
$sortiment_array[] = $line;
}
}
so var_dump($sortiment_array); will give me the following
array(2) {
[0]=>
string(39) "Dockers,AD1234,Sort A,2,3,5,3,2,1,0,0"
[1]=>
string(39) "Dockers,AE1234,Sort D,0,1,2,3,4,1,0,2"
}
What I would like to do is take the 0 columns out of each matched row, so that I can see which shoe sizes actually have pairs. For that, the first row (the header in my case) needs to be repeated for each matched row, with the shoe sizes that have 0 pairs removed. Basically, my array should turn into something like:
array(2) {
[0]=>array(2)
['shoe size']=> "Producer,Produkt,Sortiment name,31,32,33,34,35,36" // no 37,38
['sortiment']=> "Dockers,AD1234,Sort A,2,3,5,3,2,1,"// no 0
[1]=>array(2)
['shoe size']=> "Producer,Produkt,Sortiment name,32,33,34,35,36,38" // no 31, 37
['sortiment']=> "Dockers,AE1234,Sort D,1,2,3,4,1,2"
}
Basically in 'shoe size' sizes should be taken out where the matched row has 0 pairs for that size. I hope I can explain it. I tried my best. Any suggestions?
If all the rows in the data are the same size, you can combine the keys and values for each line that matches, then filter that to remove the zeros.
while ($line = array_shift($csv)) {
if ($line[$key] == $value) {
// combine keys and values, and filter to remove zeros
$filtered = array_filter(array_combine($keys, $line));
// separate the resulting keys and values and add them to your output array
$sortiment_array[] = [
'shoe size' => implode(',', array_keys($filtered)),
'sortiment' => implode(',', $filtered)
];
}
}
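To see what the combine-and-filter step produces, here is a minimal sketch on a single row. The `$keys` and `$line` arrays are typed in by hand from the sample file rather than read from disk.

```php
<?php
// Header and one matching row, copied by hand from the sample data.
$keys = ['Company', 'Produkt', 'Sortiment name', '31', '32', '33', '34', '35', '36', '37', '38'];
$line = ['Dockers', 'AD1234', 'Sort A', '2', '3', '5', '3', '2', '1', '0', '0'];

// array_filter() without a callback drops every falsy value, which includes '0' and ''.
$filtered = array_filter(array_combine($keys, $line));

$entry = [
    'shoe size' => implode(',', array_keys($filtered)),
    'sortiment' => implode(',', $filtered),
];

print_r($entry);
```

Note that this also drops empty cells, not only cells containing 0, and that `array_combine()` needs the row to have exactly as many fields as the header.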
<?php
$csv = file_get_contents($_SERVER['DOCUMENT_ROOT'] . 'CsvTest/Sortiment.csv');
$input = 'Company'; // column
$value = 'Dockers'; // what value of that column
$csv = array_map("str_getcsv", explode(PHP_EOL, $csv));
$keys = array_shift($csv);
$key = array_search($input, $keys);
$sortiment_array = array();
while ($line = array_shift($csv)) {
if ($line[$key] == $value) {
$lineStr = implode(',', $line) . PHP_EOL;
$outputKeys = [];
$outputLine = [];
// Look through $line to find non-'0' elements and for each of them,
// add the corresponding elements to $outputKeys and $outputLine:
for( $i=0; $i < sizeof($keys); $i++ ) {
if ( $line[$i] !== '0' ) { // No '0' in this slot so add this slot to $outputKeys and $outputLine:
$outputKeys[] = $keys[$i];
$outputLine[] = $line[$i];
}
}
// Join $outputKeys and $outputLines back into a string:
$sortiment_array[] = [
join(',', $outputKeys),
join(',', $outputLine)
];
}
}
print_r($sortiment_array);
You can implement the logic which does it for a pair of arrays, the first being the template (header row) and the second the csv row after the header.
function nonZeros($template, $row) {
$output = [
'shoe_size' => [],
'sortiment' => []
];
for ($index = 0; $index < count($row); $index++) {
if ($row[$index] !== '0') { // test the current cell, not the whole row
$output['shoe_size'][]=$template[$index];
$output['sortiment'][]=$row[$index];
}
}
return $output;
}
and then you can loop the lines and call nonZeros, passing the corresponding arrays.
I have the following data in a csv file.
I need to rearrange the data and concatenate it into 2 columns. The columns will be SKU and Feature, where SKU = SKU and Feature is derived from the other columns in the following format.
For yellow marked row: Feature column data will be: Edge:Square Edge;Wide Plank|Finish:Glossy;Smooth|Grade:A(Select & Better/Prestige)|Installation Location:Second Floor;Main Floor........
I could parse the csv, but then I got stuck.
$lines = explode( "\n", file_get_contents( '3b.csv' ) );
$headers = str_getcsv( array_shift( $lines ) );
$data = array();
foreach ( $lines as $line ) {
$row = array();
foreach ( str_getcsv( $line ) as $key => $field )
if($headers[$key]=='sku'){
$row[ $headers[ $key ] ] = str_replace(",",";",$field);
}
if($headers[$key]!='sku' && $field!='') {
$row['feature'] = $headers[ $key ].":".str_replace(",",";",$field)."|";
}
$row = array_filter( $row );
$data[] = $row;
}
echo "<pre>";
print_r($data);
echo "</pre>";
Anyone please help me to do this or suggest any script to do this.
You haven't provided the actual text of your incoming csv files, so I will assume that parsing it normally will work properly.
I have borrowed my script from your next two questions to unconditionally process your data.
The header row's data is used as a lookup array for the feature names.
Code: (untested)
$file = fopen("3b.csv", "r");
$headers = fgetcsv($file);
$final_array = [];
while (($row = fgetcsv($file)) !== false) {
$sku = $row[0];
unset($row[0]);
foreach ($row as $featureNameIndex => $featureValues) {
foreach (explode(',', $featureValues) as $featureValue) {
$final_array[] = [
'sku' => $sku,
'feature' => "{$headers[$featureNameIndex]}:{$featureValue}"
];
}
}
}
fclose($file);
var_export($final_array);
This approach will generate an indexed array of associative arrays, each containing two elements.
Features with multiple values are divided and stored as separate subarrays.
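If a single Feature string per SKU is wanted instead, in the `Name:value;value|Name:value` format shown in the question, a sketch along these lines might work. The header names and row below are made up, and it assumes the SKU is the first column.

```php
<?php
// Hypothetical header and row; the real column names come from 3b.csv.
$headers = ['sku', 'Edge', 'Finish', 'Grade'];
$row     = ['ABC123', 'Square Edge,Wide Plank', 'Glossy,Smooth', ''];

$sku   = array_shift($row);        // assumes the SKU is the first column
$names = array_slice($headers, 1); // feature names now line up with $row

$parts = [];
foreach ($row as $i => $values) {
    if ($values === '') {
        continue; // skip features with no value
    }
    // Multiple values inside one cell are separated by ';' in the output.
    $parts[] = $names[$i] . ':' . str_replace(',', ';', $values);
}

$record = ['sku' => $sku, 'feature' => implode('|', $parts)];
print_r($record);
```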
I have csv values like this:
$csv_data = "test,this,thing
hi,there,this
is,cool,dude
have,fun";
I want to take an entire CSV string and read it into a multidimensional array so that I get:
array(
array(
'test' => 'hi',
'this' => 'there',
'thing' => 'this'
),
array(
'test' => 'is',
'this' => 'cool',
'thing' => 'dude'
),
array(
'test' => 'have',
'this' => 'fun',
'thing' => ''
)
);
I want an output like that, take note that the CSV value is dynamic.
Assuming every row in the CSV data has the same number of columns, this should work.
$lines = explode("\n", $csv_data);
$head = str_getcsv(array_shift($lines));
$array = array();
foreach ($lines as $line) {
$array[] = array_combine($head, str_getcsv($line));
}
If lines have a variable number of columns (as in your example, where the last line has 2 columns instead of 3), use this loop instead:
foreach ($lines as $line) {
$row = array_pad(str_getcsv($line), count($head), '');
$array[] = array_combine($head, $row);
}
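Put together with the sample data from the question, the padded version can be sketched as a complete script. The default `str_getcsv()` arguments are written out explicitly here; that is a choice made for the example, not something the answer requires.

```php
<?php
$csv_data = "test,this,thing\nhi,there,this\nis,cool,dude\nhave,fun";

$lines = explode("\n", $csv_data);
$head  = str_getcsv(array_shift($lines), ',', '"', '\\');

$array = [];
foreach ($lines as $line) {
    // Pad short rows with empty strings so array_combine() gets matching lengths.
    $row = array_pad(str_getcsv($line, ',', '"', '\\'), count($head), '');
    $array[] = array_combine($head, $row);
}

print_r($array);
```

`array_pad()` only handles rows that are too short; a row with more fields than the header would still make `array_combine()` fail.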
Here is a complete solution:
$lines = explode("\n", $csv_data);
$formatting = explode(",", $lines[0]);
unset($lines[0]);
$results = array();
foreach ( $lines as $line ) {
$parsedLine = str_getcsv( $line, ',' );
$result = array();
foreach ( $formatting as $index => $caption ) {
if(isset($parsedLine[$index])) {
$result[$formatting[$index]] = trim($parsedLine[$index]);
} else {
$result[$formatting[$index]] = '';
}
}
$results[] = $result;
}
So what are we doing here?
First, your CSV data is split into array of lines with explode
Since the first row in your CSV describes data format, it must be separated from the actual data rows (explode and unset)
For storing the results, we initialize a new array ($results)
Foreach is used to iterate through the data line by line. For each line:
Line is parsed with PHP's str_getcsv
An empty result array is initialized
Each line is inspected in the light of the format. Cells are added and missing columns are padded with empty strings.
Here is a very clean and simple solution:
function parse_row($row) {
return array_map('trim', explode(',', $row));
}
$rows = str_getcsv($csv_data, "\n");
$keys = parse_row(array_shift($rows));
$result = array();
foreach ($rows as $row) {
$row = parse_row($row);
$row = array_pad($row, count($keys), NULL); // pad to the header width instead of a hard-coded 3
$result[] = array_combine($keys, $row);
}