Dealing with a csv file in a strange format - php

I am using the "LOAD DATA" functionality in phpMyAdmin to update (or renew) some data in my database by uploading a CSV file. The CSV file has 50 columns and 200k lines. This works well and is very fast with this format:
100;101;102;103;104;....
Alfred;Mueller;Exampplestreet 1;12121;Chicago;....
John;Wiliams;Exampplestreet 2;12345;Dallas;....
Mandy;Peterson;Exampplestreet 3;44554;LA;....
...
Now I have the chance to fully automate this process by receiving a CSV data file from a data provider. But the data provider delivers a CSV file like this:
100#Alfred;101#Mueller;102#Exampplestreet 1;103#12121;104#Chicago;....
100#John;101#Wiliams;102#Exampplestreet 2;103#12345;104#Dallas;....
100#Mandy;101#Peterson;102#Exampplestreet 3;103#44554;104#LA;....
Is there any way to handle the provider's format? I have never worked with a CSV file formatted like this.

It looks as though you will need to strip the field type from each value. I'm not sure whether the field type is relevant, but I have used it as the key for each field in case you need it (it doesn't make much difference to the code either way).
Basically, read each line as a CSV line (delimited by ;), then explode() each field on # and, if that yields 2 parts, add it to the output array ($data)...
$fileName = "data.csv";
$handle = fopen( $fileName, "r" );
// fgetcsv() returns false at end of file, which ends the loop
while ( ($fileData = fgetcsv( $handle, null, ";" )) !== false ) {
    $data = [];
    foreach ( $fileData as $value ) {
        // Split "100#Alfred" into the field type and the value
        $values = explode("#", (string) $value, 2);
        if ( count($values) == 2 ) {
            $data[ $values[0] ] = $values[1];
        }
    }
    print_r($data);
}
fclose($handle);
Output will be something like...
Array
(
[100] => Alfred
[101] => Mueller
[102] => Exampplestreet 1
[103] => 12121
[104] => Chicago
)
If you don't need the field type and it is always three characters followed by a #, you can make this shorter by updating the values of the array you read, using substr() to remove the first 4 characters:
while ( ($data = fgetcsv( $handle, null, ";" )) !== false ) {
    foreach ( $data as &$value ) {
        $value = substr((string) $value, 4);
    }
    unset($value); // break the reference left over from the foreach
    print_r($data);
}
This will obviously be slower than loading it directly (and you need to add the database calls to the above).
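One way to keep the fast LOAD DATA import is to convert the provider's file back into the plain format first and then load the converted file as before. The following is only a minimal sketch under some assumptions: the function names splitIdsAndValues() and convertProviderFile() are made up for illustration, no field contains a quoted semicolon (so a plain explode() on ; is enough), and every line carries the same field IDs in the same order.

```php
<?php
// Hypothetical helper: turn ['100#Alfred', '101#Mueller'] into
// [['100', '101'], ['Alfred', 'Mueller']].
function splitIdsAndValues(array $fields): array
{
    $ids = [];
    $values = [];
    foreach ($fields as $field) {
        // A limit of 2 keeps any later '#' inside the value itself.
        $parts = explode('#', (string) $field, 2);
        if (count($parts) === 2) {
            $ids[] = $parts[0];
            $values[] = $parts[1];
        }
    }
    return [$ids, $values];
}

// Hypothetical converter: reads the provider file and writes the plain
// semicolon-separated format, with the field IDs as the first row.
// Assumption: fields never contain a quoted ';'.
function convertProviderFile(string $inPath, string $outPath): int
{
    $in = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Could not open input or output file');
    }
    $rows = 0;
    while (($line = fgets($in)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue; // skip blank lines
        }
        [$ids, $values] = splitIdsAndValues(explode(';', $line));
        if ($rows === 0) {
            // Header row, taken from the IDs of the first data line.
            fwrite($out, implode(';', $ids) . "\n");
        }
        fwrite($out, implode(';', $values) . "\n");
        $rows++;
    }
    fclose($in);
    fclose($out);
    return $rows;
}
```

A call such as convertProviderFile('provider.csv', 'plain.csv') (both file names are placeholders) would then produce a file that can be imported the same way as the original format.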

Related

Php: How to not replace key in array when same key is found

I'm trying to parse a number of "filename.ext" entries coming from a file. This exercise comes from CodinGame and is named "MIME Type".
I should take the name of each file, look at its extension, and find the matching key => value pair in the extensions array.
E.g. for a file named "foobar.pdf", I check whether there is a dot and, if there is, I split the name into the files array as [foobar => pdf]. If there's no dot, as in "foobar", it becomes [foobar => Unknown]. The last case, two dots, as in "foo.bar.pdf", should be stored as [foo.bar => pdf], but that case is not what I'm asking about here.
My problem is that my first file name is "a", with no dot and no extension.
So the code runs and handles it correctly. It ends with the right entry in the files array: [a => Unknown].
But now the second file name is "a.wav", and the program replaces the value of [a => Unknown] with [a => wav] because it's the same key.
So how can we avoid these replacements? I searched the web, but everybody is looking for how to replace a key's value, not how to create a new entry.
Edit: What I expect is to have 2 different entries in my files array. In this case I want: $files = [ [a => Unknown], [a => wav], ... ]. After this I must output the MIME type from the extensions array for each file, so in order: "Unknown", "audio/x-wav", ... This is why I must keep the first key: there's no extension, and that is good to know.
$FNAME = "a
a.wav
b.wav.tmp
test.vmp3
pdf
.pdf
mp3
report..pdf
defaultwav
.mp3.
final.";
Here is my code:
$N = 4; // Number of extensions
$Q = 11; // Number of files to parse
$extensions = [
    'wav' => 'audio/x-wav',
    'mp3' => 'audio/mpeg',
    'pdf' => 'application/pdf',
    'UNKNOW' => 'UNKNOWN',
];
$files = [];
for ($i = 0; $i < $Q; $i++){
// Input / One file name per line.
$FNAME = stream_get_line(STDIN, 256 + 1, "\n");
// --- TRANSFORM STRING INTO KEY => VALUE ---
// If there is 2 dots e.g "file.name.ext"
if (preg_match('/\..*\./',$FNAME)){
// Do something e.g split string at the second dot, I don't know for the moment how to do this
// I tried regex option U but something wrong happen;
}
// If there is one dot in "filename.ext"
elseif ( preg_match("/\./",$FNAME) ){
$temp = preg_split ("/\./", $FNAME);
// Verification if extension is know
if ( array_key_exists( $temp[1], $extensions) ){
$files[$temp[0]] = $temp[1];
} else {
$files[$temp[0]] = "UNKNOW";
}
} else {
$files[$FNAME] = "UNKNOW";
}
}
With these statements, you'll overwrite the value of that key each time.
$files[$temp[0]] = $temp[1];
$files[$FNAME] = "UNKNOW";
Just use [] to append instead.
$files[$temp[0]][] = $temp[1];
$files[$FNAME][] = "UNKNOW";
By the way, I think you can use this to get the final extension when there are multiple dots (here $test holds the file name, and the match ends up in $ext[0]):
preg_match('/(?<=\.)[^.]+$/', $test, $ext);
What you are trying to do doesn't make sense to me, but you can still achieve it by replacing your inner if/else block with the following, which keeps the first value stored for a key instead of overwriting it.
// Check whether the extension is known
if ( array_key_exists( $temp[1], $extensions) ){
    // Only store the value if this file name has not been seen yet
    if ( !array_key_exists($temp[0], $files) ){
        $files[$temp[0]] = $temp[1];
    }
} else {
    if ( !array_key_exists($temp[0], $files) ){
        $files[$temp[0]] = "UNKNOW";
    }
}
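Since the question says the end goal is one MIME type per input line, in input order, another option is to avoid using the file name as an array key at all and append one result per file to a plain list. This is only a sketch: the helper names extensionOf() and mimeTypes() are invented here, the extension is taken as the text after the last dot, and matching is case-sensitive, which may or may not be what the exercise expects.

```php
<?php
// Hypothetical helper: return the text after the last dot,
// or null when there is no dot or nothing follows it.
function extensionOf(string $fileName): ?string
{
    $dot = strrpos($fileName, '.');
    if ($dot === false) {
        return null;
    }
    $ext = substr($fileName, $dot + 1);
    return $ext === '' ? null : $ext;
}

// Hypothetical helper: map each file name to a MIME type, keeping
// input order and duplicates because results are appended to a list.
function mimeTypes(array $fileNames, array $extensions): array
{
    $result = [];
    foreach ($fileNames as $name) {
        $ext = extensionOf($name);
        $result[] = ($ext !== null && isset($extensions[$ext]))
            ? $extensions[$ext]
            : 'UNKNOWN';
    }
    return $result;
}
```

Because nothing is keyed by file name, "a" and "a.wav" each get their own entry and neither overwrites the other.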

import csv in array's, explode text file by datevalue given or by numrows possible?

I have a CSV file which contains trace signals of vibration data.
It starts with an ISO 8601 date, 2017-01-31T16:16:21.000+01:00,
then it has 1024 rows of data (512 Hz, 2 s signal). Then comes the next trace signal, which starts with a new date but is in the same file.
2017-01-31T16:16:21.000+01:00
0,06;0,03;0,01
0,07;0,03;0,01
0,07;0,03;0,02
.... up to line 1025
2017-01-31T16:24:37.000+01:00
1,72;0,2;-0,9
1,48;0,39;-1,46
1,23;0,58;-1,67
0,99;0,76;-1,81
... up to line 2050
This file can contain many more than 2 traces. How can I split it into separate arrays? I would prefer arrays like:
Array
(
[0] => Array
(
[Time] => 2017-01-31T16:16:21.000
[Data] => array ( [0] => 0,06;0,03;0,01
[1] => 0,07;0,03;0,01 etc..)
)
But I don't know how to loop through the file, split it at each datetime value, and also keep that value. The other way would be to read the first line as the time, then read the next 1024 rows line by line and push them, but how?
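You may run into memory problems as your array gets larger, but this is the approach:
$i = 0;
$j = -1; // index of the current trace; becomes 0 at the first time line
$result = [];
if ($handle = fopen('/path/to/file.csv', 'r')) {
    while (($line = fgets($handle)) !== false) {
        if ($i % 1025 === 0) {
            // Lines 0, 1025, 2050, ... hold the time and start a new trace
            $j++;
            $result[$j] = ['Time' => trim($line), 'Data' => []];
        } else {
            $result[$j]['Data'][] = trim($line);
        }
        $i++;
    }
    fclose($handle);
}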
Just loop through the file and test whether the current line is the first line or one whose zero-based index is a multiple of 1025 (one time line plus 1024 data lines). If it is, you are on a line with the time; if not, it is data.
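If the number of data rows per trace is not guaranteed to be exactly 1024, the question's other idea (splitting at each datetime value) can be sketched by recognising the time lines by their shape instead of counting. This is an illustration only: parseTraces() is a made-up name, and the pattern assumes every time line starts like 2017-01-31T16:16:21 while no data line does.

```php
<?php
// Hypothetical helper: group lines into traces, starting a new trace
// whenever a line looks like an ISO 8601 timestamp.
function parseTraces(iterable $lines): array
{
    $traces = [];
    $current = -1; // index of the trace being filled
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // ignore blank lines
        }
        if (preg_match('/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/', $line) === 1) {
            $current++;
            $traces[$current] = ['Time' => $line, 'Data' => []];
        } elseif ($current >= 0) {
            // Data before the first timestamp is dropped.
            $traces[$current]['Data'][] = $line;
        }
    }
    return $traces;
}
```

The function accepts any iterable of lines, so it can be fed an array or something like new SplFileObject('/path/to/file.csv'), which yields the file line by line without loading the whole input at once.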

Parsing two files, and comparing strings

So I have two files, formatted like this:
First file
adam 20 male
ben 21 male
Second file
adam blonde
adam white
ben blonde
What I would like to do is take each name in the first file (e.g. adam), search for it in the second file, and print out the attributes.
Data is separated by a tab ("\t"), and this is what I have so far.
$firstFile = fopen("file1", "rb"); //opens first file
$i=0;
$k=0;
while (!feof($firstFile) ) { //feof = while not end of file
$firstFileRow = fgets($firstFile); //fgets gets line
$parts = explode("\t", $firstFileRow); //splits line into 3 strings using tab delimiter
$secondFile= fopen("file2", "rb");
$countRow = count($secondFile); //count rows in second file
while ($i<= $countRow){ //while the file still has rows to search
$row = fgets($firstFile); //gets whole row
$parts2 = explode("\t", $row);
if ($parts[0] ==$parts2[0]){
print $parts[0]. " has " . $parts2[1]. "<br>" ; //prints out the 3 parts
$i++;
}
}
}
I can't figure out how to loop through the second file, get each row, and compare it to the first file.
You have a typo in the inner loop: you are reading from the first file when you should be reading from the second. In addition, after exiting the inner loop you need to rewind the second file's pointer back to the beginning.
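The two fixes described in this answer could look roughly like the sketch below. matchAttributes() is a hypothetical name, it takes two already opened file handles, and it returns the messages instead of printing them so that the result is easy to inspect. It rescans the second file once per row of the first file, so it is only reasonable for small inputs.

```php
<?php
// Hypothetical helper: for each name in the first file, scan the whole
// second file and collect "<name> has <attribute>" for every match.
function matchAttributes($first, $second): array
{
    $found = [];
    while (($row = fgets($first)) !== false) {
        $parts = explode("\t", rtrim($row, "\r\n"));
        if ($parts[0] === '') {
            continue; // skip blank lines
        }
        // Start the second file from the top for every name.
        rewind($second);
        while (($row2 = fgets($second)) !== false) {
            $parts2 = explode("\t", rtrim($row2, "\r\n"));
            if (count($parts2) >= 2 && $parts2[0] === $parts[0]) {
                $found[] = $parts[0] . ' has ' . $parts2[1];
            }
        }
    }
    return $found;
}
```

With the file names from the question, it could be called as matchAttributes(fopen('file1', 'rb'), fopen('file2', 'rb')).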
How about this:
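function file2array($filename) {
    // Read all lines without their trailing newlines, skipping empty ones
    $file = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $result = array();
    foreach ($file as $line) {
        $attributes = explode("\t", $line);
        // The first column is the name; the remaining columns are its attributes
        foreach (array_slice($attributes, 1) as $attribute) {
            $result[$attributes[0]][] = $attribute;
        }
    }
    return $result;
}
$a1 = file2array("file1");
$a2 = file2array("file2");
print_r(array_merge_recursive($a1, $a2));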
It will output the following:
Array (
[adam] => Array (
[0] => 20
[1] => male
[2] => blonde
[3] => white
)
[ben] => Array (
[0] => 21
[1] => male
[2] => blonde
)
)
However, this reads both files into memory in one piece and may hit the memory limit if they are large (> 100 MB). On the other hand, 90% of all PHP programs have this problem, since file() is popular :-)

multidimensional array processing

I have a tab delimited text file like this:
"abcdef1" "AB"
"abcdef1" "CD"
"ghijkl3" "AA"
"ghijkl3" "BB"
"ghijkl3" "CC"
For every common ID (e.g. abcdef1), I need to take the two-character code and concatenate it into a multi-value field. So, eventually it should look like:
"abcdef1" "AB,CD"
"ghijkl3" "AA,BB,CC"
I don't need to create a new output text file, but it would be great if I could get the final values in an array. I am just a week into PHP, hence I'm looking for help with this. I was able to get the values from the input text file into an array, but processing the array further to find the common IDs and concatenate their codes is what I'm struggling with. Any help is greatly appreciated.
How about:
$values = array();
$handle = fopen($file, 'r');
// get the line as an array of fields
while (($row = fgetcsv($handle, 1000, "\t")) !== false) {
// we haven't seen this ID yet
if (!isset($values[$row[0]])) {
$values[$row[0]] = array();
}
// add the code to the ID's list of codes
$values[$row[0]][] = $row[1];
}
$values will be something like:
Array
(
[abcdef1] => Array
(
[0] => AB
[1] => CD
)
[ghijkl3] => Array
(
[0] => AA
[1] => BB
[2] => CC
)
)
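To get from a nested array like this to the comma-separated strings asked for in the question, each list of codes can be joined with implode(). The sketch below does the grouping and the joining in one pass; groupCodes() is a made-up name, and it assumes tab-separated lines whose surrounding double quotes should be stripped.

```php
<?php
// Hypothetical helper: group codes by ID and join them with commas,
// e.g. ['abcdef1' => 'AB,CD', 'ghijkl3' => 'AA,BB,CC'].
function groupCodes(iterable $lines): array
{
    $grouped = [];
    foreach ($lines as $line) {
        $fields = explode("\t", rtrim($line, "\r\n"));
        if (count($fields) < 2) {
            continue; // skip blank or malformed lines
        }
        $id = trim($fields[0], '"');
        $code = trim($fields[1], '"');
        $grouped[$id][] = $code;
    }
    // array_map() keeps the string keys when given a single array.
    return array_map(
        static fn(array $codes): string => implode(',', $codes),
        $grouped
    );
}
```

Collecting the codes into lists first and joining at the end avoids having to special-case the first code for each ID.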
There are a number of steps to the task you want to do. The first step, obviously, is getting the contents of your file. You state that you've already been able to get the contents of the file into an array. You may have done something like this:
// Assuming that $pathToFile has the correct path to your data file
$entireFile = file_get_contents( $pathToFile );
$lines = explode( "\n", $entireFile ); // Double quotes are required so that \n means a newline; use "\r\n" for files with Windows line endings
How you get the lines into the array is less important. From here on out I assume that you've managed to fill the $lines array. Once you have this, the rest is fairly simple:
// Create an empty array to store the results in
$results = array();
foreach( $lines as $line ){
    // Split the line apart at the tab character
    $elements = explode( "\t", $line );
    // Skip blank or malformed lines that have no second column
    if( count( $elements ) < 2 ){
        continue;
    }
    // Check to see if this ID has been seen
    if( array_key_exists( $elements[0], $results ) ){
        // If so, append this code to the existing codes for this ID (along with a comma)
        $results[ $elements[0] ] .= ',' . $elements[1];
    } else {
        // If not, this is the first time we've seen this ID, start collecting codes
        $results[ $elements[0] ] = $elements[1];
    }
}
// Now $results has the array you are hoping for
There are some variations on this -- for example, if you want to get rid of the quote marks around each ID or around each code, you can replace $results[ $elements[0] ] with $results[ trim( $elements[0], '"' ) ] and/or replace $elements[1] with trim( $elements[1], '"' ).

php function to split an array at each blank line?

I'm building a script which will open a saved text file, export the contents to an array and then dump the contents in a database. So far I've been able to get the file upload working quite happily and can also open said file.
The trouble I'm having is that the contents of the file are variable: they have a fixed structure, but the contents change every time. The structure of the file is that each "section" is separated by a blank line.
I've used php's file() to get an array ... I'm not sure if there's a way to then split that array up every time it comes across a blank line?
$file = $target_path;
$data = file($file) or die('Could not read file!');
Example output:
[0] => domain.com
[1] => # Files to be checked
[2] => /www/06.php
[3] => /www/08.php
[4] =>
[5] => domain2.com
[6] => # Files to be checked
[7] => /cgi-bin/cache.txt
[8] => /cgi-bin/log.txt
[9] =>
[10] => domain3.com
[11] => # Files to be checked
[12] => /www/Content.js
[13] =>
I know that Field 0 and 1 will be constants, they will always be a domain name then that hash line. The lines thereafter could be anywhere between 1 line and 1000 lines.
I've looked at array_chunk() which is close to what I want but it works on a numerical value, what would be good if there was something which would work on a specified value (like a new line, or a comma or something of that sort!).
Lastly, apologies if this has been answered previously. I've searched the usual places a few times for potential solutions.
Hope you can help :)
Foxed
I think what you're looking for is preg_split. If you split only on consecutive newlines, you will miss separator lines that contain just spaces or tabs.
$output = array(...);//what you just posted
$string_output = implode('', $output);
// Split wherever a newline is followed by a blank or whitespace-only line; each element is one block as a string
$blocks = preg_split('`\n\s*\n`', $string_output, -1, PREG_SPLIT_NO_EMPTY);
You could do something like this. It reads the file line by line rather than using file(), which uses less memory; that may matter if you have larger files.
$handle = fopen('blah', 'r');
$blocks = array();
$currentBlock = array();
// fgets() returns false at end of file, which ends the loop
while (($line = fgets($handle)) !== false) {
    if (trim($line) == '') {
        // A blank line closes the current block, if there is one
        if ($currentBlock) {
            $blocks[] = $currentBlock;
            $currentBlock = array();
        }
    } else {
        $currentBlock[] = rtrim($line, "\r\n");
    }
}
fclose($handle);
// Store the last block if the file did not end with a blank line
if ($currentBlock) {
    $blocks[] = $currentBlock;
}
print_r($blocks);
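Have you tried explode("\n\n", ...) on the file's contents as a single string? (split() was deprecated in PHP 5.3 and removed in PHP 7, and the delimiter needs double quotes so that \n is read as a newline.)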
You could do it by splitting first on the blank line and then on new lines, e.g.:
$file = $target_path;
$fileData = file_get_contents($file) or die('Could not read file!');
// Split into sections on blank lines, then split each section into its lines
$parts = explode("\n\n", trim($fileData));
$data = array();
foreach ($parts as $part) {
    $data[] = explode("\n", $part);
}
You could also use preg_split() in place of the first explode(), with a regex that splits on lines containing just whitespace (e.g. /\n\s*\n/).
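The preg_split() variant could be sketched as follows. The name splitIntoBlocks() is invented for this example, and the pattern is one possible choice: it treats a line break followed by one or more empty or space/tab-only lines as the separator and accepts both \n and \r\n line endings.

```php
<?php
// Hypothetical helper: split file contents into blocks at blank or
// whitespace-only lines, then split each block into its lines.
function splitIntoBlocks(string $contents): array
{
    $blocks = preg_split(
        '/\r?\n(?:[ \t]*\r?\n)+/',
        trim($contents),
        -1,
        PREG_SPLIT_NO_EMPTY
    );
    return array_map(
        static fn(string $block): array => preg_split('/\r?\n/', $block),
        $blocks
    );
}
```

It could be called as splitIntoBlocks(file_get_contents($target_path)), giving one inner array per section with the domain name as its first element.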
I would use the function preg_grep() to reduce the resulting array:
$array = preg_grep('/[^\s]/', $array);
