I am using the "LOAD DATA" functionality with phpMyAdmin to update (or renew) some data in my database by uploading a CSV file. The CSV file has 50 columns and 200k lines. This works pretty well and is very fast with this format:
100;101;102;103;104;....
Alfred;Mueller;Exampplestreet 1;12121;Chicago;....
John;Wiliams;Exampplestreet 2;12345;Dallas;....
Mandy;Peterson;Exampplestreet 3;44554;LA;....
...
Now I have the chance to fully automate this process by receiving a CSV data file from a data provider. But the data provider delivers a CSV file like this:
100#Alfred;101#Mueller;102#Exampplestreet 1;103#12121;104#Chicago;....
100#John;101#Wiliams;102#Exampplestreet 2;103#12345;104#Dallas;....
100#Mandy;101#Peterson;102#Exampplestreet 3;103#44554;104#LA;....
Is there any chance to handle the provider's format? I have never worked with a CSV file formatted like this.
It looks as though you will need to extract the field type from each value. I'm not sure whether the type is relevant, but I have used it as the key for each field in case you need it (it doesn't make much difference to the code anyway).
Basically, read each line as a CSV line (delimited by ;), then explode() each field by #, and if that yields 2 parts, add the value to the output array ($data)...
$fileName = "data.csv";
$handle = fopen ( $fileName, "r" );
// fgetcsv() returns false at end of file, so test its result instead of feof()
while ( ($fileData = fgetcsv( $handle, null, ";" )) !== false ) {
    $data = [];
    foreach ( $fileData as $value ) {
        // a blank line comes back as [null], hence the string cast
        $values = explode("#", (string) $value, 2);
        if ( count($values) == 2 ) {
            $data[ $values[0] ] = $values[1];
        }
    }
    print_r($data);
}
fclose($handle);
Output will be something like...
Array
(
[100] => Alfred
[101] => Mueller
[102] => Exampplestreet 1
[103] => 12121
[104] => Chicago
)
If you don't need the field type and it is always three characters followed by a #, you can make this shorter by updating the values of the read array in place, using substr() to remove the first 4 characters.
while ( ($data = fgetcsv( $handle, null, ";" )) !== false ) {
    foreach ( $data as &$value ) {
        $value = substr((string) $value, 4);
    }
    unset($value); // break the reference before the next row is read
    print_r($data);
}
This will obviously be slower than loading the file directly (and you still need to add the database calls to the above).
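If you would rather keep the fast LOAD DATA import, one option is to convert the provider's file into the plain format first and import the converted file. A minimal sketch, assuming every field looks like key#value and no value contains a ; itself. The function name stripFieldKeys and both file names are made up for illustration:

```php
<?php
// Hypothetical helper: turn one provider line ("100#Alfred;101#Mueller")
// into the plain format ("Alfred;Mueller").
function stripFieldKeys(string $line): string
{
    $fields = explode(';', rtrim($line, "\r\n"));
    foreach ($fields as &$field) {
        // Limit 2 keeps any further '#' inside the value intact.
        $parts = explode('#', $field, 2);
        $field = count($parts) === 2 ? $parts[1] : $parts[0];
    }
    unset($field); // break the reference left over from the loop
    return implode(';', $fields);
}

// Assumed file names; adjust to your setup.
if (is_readable('provider.csv')) {
    $in  = fopen('provider.csv', 'r');
    $out = fopen('converted.csv', 'w');
    while (($line = fgets($in)) !== false) {
        if (trim($line) !== '') {
            fwrite($out, stripFieldKeys($line) . "\n");
        }
    }
    fclose($in);
    fclose($out);
}
```

The converted file then has the same layout as the one that already works with LOAD DATA, apart from the missing header row of keys.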
I'm trying to print out all the users in Active Directory using system("dsquery user"); in PHP. My problem is getting the output trimmed down so that I have an array containing all the users and nothing else. At the moment this is my code:
<?php
$test = system("dsquery user");
$teste = explode('CN=', $test);
print_r($teste);
$user = trim($teste[1], ",");
echo "<br \>" . $user;
?>
I can only fetch one user at the moment because the explode deletes everything else.
Any help is appreciated. Basically, what I want to have in the end is something like this:
$user[0] = Administrator
$user[1] = kbgrt
$user[2] = asdasd
This is the output:
"CN=Administrator,CN=Users,DC=domain,DC=local" "CN=Guest,CN=Users,DC=domain,DC=local" "CN=krbtgt,CN=Users,DC=Domain,DC=local" "CN=doctor.scripto,CN=Users,DC=domain,DC=local"
I hope you understand; otherwise comment and I'll try to explain it another way.
It's not always easy to parse output from commands if they are not designed for it. A good start is to simplify the output as much as possible and look for patterns. I hope the code below works for you; I have tried to comment it as much as possible.
I have started with your output from system("dsquery user");
$test = '"CN=Administrator,CN=Users,DC=domain,DC=local" "CN=Guest,CN=Users,DC=domain,DC=local" "CN=krbtgt,CN=Users,DC=Domain,DC=local" "CN=doctor.scripto,CN=Users,DC=domain,DC=local"';
To make it easier, I replace the spaces with commas and remove the " characters from the string
$test = str_replace(' ', ',', $test);
$test = str_replace('"', '', $test);
Now we are ready to split the string on , so we get an array $teste whose elements start with an attribute prefix ("DC=" or "CN=" in our case)
$teste = explode(',', $test);
Since some users are unwanted I add an array to exclude them. Also create an array $users to keep the result.
$users = array();
$exclude_users = array("Users", "doctor.scripto");
Then it's just a matter of iterating over the array and splitting each element on =, so we get the attribute type in position 0 and the name in position 1. Add the name if position 0 is equal to CN and the name in position 1 is not in the list of excluded users.
foreach ($teste as $value) {
$data = explode('=', $value);
if($data[0] == 'CN' && !in_array($data[1], $exclude_users)) {
$users[] = $data[1];
}
}
print_r($users);
Here is the result:
Array
(
[0] => Administrator
[1] => Guest
[2] => krbtgt
)
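One thing to watch in the original code: system() prints the command's output and returns only its last line, which may be why only one user showed up. exec() with an output array gives you every line instead. A sketch that pulls the first CN= component out of each line, assuming names contain no commas; the helper name is invented and the sample array stands in for real dsquery output:

```php
<?php
// Hypothetical helper: extract the first CN value from each DN line.
function extractUserNames(array $lines): array
{
    $users = [];
    foreach ($lines as $line) {
        // Match the leading CN=... component, with or without the opening quote.
        if (preg_match('/^\s*"?CN=([^,"]+)/', $line, $m)) {
            $users[] = $m[1];
        }
    }
    return $users;
}

// In real use the lines would come from: exec('dsquery user', $lines);
$lines = [
    '"CN=Administrator,CN=Users,DC=domain,DC=local"',
    '"CN=Guest,CN=Users,DC=domain,DC=local"',
    '"CN=krbtgt,CN=Users,DC=Domain,DC=local"',
];
print_r(extractUserNames($lines));
```

Because only the first component of each line is taken, the CN=Users container never ends up in the result and needs no exclude entry.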
So I have two files, formatted like this:
First file
adam 20 male
ben 21 male
Second file
adam blonde
adam white
ben blonde
What I would like to do is take each name (e.g. adam) from the first file, search for it in the second file and print out the attributes.
Data is separated by tab "\t", so this is what I have so far.
$firstFile = fopen("file1", "rb"); //opens first file
$i=0;
$k=0;
while (!feof($firstFile) ) { //feof = while not end of file
$firstFileRow = fgets($firstFile); //fgets gets line
$parts = explode("\t", $firstFileRow); //splits line into 3 strings using tab delimiter
$secondFile= fopen("file2", "rb");
$countRow = count($secondFile); //count rows in second file
while ($i<= $countRow){ //while the file still has rows to search
$row = fgets($firstFile); //gets whole row
$parts2 = explode("\t", $row);
if ($parts[0] ==$parts2[0]){
print $parts[0]. " has " . $parts2[1]. "<br>" ; //prints out the 3 parts
$i++;
}
}
}
I can't figure out how to loop through the second file, get each row, and compare it to the first file.
You have a typo in the inner loop: you are reading from $firstFile when you should be reading from $secondFile. Also, count() on a file handle does not give the number of rows, so loop until fgets() returns false instead. After leaving the inner loop, rewind the second file's pointer back to the beginning.
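A sketch of the corrected nested loop, under the assumption that both files are tab separated as described. The function name is made up, and the in-memory streams only stand in for fopen("file1", "rb") and fopen("file2", "rb") so the example runs on its own:

```php
<?php
// Hypothetical helper: for every name in the first file, list matching rows of the second.
function matchAttributes($first, $second): array
{
    $found = [];
    while (($row = fgets($first)) !== false) {
        $parts = explode("\t", rtrim($row, "\r\n"));
        rewind($second); // start the second file from the top for each name
        while (($row2 = fgets($second)) !== false) {
            $parts2 = explode("\t", rtrim($row2, "\r\n"));
            if ($parts[0] !== '' && $parts[0] === $parts2[0] && isset($parts2[1])) {
                $found[] = $parts[0] . ' has ' . $parts2[1];
            }
        }
    }
    return $found;
}

// Demo data in memory instead of real files.
$first = fopen('php://memory', 'w+');
fwrite($first, "adam\t20\tmale\nben\t21\tmale\n");
rewind($first);
$second = fopen('php://memory', 'w+');
fwrite($second, "adam\tblonde\nadam\twhite\nben\tblonde\n");
rewind($second);

print_r(matchAttributes($first, $second));
```

This rereads the second file once per row of the first, so it is only reasonable for small files.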
How about this:
function file2array($filename) {
    // drop the trailing newline of each line and skip empty lines
    $file = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $result = array();
    foreach ($file as $line) {
        $attributes = explode("\t", $line);
        foreach (array_slice($attributes, 1) as $attribute) {
            $result[$attributes[0]][] = $attribute;
        }
    }
    return $result;
}
$a1 = file2array("file1");
$a2 = file2array("file2");
print_r(array_merge_recursive($a1, $a2));
It will output the following:
Array (
[adam] => Array (
[0] => 20
[1] => male
[2] => blonde
[3] => white
)
[ben] => Array (
[0] => 21
[1] => male
[2] => blonde
)
)
However, this one reads both files into memory in one piece and can run out of memory if they are large (>100MB). On the other hand, a lot of PHP programs have this problem, since file() is popular :-)
I have an array in PHP that contains all the lines of a text file (each line being one value of the array). My text file has blank lines, so the array has blank lines too. I want to search the array for a certain value like this:
$array = array();
$lines = file("textfile.txt"); //file in to an array
foreach ($lines as $line)
{
if (stripos($line, "$$") !== false)
{
$array[] = str_replace("$$", "", $line);
}
}
The code above searches for $$ and replaces it with an empty string. The text file holds lines with $$1 or any other number, and I want it to find all instances of such lines, which it is doing.
My problem is that I want it to find the next 5 lines that aren't blank after finding the $$(number) and put them into a multidimensional array. The multidimensional array should look similar to this (the program is a test, in case you are wondering why the array is named the way it is):
$test = array(
array('question' => 'What is the answer', 'ansa' => "answera", 'ansb' => "answerb", 'ansc' => "answerc", 'ansd' => "answerd"), // $test[1]
array('question' => 'What is the answer', 'ansa' => "answera", 'ansb' => "answerb", 'ansc' => "answerc", 'ansd' => "answerd"), // $test[2]
);
The next five lines after the $$(number) are a question and four answers that need to go into the array. My code with regexp and searching wasn't working, so I discarded it.
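You can try something like this...
<?php
// read the file, trim each line, drop the blank lines and reindex the array
$lines = array_values(array_filter(array_map('trim', file('text.txt')), 'strlen'));
$questions = array();
// find your starts and pull out questions
foreach ($lines as $k=>$line)
{
    if (stripos($line, "$$") !== false)
    {
        // take the five lines that follow the $$ marker
        $questions[] = array_slice($lines, $k + 1, 5);
    }
}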
// dump
var_dump($questions);
See php manual for array_slice
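To get the keyed layout from the question (question, ansa ... ansd), each five-line slice can be passed through array_combine(). A sketch, assuming every $$ marker starts its line and is followed by at least five non-blank lines; the helper name and the sample lines are only illustrative:

```php
<?php
// Hypothetical helper: group the five non-blank lines after each "$$n" marker.
function buildQuestions(array $rawLines): array
{
    // Trim every line, then drop the blank ones and reindex.
    $lines = array_values(array_filter(array_map('trim', $rawLines), 'strlen'));
    $keys = ['question', 'ansa', 'ansb', 'ansc', 'ansd'];
    $test = [];
    foreach ($lines as $k => $line) {
        if (strpos($line, '$$') === 0) {
            $chunk = array_slice($lines, $k + 1, 5);
            if (count($chunk) === 5) { // skip incomplete blocks
                $test[] = array_combine($keys, $chunk);
            }
        }
    }
    return $test;
}

$sample = ['$$1', '', 'What is 2 + 2?', '3', '4', '', '5', '6'];
print_r(buildQuestions($sample));
```

With a real file you would call buildQuestions(file('textfile.txt')).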
Have you looked at preg_replace_callback?
Something along these lines should work:
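<?php
function replace_callback($matches) {
    var_dump($matches);
    return $matches[0]; // leave the text unchanged
}
// \R matches any line break; the group captures the five non-empty lines after the marker
preg_replace_callback('/\$\$[0-9]+\s+((?:[^\r\n]+(?:\R+|\z)){5})/', 'replace_callback', file_get_contents('textfile.txt'));
?>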
How can I find out whether the fields of a CSV file are tab delimited or comma delimited? I need PHP validation for this. Can anyone please help? Thanks in advance.
It's late to answer this question, but I hope it will help someone.
Here's a simple function that will return the delimiter of a file.
function getFileDelimiter($file, $checkLines = 2){
    $file = new SplFileObject($file);
    $delimiters = array(
        ',',
        "\t", // double quotes, so this is a real tab character
        ';',
        '|',
        ':'
    );
    $results = array();
    $i = 0;
    while($file->valid() && $i < $checkLines){
        $line = $file->fgets();
        foreach ($delimiters as $delimiter){
            $regExp = '/['.$delimiter.']/';
            $fields = preg_split($regExp, $line);
            if(count($fields) > 1){
                if(!empty($results[$delimiter])){
                    $results[$delimiter]++;
                } else {
                    $results[$delimiter] = 1;
                }
            }
        }
        $i++;
    }
    if (empty($results)) {
        return false; // none of the delimiters was found
    }
    $results = array_keys($results, max($results));
    return $results[0];
}
Use this function as shown below:
$delimiter = getFileDelimiter('abc.csv'); //Check 2 lines to determine the delimiter
$delimiter = getFileDelimiter('abc.csv', 5); //Check 5 lines to determine the delimiter
P.S. I have used preg_split() instead of explode() because explode('\t', $value) won't give proper results: inside single quotes, '\t' is a literal backslash followed by t, not a tab character.
UPDATE: Thanks to @RichardEB for pointing out a bug in the code. I have updated this now.
Here's what I do.
Parse the first 5 lines of a CSV file
Count the number of delimiters [commas, tabs, semicolons and colons] in each line
Compare the number of delimiters in each line. If you have a properly formatted CSV, then one of the delimiter counts will match in each row.
This will not work 100% of the time, but it is a decent starting point. At minimum, it will reduce the number of possible delimiters (making it easier for your users to select the correct delimiter).
/* Rearrange this array to change the search priority of delimiters */
$delimiters = array('tab' => "\t",
'comma' => ",",
'semicolon' => ";"
);
$handle = file( $file ); # Grabs the CSV file, loads into array
$line = array(); # Stores the count of delimiters in each row
$valid_delimiter = array(); # Stores Valid Delimiters
# Count the number of Delimiters in Each Row
for ( $i = 1; $i < 6; $i++ ){
foreach ( $delimiters as $key => $value ){
$line[$key][$i] = count( explode( $value, $handle[$i] ) ) - 1;
}
}
# Compare the Count of Delimiters in Each line
foreach ( $line as $delimiter => $count ){
# Check that the first two values are not 0
if ( $count[1] > 0 and $count[2] > 0 ){
$match = true;
$prev_value = '';
foreach ( $count as $value ){
if ( $prev_value != '' )
$match = ( $prev_value == $value and $match == true ) ? true : false;
$prev_value = $value;
}
} else {
$match = false;
}
if ( $match == true ) $valid_delimiter[] = $delimiter;
}//foreach
# Set Default delimiter to comma
$delimiter = ( !empty( $valid_delimiter ) ) ? $valid_delimiter[0] : "comma";
/* !!!! This is good enough for my needs since I have the priority set to "tab"
!!!! but you will want to have the user select from the delimiters in $valid_delimiter
!!!! if multiple delimiter counts match
*/
# The Delimiter for the CSV
echo $delimiters[$delimiter];
There is no 100% reliable way to determine this. What you can do is:
If you have a method to validate the fields you read, try to read a few fields using either separator and validate against your method. If it breaks, use another one.
Count the occurrences of tabs and commas in the file. Usually one is significantly higher than the other.
Last but not least: ask the user, and allow them to override your guesses.
I'm just counting the occurrences of the different delimiters in the CSV file; the one with the most occurrences is probably the correct delimiter:
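The first suggestion above could be sketched like this. It assumes you can express your validation as a callback; the function name pickDelimiter and the example rule are invented, and explode() is used for brevity, so quoted fields are not handled:

```php
<?php
// Hypothetical helper: return the first candidate whose parsed fields pass $isValid.
function pickDelimiter(string $line, array $candidates, callable $isValid): ?string
{
    foreach ($candidates as $delimiter) {
        $fields = explode($delimiter, rtrim($line, "\r\n"));
        if ($isValid($fields)) {
            return $delimiter;
        }
    }
    return null; // nothing validated: time to ask the user
}

// Example rule (an assumption): a row has exactly three fields and the second is numeric.
$isValid = function (array $fields): bool {
    return count($fields) === 3 && is_numeric($fields[1]);
};

var_dump(pickDelimiter("adam\t20\tmale", [',', "\t"], $isValid));
```

A null result is the cue to fall back to the last suggestion and let the user choose.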
//The delimiters array to look through
$delimiters = array(
'semicolon' => ";",
'tab' => "\t",
'comma' => ",",
);
//Load the csv file into a string
$csv = file_get_contents($file);
foreach ($delimiters as $key => $delim) {
$res[$key] = substr_count($csv, $delim);
}
//reverse sort the values, so the first element is the delimiter that occurs most often
arsort($res);
reset($res);
$first_key = key($res);
return $delimiters[$first_key];
In my situation users supply CSV files which are then entered into an SQL database. They may save an Excel spreadsheet as a comma or tab delimited file. A program converting the spreadsheet to SQL needs to automatically identify whether fields are tab separated or comma separated.
Many Excel CSV exports have field headings as the first line. The heading text is unlikely to contain commas except as a delimiter. For my situation I count the commas and tabs in the first line and use whichever number is greater to determine whether the file is comma or tab delimited.
Thanks for all your input. I made mine using your tricks: preg_split, fgetcsv, loop, etc.
But I implemented something that was surprisingly not here: the use of fgets instead of reading the whole file, which is way better if the file is heavy!
Here's the code:
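The heading-line check described above is small enough to sketch in a few lines. The function name is made up, and falling back to comma on a tie is my own arbitrary choice:

```php
<?php
// Hypothetical helper: decide between comma and tab from the heading line alone.
function delimiterFromHeader(string $headerLine): string
{
    $commas = substr_count($headerLine, ',');
    $tabs   = substr_count($headerLine, "\t");
    return $tabs > $commas ? "\t" : ','; // ties fall back to comma
}

var_dump(delimiterFromHeader("name\tage\tgender"));
```

In real use the heading line would come from a single fgets() call on the opened file, so the rest of the file never has to be read.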
ini_set("auto_detect_line_endings", true);
function guessCsvDelimiter($filePath, $limitLines = 5) {
    if (!is_readable($filePath) || !is_file($filePath)) {
        return false;
    }
    $delimiters = array(
        'tab' => "\t",
        'comma' => ",",
        'semicolon' => ";"
    );
    $fp = fopen($filePath, 'r', false);
    $lineResults = array(
        'tab' => array(),
        'comma' => array(),
        'semicolon' => array()
    );
    $lineIndex = 0;
    // Read one line with fgets() and parse that same line with every delimiter
    while ($lineIndex < $limitLines && ($line = fgets($fp)) !== false) {
        foreach ($delimiters as $key=>$delimiter) {
            $lineResults[$key][$lineIndex] = count(str_getcsv($line, $delimiter)) - 1;
        }
        $lineIndex++;
    }
    fclose($fp);
    if ($lineIndex === 0) {
        return false; // empty file, nothing to guess from
    }
    // Calculating average
    foreach ($lineResults as $key=>$entry) {
        $lineResults[$key] = array_sum($entry)/count($entry);
    }
    arsort($lineResults);
    reset($lineResults);
    // Compare the two highest averages; on a tie fall back to comma
    $averages = array_values($lineResults);
    return ($averages[0] != $averages[1]) ? $delimiters[key($lineResults)] : $delimiters['comma'];
}
I used @Jay Bhatt's solution for finding out a CSV file's delimiter, but it didn't work for me, so I applied a few fixes and added comments to make the process more understandable.
See my version of @Jay Bhatt's function:
function decide_csv_delimiter($file, $checkLines = 10) {
    // use php's built in file class for reading the csv or txt file line by line
    $file = new SplFileObject($file);
    // array of predefined delimiters. Add any more delimiters if you wish
    $delimiters = array(',', '\t', ';', '|', ':');
    // store the occurrences of each delimiter in an associative array
    $number_of_delimiter_occurences = array_fill_keys($delimiters, 0);
    $results = array();
    $i = 0; // using 'i' for counting the number of actual rows parsed
    while ($file->valid() && $i < $checkLines) {
        $line = $file->fgets();
        foreach ($delimiters as $idx => $delimiter){
            $regExp = '/['.$delimiter.']/';
            $fields = preg_split($regExp, $line);
            // the keys of the array are the delimiters and the values are
            // the number of delimiter occurrences, summed over all parsed lines
            $number_of_delimiter_occurences[$delimiter] += count($fields) - 1;
        }
        $i++;
    }
    // get the key of the largest value from the array (comparing only the array values)
    // in our case, the array keys are the delimiters
    $results = array_keys($number_of_delimiter_occurences, max($number_of_delimiter_occurences));
    // in case the delimiter happens to be a 'tab' character ('\t'), return it in double quotes
    // otherwise using it as a delimiter will give an error,
    // because in single quotes it is not recognised as the special 'tab' character,
    // it is a simple string composed of the '\' and 't' characters, which is not accepted when parsing csv files
    return $results[0] == '\t' ? "\t" : $results[0];
}
I personally use this function to help automatically parse a file with PHPExcel, and it works beautifully and fast.
I recommend parsing at least 10 lines for the results to be more accurate. I personally use it with 100 lines, and it is fast, with no delays or lags. The more lines you parse, the more accurate the result gets.
NOTE: This is just a modified version of @Jay Bhatt's solution to the question. All credit goes to @Jay Bhatt.
When I output a TSV file I write the tabs using \t, the same way one would write a line break as \n. That being said, I guess a method could be as follows:
<?php
$mysource = file_get_contents('yourfile.txt'); // placeholder path: get the source however you wish
if(strpos($mysource, "\t") !== false){ // !== false also catches a tab at position 0
    //We have a tab separator
}else{
    // it might be CSV
}
?>
I guess this may not be the right approach, because you could have tabs and commas in the actual content as well. It's just an idea. Using regular expressions may be better, although I am not too clued up on that.
You can simply use the native PHP function fgetcsv() in this way:
function getCsvDelimeter($file)
{
    $delimiter = null;
    if (($handle = fopen($file, "r")) !== FALSE) {
        $delimiters = array(',', ';', '|', ':'); //Put all that need check
        foreach ($delimiters AS $item) {
            rewind($handle); //Test every delimiter against the first line
            $fields = fgetcsv($handle, 0, $item, '"');
            //fgetcsv() returns a single-element array if the delimiter is not found
            if (is_array($fields) && count($fields) > 1) {
                $delimiter = $item;
                break;
            }
        }
        fclose($handle);
    }
    return $delimiter;
}
Aside from the trivial answer that CSV files are always comma-separated (it's in the name), I don't think you can come up with any hard rules. Both TSV and CSV files are sufficiently loosely specified that you can come up with files that would be acceptable as either.
A\tB,C
1,2\t3
(Assuming \t == TAB)
How would you decide whether this is TSV or CSV?
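To make the ambiguity concrete, here is a small sketch that splits the two example lines both ways. The helper name is invented and explode() stands in for a real CSV parser:

```php
<?php
// Hypothetical helper: number of fields per line for a given delimiter.
function fieldCounts(array $lines, string $delimiter): array
{
    return array_map(function ($line) use ($delimiter) {
        return count(explode($delimiter, $line));
    }, $lines);
}

$lines = ["A\tB,C", "1,2\t3"];
print_r(fieldCounts($lines, ','));  // two fields on each line
print_r(fieldCounts($lines, "\t")); // also two fields on each line
```

Both readings give a consistent two-column table, so counting fields or delimiters alone cannot settle the question for a file like this.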
You can also use fgetcsv (http://php.net/manual/en/function.fgetcsv.php), passing it a delimiter parameter. Note that fgetcsv only returns false at the end of the file or on an error; if the $delimiter parameter wasn't the right one, the line comes back as a single field, so check the number of fields as well.
Sample to check if the delimiter is ';':
if (($data = fgetcsv($your_csv_handler, 1000, ';')) !== false && count($data) > 1) { $csv_delimiter = ';'; }
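How about something simple? Note that getCsvControl() returns the delimiter that is configured on the SplFileObject (a comma unless it was changed with setCsvControl()); it does not inspect the contents of the file.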
function findDelimiter($filePath, $limitLines = 5){
$file = new SplFileObject($filePath);
$delims = $file->getCsvControl();
return $delims[0];
}
This is my solution.
It works if you know how many columns you expect.
At the end, the separator character is in $actual_separation_character.
$separator_1=",";
$separator_2=";";
$separator_3="\t";
$separator_4=":";
$separator_5="|";
$separator_1_number=0;
$separator_2_number=0;
$separator_3_number=0;
$separator_4_number=0;
$separator_5_number=0;
/* YOU NEED TO CHANGE THIS VARIABLE */
// Expected number of separation characters ( 3 columns ==> 2 separation characters / row )
$expected_separation_character_number=2;
$file = fopen("upload/filename.csv","r");
while(! feof($file)) //read file rows
{
$row= fgets($file);
$row_1_replace=str_replace($separator_1,"",$row);
$row_1_length=strlen($row)-strlen($row_1_replace);
if(($row_1_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
$separator_1_number=$separator_1_number+$row_1_length;
}
$row_2_replace=str_replace($separator_2,"",$row);
$row_2_length=strlen($row)-strlen($row_2_replace);
if(($row_2_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
$separator_2_number=$separator_2_number+$row_2_length;
}
$row_3_replace=str_replace($separator_3,"",$row);
$row_3_length=strlen($row)-strlen($row_3_replace);
if(($row_3_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
$separator_3_number=$separator_3_number+$row_3_length;
}
$row_4_replace=str_replace($separator_4,"",$row);
$row_4_length=strlen($row)-strlen($row_4_replace);
if(($row_4_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
$separator_4_number=$separator_4_number+$row_4_length;
}
$row_5_replace=str_replace($separator_5,"",$row);
$row_5_length=strlen($row)-strlen($row_5_replace);
if(($row_5_length==$expected_separation_character_number)or($expected_separation_character_number==0)){
$separator_5_number=$separator_5_number+$row_5_length;
}
} // while(! feof($file)) END
fclose($file);
/* THE FILE ACTUAL SEPARATOR (delimiter) CHARACTER */
/* $actual_separation_character */
if ($separator_1_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_1;}
else if ($separator_2_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_2;}
else if ($separator_3_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_3;}
else if ($separator_4_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_4;}
else if ($separator_5_number==max($separator_1_number,$separator_2_number,$separator_3_number,$separator_4_number,$separator_5_number)){$actual_separation_character=$separator_5;}
else {$actual_separation_character=";";}
/*
if the number of columns more than what you expect, do something ...
*/
if ($expected_separation_character_number>0){
if ($separator_1_number==0 and $separator_2_number==0 and $separator_3_number==0 and $separator_4_number==0 and $separator_5_number==0){/* do something ! more columns than expected ! */}
}
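The same idea can be written more compactly as a loop over the candidate separators. This is only a sketch of the approach above, not a drop-in replacement; the function name and the array-of-rows input are assumptions made to keep the example self-contained:

```php
<?php
// Hypothetical helper: pick the separator whose per-row count matches the expectation most often.
function detectSeparator(array $rows, int $expectedCount): string
{
    $totals = [',' => 0, ';' => 0, "\t" => 0, ':' => 0, '|' => 0];
    foreach ($rows as $row) {
        foreach (array_keys($totals) as $separator) {
            $n = substr_count($row, $separator);
            // Only rows with the expected number of separators count (0 = accept any).
            if ($n === $expectedCount || $expectedCount === 0) {
                $totals[$separator] += $n;
            }
        }
    }
    // The first separator with the highest total wins, as in the if/else chain.
    return (string) array_search(max($totals), $totals);
}

var_dump(detectSeparator(["a;b;c", "d,e;f;g"], 2));
```

As in the long version, a comma is returned when no separator matches at all, because it is the first candidate in the list.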
If you have a very large file, for example several GB, use head to put the first few lines into a temporary file, then open the temporary file in vi.
head test.txt > te1
vi te1
The easiest way to answer this is to open the file in a plain text editor such as TextMate.