explode csv file on delimiter (;) and delimiter(,)? - php

When I explode a CSV file on the delimiter (;), the explode succeeds for files from some Excel programs and fails for others.
The same happens when I explode on the delimiter (,): it succeeds for some Excel programs and fails for others.
How can I make the explode work for all versions of Excel?
How can I find out the right delimiter to explode on?
Here is the code:
if (!function_exists('create_csv')) {
function create_csv($query, &$filename = false, $old_csv = false) {
if(!$filename) $filename = "data_export_".date("Y-m-d").".csv";
$ci = &get_instance();
$ci->load->helper('download');
$ci->load->dbutil();
$delimiter = ";";
$newline = "\r\n";
$csv = "Data:".date("Y-m-d").$newline;
if($old_csv)
$csv .= $old_csv;
else
$csv .= $ci->dbutil->csv_from_result($query, $delimiter, $newline);
$columns = explode($newline, $csv);
$titles = explode($delimiter, $columns[1]);
$new_titles = array();
foreach ($titles as $item) {
array_push($new_titles, lang(trim($item,'"')));
}
$columns[1] = implode($delimiter, $new_titles);
$csv = implode($newline, $columns);
return $csv;
}
}
Sometimes I set $delimiter = ";";
and sometimes $delimiter = ",";
Thanks.

You can use a helper function to detect the best delimiter, like:
function find_delimiter($csv)
{
$delimiters = array(',', '.', ';');
$bestDelimiter = false;
$count = 0;
foreach ($delimiters as $delimiter)
if (substr_count($csv, $delimiter) > $count) {
$count = substr_count($csv, $delimiter);
$bestDelimiter = $delimiter;
}
return $bestDelimiter;
}

If you have an idea of the expected data (number of columns) then this might work as a good guess, and could be a good alternative to comparing which occurs the most (depending on what kind of data you're expecting).
It would work even better if you have a header record, I'd imagine. (You could put in a check for specific header values)
Sorry for not fitting it into your code; I am not really sure what the calls you are making do, but you should be able to adapt it.
$expected_num_of_columns = 10;
$delimiter = "";
foreach (array(",", ";") as $test_delimiter) {
$fid = fopen($filename, "r");
$csv_row = fgetcsv($fid, 0, $test_delimiter);
// Close the handle before a possible break so it is never leaked
fclose($fid);
if (is_array($csv_row) && count($csv_row) == $expected_num_of_columns) {
$delimiter = $test_delimiter;
break;
}
}
if (empty($delimiter)) {
die ("Input file did not contain the correct number of fields (" . $expected_num_of_columns . ")");
}
Don't use this if, for example, all or most of the fields contain non-integer numbers (e.g. a list of monetary amounts) and the file has no header record, because files separated by ; are most likely to use , as the decimal point, so there could be the same number of commas and semicolons.

The short answer is, you probably can't unless you can apply some heuristic to determine the file format. If you don't know and can't detect the format of the file you're parsing, then parsing it is going to be difficult.
However, once you have determined the delimiter (or decided to require a particular one), you will probably find that PHP's built-in fgetcsv is easier and more accurate than a manual explode-based strategy.
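As a rough sketch of that idea (not the asker's code): guess the delimiter from the header line by letting str_getcsv() split it with each candidate, then read the rows with fgetcsv(). The helper name detect_delimiter, the candidate list and the sample data are assumptions made for illustration, and the empty escape argument assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: pick the candidate that splits the header line into
// the most fields. str_getcsv() respects quotes, so a comma inside "..." is
// not counted as a separator.
function detect_delimiter($headerLine, array $candidates = array(',', ';'))
{
    $best = $candidates[0]; // fall back to the first candidate
    $bestCount = 1;
    foreach ($candidates as $candidate) {
        $fieldCount = count(str_getcsv($headerLine, $candidate, '"', ''));
        if ($fieldCount > $bestCount) {
            $bestCount = $fieldCount;
            $best = $candidate;
        }
    }
    return $best;
}

// Sample data held in memory so the sketch runs without a real file.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "\"name,value\";other name;last name\r\nfoo;bar;baz\r\n");
rewind($handle);

// Guess from the first line, then parse the whole stream with that guess.
$delimiter = detect_delimiter(rtrim(fgets($handle), "\r\n"));
rewind($handle);

$rows = array();
while (($row = fgetcsv($handle, 0, $delimiter, '"', '')) !== false) {
    $rows[] = $row;
}
fclose($handle);
```

This is still only a heuristic: a file whose header has a single column, or whose header is missing, gives it nothing to work with.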

There is no way to be 100% sure you are targeting the real delimiter. All you can do is guess.
You should start by finding the right delimiter, then explode the CSV on this delimiter.
To find the delimiter, basically, you want a function that counts the number of , and the number of ; and that returns the greater.
Something like :
$array = explode(find_delimiter($csv), $csv);
Hope it helps ;)
Edit : Your find_delimiter function could be something like :
function find_delimiter($csv)
{
$arrDelimiters = array(',', '.', ';');
$arrResults = array();
foreach ($arrDelimiters as $delimiter)
{
$arrResults[$delimiter] = count(explode($delimiter, $csv));
}
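// arsort() sorts in place by count (descending) and keeps the delimiter keys;
// rsort() would discard the keys and only returns a boolean.
arsort($arrResults);
$arrKeys = array_keys($arrResults);
return $arrKeys[0];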
}

Well, it looks like you know for certain that your delimiter will be either "," or ";". This is a good place to start. You might try replacing all commas (,) with semicolons (;) and then exploding on the semicolon only. However, this approach will definitely fail in some cases, because some lines of your CSV files could look like this:
"name,value",other name,other value,last name;last value
Here the delimiter of the CSV file is the comma, and the line has four columns. After changing commas to semicolons you would get five columns, which is incorrect. So converting one delimiter into another is not a good approach.
Still, if your CSV file is correctly formatted, you can find the correct delimiter in any of its lines. You could write a function like find_delimiter($csvLine) as proposed by #johnkork, but the problem is that the function itself cannot know which delimiter to search for. Since you know all the possible delimiters, you could instead write a similar function, delimiter_exists($csvLine, $delimiter), which returns true or false.
But even delimiter_exists($csvLine, $delimiter) is not enough. Why? Because for the CSV line above you would find that both "," and ";" exist as delimiters. With the comma it would be a CSV file with four columns, and with the semicolon it would have two columns.
Thus, there is no universal way to get exactly what you want. However, there is one more thing you can check: the first line of the CSV file, which is the header (assuming your CSV files have one). Headers usually (though not necessarily) contain nothing but the alphanumeric column names, separated by the delimiter. So you could write a function like delimiter_exists($csvHeader, $delimiter), implemented for example like this:
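function delimiter_exists($csvHeader, $delimiter) {
// preg_quote() keeps delimiters such as "." or "|" from being read as regex syntax
return (bool)preg_match('/' . preg_quote($delimiter, '/') . '/', $csvHeader);
}
For your specific case you may use it like this: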
$csvHeader = "abc;def";
$delimiter = delimiter_exists($csvHeader, ',') ? ',' : ';';
Hope this helps!

Related

very large php string magically turns into array

I am getting an "Array to string conversion" error in PHP.
I am using the variable (which should be a string) as the third parameter to str_replace. In summary (a very simplified version of what's going on):
$str = "very long string";
str_replace("tag", $some_other_array, $str);
$str is throwing the error, and I have been trying to fix it all day. What I have tried is:
if(is_array($str)) die("its somehow an array");
serialize($str); //inserted this before str_replace call.
I have spent all day on it, and no, it's not something stupid like variables the wrong way around; it is something bizarre. I have even dumped it to a file and it's a string.
My hypothesis:
The string is too long and PHP can't deal with it, so it turns into an array.
The $str value in this case is nested and the function is called recursively. The general flow could be explained like this:
--code
//pass by reference
function the_function ($something, &$OFFENDING_VAR, $something_else) {
while(preg_match($something, $OFFENDING_VAR)) {
$OFFENDING_VAR = str_replace($x, $y, $OFFENDING_VAR); // this is the error
}
}
So it may be something strange in str_replace, but that would mean that at some point str_replace would have to return an array.
Please help me work this out. It's very confusing and I have wasted a day on it.
---- ORIGINAL FUNCTION CODE -----
//This function gets called with multiple different "Target Variables" Target is the subject
//line, from and body of the email filled with << tags >> so the str_replace function knows
//where to replace them
function perform_replacements($replacements, &$target, $clean = TRUE,
$start_tag = '<<', $end_tag = '>>', $max_substitutions = 5) {
# Construct separate tag and replacement value arrays for use in the substitution loop.
$tags = array();
$replacement_values = array();
foreach ($replacements as $tag_text => $replacement_value) {
$tags[] = $start_tag . $tag_text . $end_tag;
$replacement_values[] = $replacement_value;
}
# TODO: this badly needs refactoring
# TODO: auto upgrade <<foo>> to <<foo_html>> if foo_html exists and acting on html template
# Construct a regular expression for use in scanning for tags.
$tag_match = '/' . preg_quote($start_tag) . '\w+' . preg_quote($end_tag) . '/';
# Perform the substitution until all valid tags are replaced, or the maximum substitutions
# limit is reached.
$substitution_count = 0;
while (preg_match ($tag_match, $target) && ($substitution_count++ < $max_substitutions)) {
$target = serialize($target);
$temp = str_replace($tags,
$replacement_values,
$target); //This is the line that is failing.
unset($target);
$target = $temp;
}
if ($clean) {
# Clean up any unused search values.
$target = preg_replace($tag_match, '', $target);
}
}
How do you know $str is the problem and not $some_other_array?
From the manual:
If search and replace are arrays, then str_replace() takes a value
from each array and uses them to search and replace on subject. If
replace has fewer values than search, then an empty string is used for
the rest of replacement values. If search is an array and replace is a
string, then this replacement string is used for every value of
search. The converse would not make sense, though.
The second parameter can only be an array if the first one is as well.
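To make the quoted rules concrete, here is a small sketch of the two argument shapes the manual describes as valid, plus a hypothetical guard (check_replacements is my own name, not part of the question's code) that flags the shapes likely to end in an array being converted to a string. Exactly how PHP reacts to the invalid shapes depends on the PHP version, so the sketch only detects them and never calls str_replace() with them.

```php
<?php
$subject = 'Hello <<name>>, order <<id>> has shipped.';
$tags = array('<<name>>', '<<id>>');

// Array search with array replace: values are paired by position.
$paired = str_replace($tags, array('Ann', '42'), $subject);

// Array search with string replace: the same string is used for every tag.
$blanked = str_replace($tags, '', $subject);

// Hypothetical guard: report argument shapes that would make PHP try to
// turn an array into a string.
function check_replacements($search, $replace)
{
    if (!is_array($search) && is_array($replace)) {
        return false; // array replace with a string search
    }
    foreach ((array) $replace as $value) {
        if (is_array($value)) {
            return false; // nested array among the replacement values
        }
    }
    return true;
}

$ok = check_replacements($tags, array('Ann', '42'));            // valid shape
$nested = check_replacements($tags, array('Ann', array('42'))); // nested array
$converse = check_replacements('<<name>>', array('Ann'));       // the converse case
```

In a function like perform_replacements, running such a check on $replacement_values before the loop would show whether one of the replacement values is itself an array, which would point away from $target as the culprit.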

Trying To Make The Code Platform Independent - Reading SQL File From PHP

Hello everyone. Here's the code, which reads a .sql file and stores each query in an array element. My .sql file contains various statements such as CREATE TABLE, DROP and INSERT.
$fileLines = file('alltables.sql');
$templine = '';
foreach ($fileLines as $line)
{
// Skip it if it's a comment
if (substr($line, 0, 2) == '--' || $line == '')
continue;
// Add this line to the current segment
$templine .= $line;
// If it has a semicolon at the end, it's the end of the query
if (substr(trim($line), -1, 1) == ';')
{
// Perform the query
$queries[] = $templine;
// Reset temp variable to empty
$templine = '';
}
}
// Here I further process $queries var
It works fine on Windows, but I'm not sure whether it will work on a Linux server. Could you look at the code and tell me whether I need to change it, for example by handling characters like (\t\r\0 \x0B) or by normalizing newlines and carriage returns for the different platforms, like this:
$tmp=str_replace("\r\n", "\n", $line);
$tmp=str_replace("\r", "\n", $tmp);
It's not safe to do it your way, since there may also be a \r or \n inside a SQL statement (e.g. a long TEXT value with many lines).
If you're sure there are no cases like that, I'd suggest using trim() instead of str_replace(). It removes all whitespace and EOL characters at the start and end of each line.
<?php
$tmp = trim($line);
?>
You may also specify to ONLY REMOVE EOL (both "\r\n" and "\n") like this:
<?php
$tmp = trim($line, "\r\n");
?>
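Putting the trimming advice together with the loop from the question, a platform-neutral version might look like the sketch below. It assumes, as the question does, that every statement ends with a semicolon at the end of a line, and it shares the caveat above about statements whose text contains line breaks. The function name split_sql_statements and the sample SQL are made up for illustration.

```php
<?php
// Hypothetical helper: split SQL text into statements regardless of whether
// the file uses \r\n, \r or \n line endings.
function split_sql_statements($sqlText)
{
    $queries = array();
    $current = '';
    // \R matches \r\n, \r and \n alike, so no manual str_replace() is needed.
    foreach (preg_split('/\R/', $sqlText) as $line) {
        $line = rtrim($line); // drop trailing whitespace only
        // Skip blank lines and "--" comments
        if ($line === '' || substr($line, 0, 2) === '--') {
            continue;
        }
        $current .= $line . "\n";
        // A semicolon at the end of the line ends the statement
        if (substr($line, -1) === ';') {
            $queries[] = rtrim($current);
            $current = '';
        }
    }
    return $queries;
}

// Mixed line endings on purpose.
$sql = "-- comment\r\nCREATE TABLE t (\r\n  id INT\r\n);\nINSERT INTO t VALUES (1);\n";
$queries = split_sql_statements($sql);
```

With a real file you would pass in file_get_contents('alltables.sql') instead of the sample string.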
Try using an existing SQL parser library.
For example:
http://code.google.com/p/php-sql-parser/

Escaping for CSV

I need to store a string in a MySQL database. The values will later be used in a CSV. How do I escape the string so that it is CSV-safe? I assume I need to escape the following: comma, single quote, double quote.
PHP's addslashes function does:
single quote ('), double quote ("), backslash (\) and NUL (the NULL byte).
So that won't work. Suggestions? I'd rather not try to create some sort of regex solution.
Also, I need to be able to unescape.
Use fputcsv() to write, and fgetcsv() to read.
fputcsv() is not always necessary, especially if you don't need to write a file and just want to return the CSV as an HTTP response.
All you need to do is wrap each value in double quotes and escape any double quote characters inside it by doubling them.
Here are a few examples:
hello -> "hello"
this is my "quote" -> "this is my ""quote"""
catch 'em all -> "catch 'em all"
As you can see the single quote character doesn't need any escaping.
A full working example follows:
<?php
$arrayToCsvLine = function(array $values) {
$line = '';
$values = array_map(function ($v) {
return '"' . str_replace('"', '""', $v) . '"';
}, $values);
$line .= implode(',', $values);
return $line;
};
$csv = [];
$csv[] = $arrayToCsvLine(["hello", 'this is my "quote"', "catch 'em all"]);
$csv[] = $arrayToCsvLine(["hello", 'this is my "quote"', "catch 'em all"]);
$csv[] = $arrayToCsvLine(["hello", 'this is my "quote"', "catch 'em all"]);
$csv = implode("\r\n", $csv);
If you get an error, it is probably because you're using an old version of PHP. Fix it by declaring the arrays with the old array() syntax and replacing the anonymous function with a named one.
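Since the question also asks about unescaping, here is a short round-trip sketch: values quoted with the doubling rule can be read back with PHP's str_getcsv(). The helper name csv_quote and the sample values are assumptions for illustration, and the empty escape argument assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper mirroring the rule above: wrap the value in double
// quotes and double any double quote inside it.
function csv_quote($value)
{
    return '"' . str_replace('"', '""', $value) . '"';
}

$original = array('hello', 'this is my "quote"', "catch 'em all", 'a,b');

// Escape: build one CSV line.
$line = implode(',', array_map('csv_quote', $original));

// Unescape: parse the line back into the original values.
$decoded = str_getcsv($line, ',', '"', '');
```

Note that the value containing a comma survives the round trip too, because it sits inside the quotes.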
For those of you trying to sanitise data using PHP and output as a CSV this can be done using PHP's fputcsv() function without having to write to a file as such:
<?php
// An example PHP array holding data to be put into CSV format
$data = [];
$data[] = ['row1_val1', 'row1_val2', 'row1_val3'];
$data[] = ['row2_val1', 'row2_val2', 'row2_val3'];
// Write to memory (unless the buffer exceeds 2 MB, in which case it is written to /tmp)
$fp = fopen('php://temp', 'w+');
foreach ($data as $fields) {
// Add row to CSV buffer
fputcsv($fp, $fields);
}
rewind($fp); // Set the pointer back to the start
$csv_contents = stream_get_contents($fp); // Fetch the contents of our CSV
fclose($fp); // Close our pointer and free up memory and /tmp space
// Handle/Output your final sanitised CSV contents
echo $csv_contents;
Don't store the data CSV escaped in the database. Escape it when you export to CSV using fputcsv. If you're storing it CSV escaped you're essentially storing garbage for all purposes other than CSV exporting.

Manually move the fgetc file pointer to the next line

Question 1: How can I manually move the fgetc file pointer from its current location to the next line?
I'm reading in data character by character until a specified number of delimiters are counted. Once the delimiter count reaches a certain number, it needs to copy the remainder of the line until a new line (the record delimiter). Then I need to start copying character by character again starting at the next record.
Question 2: Is manually moving the file pointer to the next line the right idea? I would just explode(at "\n") but I have to count the pipe delimiters first because "\n" isn't always the record delimiter.
Here's my code (it puts all the data into the correct record until it reaches the last delimiter '|' in the record. It then puts the rest of the line into the next record because I haven't figured out how to make it correctly look for the '\n' after specified # of | are counted):
$file=fopen("source_data.txt","r") or exit ("File Open Error");
$record_incrementor = 0;
$pipe_counter = 0;
while (!feof($file))
{
$char_buffer = fgetc($file);
$str_buffer[] = $char_buffer;
if($char_buffer == '|')
{
$pipe_counter++;
}
if($pipe_counter == 46) //Maybe Change to 46
{
$database[$record_incrementor] = $str_buffer;
$record_incrementor++;
$str_buffer = NULL;
$pipe_counter = 0;
}
}
Sample Data:
1378|2009-12-13 11:51:45.783000000|"Pro" |"B13F28"||""|1||""|""|""|||False|||""|""|""|""||""||||||2010-12-15 11:51:51.330000000|108||||||""||||||False|""|""|False|""|||False
1379|2009-12-13 12:23:23.327000000|"TLUG"|"TUG"||""|1||""|""|""|||False|||""|""|""|""||""||||||1943-04-19 00:00:00|||||||""||||||False|""|""|False|""|||False
I'd say that doing this via file handling functions is a bit clumsy when it could be done quite easily with a regular expression. Just read the entire file into a string using file_get_contents(), then use preg_match_all() with a regular expression like /^(([^|]*\|){47}([^\r\n]*))/m to find all the rows. You can then explode() each row using | as the delimiter and 48 as the limit for the number of fields.
Here is a working example function. It takes the file name, the field delimiter and the number of fields per row as arguments, and returns a two-dimensional array where the first index is the data row number and the second is the field number.
function loadPipeData ($file, $delim = '|', $fieldCount = 48)
{
$contents = file_get_contents($file);
$d = preg_quote($delim, '/');
preg_match_all("/^(([^$d]*$d){" . ($fieldCount - 1) . '}([^\r\n]*))/m', $contents, $match);
$return = array();
foreach ($match[0] as $line)
{
$return[] = explode($delim, $line, $fieldCount);
}
return $return;
}
var_dump(loadPipeData('source_data.txt'));
(Note: this is a solution to the original problem)
You can read to the end of the line like this:
while (!feof($file) && fgetc($file) !== "\n"); // double quotes: '\n' would be a literal backslash followed by n
As for whether or not fgetc is the right way to do this... your format makes it difficult to use anything else. You can't split on \n, because there may be newlines within a field, and you can't split on |, because the end of the record doesn't have a pipe.
The only other option I can think of is to use preg_match_all:
$buffer = file_get_contents('test.txt');
preg_match_all('/((?:[^|]*\|){45}[^\n]*\n)/', $buffer, $matches);
foreach ($matches[0] as $row) {
$fields = explode('|', $row);
}
Answer to the modified question:
To read from the file pointer to the end of the line, you can simply use the file reading function fgets(). It returns everything from the current file pointer position until it reaches the end of the line (and also returns the end of the line character(s)). After the function call, the file reading pointer has been moved to the beginning of the next line.
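A minimal sketch of how fgets() could replace the character-by-character loop: keep appending lines to a buffer until it holds the expected number of pipes, then store the buffer as one record. The function name read_pipe_records and the tiny in-memory sample are assumptions for illustration, and the sketch assumes the last field of a record never contains a line break.

```php
<?php
// Hypothetical helper: group physical lines into records by counting pipes.
function read_pipe_records($handle, $pipesPerRecord)
{
    $records = array();
    $buffer = '';
    while (($line = fgets($handle)) !== false) {
        $buffer .= $line;
        // Enough delimiters seen: the line just read closes the record.
        if (substr_count($buffer, '|') >= $pipesPerRecord) {
            $records[] = explode('|', rtrim($buffer, "\r\n"), $pipesPerRecord + 1);
            $buffer = '';
        }
    }
    return $records;
}

// Two records with 2 pipes each; the first has a line break inside a field.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "1|first\nhalf|end\n2|x|y\n");
rewind($handle);
$records = read_pipe_records($handle, 2);
fclose($handle);
```

For the data in the question you would open source_data.txt instead and pass the real number of pipes per record.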

PHP replacing entire string if it contains integer

My script lists out files in the directory. I am able to use preg_match and regex to find files whose filenames contain integers.
However, this is what I am unable to do: I want an entire string to be omitted if it contains an integer.
Despite trying several methods, I am only able to replace the integer itself and not the entire line. Any help would be appreciated.
if (preg_match('/\d/', $string))
$string = "";
This will turn a string into an empty one if it has any number in it.
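If the file names are already in an array, the same test can drop the unwanted entries in one call. This is just a sketch with made-up file names; preg_grep() with the PREG_GREP_INVERT flag returns the entries that do not match the pattern.

```php
<?php
$names = array('report.txt', 'photo1.jpg', 'notes.md', '2020-summary.csv');

// Keep only the names without any digit; array_values() reindexes the result.
$withoutDigits = array_values(preg_grep('/\d/', $names, PREG_GREP_INVERT));
```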
According to your description, this should be something like:
$files = array();
$dirname = 'C://Temp';
$dh = opendir($dirname) or die();
while( ($fn=readdir($dh)) !== false )
if( !preg_match('/\d+|^\.\.?$/', $fn) )
$files[] = $fn;
closedir($dh);
var_dump($files);
... which reads all file names and stores them (except those with numbers and ../.) in the array $files, which is displayed at the end of the snippet above. If that doesn't fit your requirement, you should give a more detailed explanation of what you are trying to do.
Regards
rbo
