How to detect a delimiter in a string in PHP?

I am curious: if you have a string, how would you detect its delimiter?
PHP can split a string with explode(), which requires a delimiter parameter.
But is there a way to detect the delimiter before passing the string to explode()?
Right now I show the string to the user and they enter the delimiter. That works, but I would like the application to recognize the pattern for me.
Should I use regular expressions for this kind of pattern recognition in a string?
EDIT: I initially failed to mention that there is a likely set of expected delimiters: any delimiter that would probably be used in a CSV. Technically anyone could use any character to delimit a CSV file, but one of the following is more likely: comma, semicolon, vertical bar or space.
EDIT 2: Here is the workable solution I came up with for a "determined delimiter".
$get_images = "86236058.jpg 86236134.jpg 86236134.jpg";
// Detect the delimiter between the image filenames.
$probable_delimiters = array(",", " ", "|", ";");
$delimiter_count_array = array();
foreach ($probable_delimiters as $probable_delimiter) {
    // Count how often each candidate delimiter occurs in the string.
    $delimiter_count_array[$probable_delimiter] = substr_count($get_images, $probable_delimiter);
}
// Pick the candidate with the highest count (on a tie, the first one listed wins).
// array_search() replaces the old each() loop, which was removed in PHP 8.
$determined_delimiter = array_search(max($delimiter_count_array), $delimiter_count_array);
$images = explode($determined_delimiter, $get_images);

Decide which delimiters you consider probable (for example ",", ";" and "|"), and for each one count how often it occurs in the string (substr_count()). Then choose the one with the most occurrences as the delimiter and explode on it.
This is not fail-safe, but it should work in most cases ;)
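A minimal sketch of that counting approach as a reusable function, assuming the candidate list from the question (comma, semicolon, vertical bar, space). The function name detect_delimiter() is hypothetical. It returns null when none of the candidates occurs, so the caller can fall back to asking the user.

```php
<?php
// Hypothetical helper: return the candidate that occurs most often in $text,
// or null if no candidate occurs at all. On a tie, the candidate listed first
// in $candidates wins, because only a strictly higher count replaces it.
function detect_delimiter(string $text, array $candidates = [",", ";", "|", " "]): ?string
{
    $best = null;
    $bestCount = 0;
    foreach ($candidates as $candidate) {
        $count = substr_count($text, $candidate);
        if ($count > $bestCount) {
            $best = $candidate;
            $bestCount = $count;
        }
    }
    return $best;
}

$get_images = "86236058.jpg 86236134.jpg 86236134.jpg";
$delimiter = detect_delimiter($get_images);
// Fall back to a single element if no delimiter was found.
$images = $delimiter === null ? [$get_images] : explode($delimiter, $get_images);
print_r($images);
```

Because the comma is listed first, a string such as "a, b, c" resolves to a comma even though it contains just as many spaces; the order of the candidate list is effectively the tie-breaking rule.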

I would say this works in 99.99% of cases :)
The basic idea is that the number of columns produced by the real delimiter should be the same on every line.
For each candidate, this script calculates the discrepancy in column counts between all pairs of lines.
The lower the discrepancy, the more likely the candidate is the real delimiter.
Putting it all together, this function reads the rows and returns them as an array:
function readCSV($fileName)
{
    // Candidate delimiters to test.
    $delA = array(";", ",", "|", "\t");
    $linesA = array();
    $resultA = array();
    $maxLines = 20; // maximum number of lines to parse for detection; raise it for more precision
    $lines = count(file($fileName));
    if ($lines < $maxLines) { // the file has fewer lines than the maximum
        $maxLines = $lines;
    }
    // For each candidate, record the column count of every sampled line.
    foreach ($delA as $key => $del) {
        $rowNum = 0;
        if (($handle = fopen($fileName, "r")) !== false) {
            $linesA[$key] = array();
            // Check the row limit first so no extra line is read.
            while (($rowNum < $maxLines) && (($data = fgetcsv($handle, 0, $del)) !== false)) {
                $linesA[$key][] = count($data);
                $rowNum++;
            }
            fclose($handle);
        }
    }
    // Sum the column-count discrepancies between all pairs of lines.
    foreach ($delA as $key => $del) {
        $discr = 0;
        $resultA[$key] = 65535; // default when there is no data for this candidate
        foreach ($linesA[$key] as $actNum) {
            if ($actNum == 1) {
                // Only one column on this line with this delimiter, so it is not
                // our delimiter: set its discrepancy very high.
                $resultA[$key] = 65535;
                break;
            }
            foreach ($linesA[$key] as $actNum2) {
                $discr += abs($actNum - $actNum2);
            }
            // For the real delimiter this result should be nearest to 0, because in
            // the ideal (error-free) case all lines have the same number of columns.
            $resultA[$key] = $discr;
        }
    }
    // Select the candidate whose discrepancy is nearest to 0.
    $delRes = 65535;
    $delKey = 0; // fall back to the first candidate if none scores below 65535
    foreach ($resultA as $key => $res) {
        if ($res < $delRes) {
            $delRes = $res;
            $delKey = $key;
        }
    }
    $delimiter = $delA[$delKey];
    // Read all rows with the detected delimiter, trimming every field.
    $rowsA = array();
    if (($handle = fopen($fileName, "r")) !== false) {
        while (($data = fgetcsv($handle, 0, $delimiter)) !== false) {
            $rowsA[] = array_map('trim', $data);
        }
        fclose($handle);
    }
    return $rowsA;
}
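The same discrepancy idea can be tried without a file. The sketch below works on an in-memory array of lines and uses str_getcsv(), so quoted fields are honoured; the function name and the sample data are hypothetical, and PHP_INT_MAX replaces the 65535 sentinel.

```php
<?php
// Hypothetical in-memory variant: for each candidate, parse every line with
// str_getcsv() and sum the pairwise differences in column counts. A candidate
// that yields a single column on any line is skipped. The lowest total wins;
// ties go to the earlier candidate. Returns null if every candidate is skipped.
function detect_delimiter_by_consistency(array $lines, array $candidates = [";", ",", "|", "\t"]): ?string
{
    $best = null;
    $bestScore = PHP_INT_MAX;
    foreach ($candidates as $candidate) {
        $counts = [];
        foreach ($lines as $line) {
            $counts[] = count(str_getcsv($line, $candidate, '"', "\\"));
        }
        if ($counts === [] || min($counts) < 2) {
            continue; // some line has only one column: not the delimiter
        }
        $score = 0;
        foreach ($counts as $a) {
            foreach ($counts as $b) {
                $score += abs($a - $b);
            }
        }
        if ($score < $bestScore) {
            $best = $candidate;
            $bestScore = $score;
        }
    }
    return $best;
}

$sample = [
    "name;city;note",
    "Anna;Berlin;likes tea, coffee",
    "Ben;Paris;none",
];
var_dump(detect_delimiter_by_consistency($sample));
```

Note that an empty line parses to a single column and would disqualify every candidate, so blank lines should be filtered out before calling it.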

I have the same problem: I deal with a lot of CSVs from various databases, which different people export to CSV in different ways, sometimes differently each time for the same dataset. I simply implemented a function like this in my converter base class:
protected function detectDelimiter() {
$handle = @fopen($this->CSVFile, "r"); // @ suppresses the warning if the file cannot be opened
if ($handle) {
$line=fgets($handle, 4096);
fclose($handle);
$test=explode(',', $line);
if (count($test)>1) return ',';
$test=explode(';', $line);
if (count($test)>1) return ';';
//.. and so on
}
//return default delimiter
return $this->delimiter;
}

I made something like this:
$line = fgetcsv($handle, 1000, "|");
if (isset($line[1]))
{
echo "delimiter is: |";
$delimiter="|";
}
else
{
rewind($handle); // go back to the first line before trying the next candidate
$line1 = fgetcsv($handle, 1000, ";");
if (isset($line1[1]))
{
echo "delimiter is: ;";
$delimiter=";";
}
else
{
echo "delimiter is: ,";
$delimiter=",";
}
}
This simply checks whether there is a second column after a line is read.
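Both first-line approaches above can be written as one loop over the candidates in priority order. This is a sketch with a hypothetical function name that takes the first line as a string, so no file handle or rewind is needed.

```php
<?php
// Hypothetical helper: parse the first line with each candidate in priority
// order and return the first candidate that produces a second column.
// If none does, return $default.
function detect_delimiter_from_first_line(string $firstLine, array $candidates = ["|", ";", ","], string $default = ","): string
{
    foreach ($candidates as $candidate) {
        $fields = str_getcsv($firstLine, $candidate, '"', "\\");
        if (isset($fields[1])) {
            return $candidate;
        }
    }
    return $default;
}

echo detect_delimiter_from_first_line("id|name|price"), PHP_EOL;
```

The priority order matters: a comma-separated line that happens to contain a semicolon inside a field would be detected as semicolon-separated, because ";" is tried before ",".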

I am having the same issue. My system receives CSV files from clients, which may use ";", "," or " " as the delimiter, and I want to improve the system so that clients don't have to know which one it is (they never do).
I searched and found this library:
https://github.com/parsecsv/parsecsv-for-php
Very good and easy to use.

Related

Replace values of a csv file

I need to replace all the values in the rows of a CSV file using PHP.
I am trying the code below, but it replaces even the headers with 0, and the row values are not doubled as they are supposed to be.
public function checkForNumericValues()
{
// Read the columns and detect numeric values
if (($this->handle = fopen($this->csvFile, "r")) !== FALSE)
{
$fhandle = fopen($this->csvFile,"r");
$content = fread($fhandle,filesize($this->csvFile));
while (($this->data = fgetcsv($this->handle, 1000, ",")) !== FALSE)
{
$this->num = count($this->data);
// Skipping the header
if($this->row == 1)
{
$this->row++;
continue;
}
$this->row++;
// Check and replace the numeric values
for ($j=0; $j < $this->num; $j++)
{
if(is_numeric($this->data[$j]))
{
$content = str_replace($this->data[$j], $this->data[$j] * 2, $content);
}
else
{
$content = str_replace($this->data[$j], 0, $content);
}
}
break;
// print_r($content);
}
$fhandle = fopen($this->csvFile,"w");
fwrite($fhandle,$content);
fclose($this->handle);
}
echo "Numeric and String values been changed in rows of the CSV!";
}
CSV is like this:
You shouldn't update the entire $content string while you process each field of the CSV; just update that field. Your str_replace() replaces substrings elsewhere in the file: for instance, if the current field contains 5, you'll replace every 5 in the file with 10, so 125 will become 1210.
You can do it correctly by replacing the element in the $this->data array. After you do that, you can then join them back into a string with implode(). Then you can keep all the updated lines in a string, which you write back to the file at the end.
You can skip the header line by calling fgets() before the while loop.
public function checkForNumericValues()
{
// Read the columns and detect numeric values
if (($this->handle = fopen($this->csvFile, "r")) !== FALSE)
{
$output = "";
$output .= fgets($this->handle); // Copy header line to output (fgets() needs the file handle, not the file name)
while (($this->data = fgetcsv($this->handle, 1000, ",")) !== FALSE)
{
$this->num = count($this->data);
// Check and replace the numeric values
for ($j=0; $j < $this->num; $j++)
{
if(is_numeric($this->data[$j]))
{
$this->data[$j] *= 2;
}
else
{
$this->data[$j] = 0;
}
}
$output .= implode(',', $this->data) . "\n";
}
fclose($this->handle);
$fhandle = fopen($this->csvFile,"w");
fwrite($fhandle,$output);
fclose($fhandle);
}
echo "Numeric and String values been changed in rows of the CSV!";
}
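The per-field idea can be checked on its own with an in-memory array of lines. In this sketch the function name and sample data are hypothetical; it keeps the header line, doubles numeric fields and sets all other fields to 0, so the str_replace() substring problem cannot occur.

```php
<?php
// Sketch: transform each data line field by field. Numeric fields are doubled,
// all other fields become 0, and the first line (the header) is kept as-is.
function transform_rows(array $lines): array
{
    $output = [];
    foreach ($lines as $index => $line) {
        if ($index === 0) {
            $output[] = $line; // keep the header unchanged
            continue;
        }
        $fields = str_getcsv($line, ",", '"', "\\");
        foreach ($fields as $j => $field) {
            $fields[$j] = is_numeric($field) ? $field * 2 : 0;
        }
        $output[] = implode(",", $fields);
    }
    return $output;
}

print_r(transform_rows(["name,qty,total", "apple,5,125"]));
```

Like the answer above, implode() does not re-quote fields; if a field can contain a comma, writing each row with fputcsv() would be the safer choice.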

How to print an exact element (given column and row) of a csv file in PHP

I'm not sure how to print an exact element (like column 5, row 3) of a csv file in PHP. I have a CSV file with 3 columns: ID, cost, location. I need to search for the ID, which I can do and I can even return what row number it is. But then how can I print off that row's column 3? The code below prints the line number where $interior can be found.
$lines = file('database.csv');
$line_number = false;
while (list($key, $line) = each($lines) and !$line_number) {
$line_number = (strpos($line, $interior) !== FALSE);
}
if($line_number){
$search = $interior;
$line_number = false;
if ($handle = fopen("database.csv", "r")) {
$count = 0;
while (($line = fgets($handle, 4096)) !== FALSE and !$line_number) {
$count++;
$line_number = (strpos($line, $search) !== FALSE) ? $count : $line_number;
}
fclose($handle);
}
echo $line_number;
If
$lines = file('database.csv');
gave you the lines in an array, then:
$line = explode(",", $lines[2]);
will give you an array of the elements of row 3 (array indexes start at 0, so index 2 is the third line).
So:
echo $line[4];
will print the fifth column of the third row of database.csv.
Since this question lacks a complete (non-breaking) answer:
You can simply use str_getcsv on each line of your csv and store the results in an array:
$lines = file('database.csv');
$data = array();
foreach($lines as $line)
{
// if your CSV uses a different delimiter or you enclose your fields with a different character than " alter the following line according to the php docs of str_getcsv
$data[] = str_getcsv($line);
}
// get row 3, column 5:
echo $data[2][4];
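For the original goal of finding a row by its ID and printing another column of that row, a small lookup function is enough. This is a sketch: the function name and sample data are hypothetical, the column layout (ID, cost, location) follows the question, and in practice $lines would come from file('database.csv').

```php
<?php
// Sketch: return the value in column $column (0-based) of the first row whose
// first column equals $id, or null if no such row or column exists.
function find_cell(array $lines, string $id, int $column): ?string
{
    foreach ($lines as $line) {
        // file() keeps line endings, so strip them before parsing.
        $fields = str_getcsv(rtrim($line, "\r\n"), ",", '"', "\\");
        if ($fields[0] === $id) {
            return $fields[$column] ?? null;
        }
    }
    return null;
}

$lines = ["1001,19.99,Denver", "1002,5.50,Austin"];
echo find_cell($lines, "1002", 2), PHP_EOL; // location of ID 1002
```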
You can find the positions of all commas (since it is a CSV file), then use the return value of strpos($line, $search) to decide which column the match is in. But if any of your columns contains a comma, this logic will fail.
In that case, search for the positions of the "," separators between quoted fields (assuming your columns are enclosed in double quotes).
You can put it inside your while loop:
$searchStr = "CA" ;
$mystr = "5,50.00,CA";
$myArr = explode(",",$mystr);
foreach($myArr as $k=>$v)
{
if($v == $searchStr)
echo "Column :". $k;
}

PHP - Find a string in file then show it's line number

I have an application which needs to open a file, find a string in it, and print the line number where the string is found.
For example, the file example.txt contains a few hashes:
APLF2J51 1a79a4d60de6718e8e5b326e338ae533 EEQJE2YX
66b375b08fc869632935c9e6a9c7f8da O87IGF8R
c458fb5edb84c54f4dc42804622aa0c5 APLF2J51 B7TSW1ZE
1e9eea56686511e9052e6578b56ae018 EEQJE2YX
affb23b07576b88d1e9fea50719fb3b7
So, I want to PHP search for "1e9eea56686511e9052e6578b56ae018" and print out its line number, in this case 4.
Please note that the same hash will not appear more than once in the file.
I found a few codes over Internet, but none seem to work.
I tried this one:
<?PHP
$string = "1e9eea56686511e9052e6578b56ae018";
$data = file_get_contents("example.txt");
$data = explode("\n", $data);
for ($line = 0; $line < count($data); $line++) {
if (strpos($data[$line], $string) >= 0) {
die("String $string found at line number: $line");
}
}
?>
It just says that the string is found at line 0, which is not correct.
The final application is much more complex than that.
After it finds the line number, it should replace the string with something else, save the changes to the file, and then continue processing.
Thanks in advance :)
An ultra-basic solution could be:
$search = "1e9eea56686511e9052e6578b56ae018";
$lines = file('example.txt');
$line_number = false;
foreach ($lines as $key => $line) {
    if (strpos($line, $search) !== FALSE) {
        $line_number = $key + 1; // array keys start at 0, line numbers at 1
        break;
    }
}
echo $line_number;
A memory-saver version, for larger files:
$search = "1e9eea56686511e9052e6578b56ae018";
$line_number = false;
if ($handle = fopen("example.txt", "r")) {
$count = 0;
while (($line = fgets($handle, 4096)) !== FALSE and !$line_number) {
$count++;
$line_number = (strpos($line, $search) !== FALSE) ? $count : $line_number;
}
fclose($handle);
}
echo $line_number;
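The loop in the question always reports line 0 because strpos() returns false when the string is not found, and in PHP the comparison false >= 0 evaluates to true, so the very first line "matches". A minimal corrected sketch of that loop, with a hypothetical function name, compares against false strictly and converts the 0-based index into a 1-based line number:

```php
<?php
// Return the 1-based line number of the first line containing $needle,
// or false if it does not occur. $text is the whole file content.
function find_line_number(string $text, string $needle)
{
    $lines = explode("\n", $text);
    foreach ($lines as $index => $line) {
        // strpos() can return 0 (match at the start of the line),
        // so compare against false strictly.
        if (strpos($line, $needle) !== false) {
            return $index + 1;
        }
    }
    return false;
}

$data = "APLF2J51 1a79a4d60de6718e8e5b326e338ae533 EEQJE2YX\n"
    . "66b375b08fc869632935c9e6a9c7f8da O87IGF8R\n"
    . "c458fb5edb84c54f4dc42804622aa0c5 APLF2J51 B7TSW1ZE\n"
    . "1e9eea56686511e9052e6578b56ae018 EEQJE2YX\n"
    . "affb23b07576b88d1e9fea50719fb3b7\n";
echo find_line_number($data, "1e9eea56686511e9052e6578b56ae018"), PHP_EOL;
```

With a real file, pass file_get_contents("example.txt") as $text.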
function get_line_from_hashes($file, $find){
$file_content = file_get_contents($file);
$lines = explode("\n", $file_content);
foreach($lines as $num => $line){
$pos = strpos($line, $find);
if($pos !== false)
return $num + 1;
}
return false;
}
get_line_from_hashes("arquivo.txt", "asdsadas2e3xe3ceQ#E"); //return some number or false case not found.
If you need a fast, general solution that also works for finding the line number of multi-line text in a file, use this:
$file_content = file_get_contents('example.txt');
$content_before_string = strstr($file_content, $string, true);
if (false !== $content_before_string) {
$line = count(explode(PHP_EOL, $content_before_string));
die("String $string found at line number: $line");
}
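FYI: this works only with PHP 5.3.0+, because the third argument of strstr() was added in 5.3.0.
$content = file_get_contents('example.txt');
$pattern = '/1e9eea56686511e9052e6578b56ae018/';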
if (preg_match($pattern, $content, $matches, PREG_OFFSET_CAPTURE)) {
//PREG_OFFSET_CAPTURE will add offset of the found string to the array of matches
//now get a substring of the offset length and explode it by \n
$lineNumber = count(explode("\n", substr($content, 0, $matches[0][1])));
}
If the file is not extremely large, read it into an array with file(), search for the string with preg_grep(), take the index key of the matching line and add 1, since the array starts at 0:
$string = "1e9eea56686511e9052e6578b56ae018";
echo key(preg_grep("/$string/", file("example.txt"))) + 1;
I found this to work well: simply explode the file contents into lines and search through the array for your search term, like so:
function getLineNum($haystack, $needle){
# Our Count
$c = 1;
# Turn our file contents/haystack into an array
$hsarr = explode("\n", $haystack);
# Iterate through each value in the array as $str
foreach($hsarr as $str){
# If the current line contains our needle/hash we are looking for it
# returns the current count.
if(strstr($str, $needle)) return $c;
# If not, Keep adding one for every new line.
$c++;
}
# If nothing is found
if($c >= count($hsarr)) return 'No hash found!';
}
EDIT: Looking through the other answers, I realize that Guilherme Soares had a similar approach but used strpos, which in this case doesn't work. So I made a few alterations with his idea in mind here:
function getLineNum($haystack, $needle){
$hsarr = explode(PHP_EOL, $haystack);
foreach($hsarr as $num => $str) if(strstr($str, $needle)) return $num + 1;
return 'No hash found!';
}
Live Demo: https://ideone.com/J4ftV3

str_getcsv() alternative for older PHP version, gives me an empty array at the end

My hosting provider doesn't have a version of PHP that supports str_getcsv(), so I looked around and found this function. It does the trick, except that it gives me an extra empty element at the end, which is messing up my code. For example, "a,b,c" returns Array ( [0] => a [1] => b [2] => c [3] => ). Here's the function:
function _pick_csv_element($x) {
return strlen($x[1]) ? $x[1] : $x[2];
}
function str_getcsv($input) {
preg_match_all(
'/\G (?: \s*"([^"]*)"\s* | ([^,]*) ) (?:,|$) /x',
$input, $matches,
PREG_SET_ORDER
);
return array_map('_pick_csv_element', $matches);
}
Probably the most reliable workaround is this:
$fh = fopen('php://temp', 'r+');
fwrite($fh, $string);
rewind($fh);
$row = fgetcsv($fh);
fclose($fh);
This way you keep using the built-in CSV parser; you just make it read from an in-memory stream. This has a slight performance cost, though, since the string needs to be copied.
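The php://temp trick can be wrapped in a function so it is used like str_getcsv(). This sketch uses a hypothetical name and avoids scalar type declarations and short array syntax so that it also runs on old PHP 5.x releases.

```php
<?php
// Parse one CSV line by writing it to an in-memory stream and reading it back
// with fgetcsv(). The function name is hypothetical.
function csv_line_to_array($string, $delimiter = ",", $enclosure = '"')
{
    $fh = fopen('php://temp', 'r+');
    fwrite($fh, $string);
    rewind($fh);
    $row = fgetcsv($fh, 0, $delimiter, $enclosure);
    fclose($fh);
    // fgetcsv() returns false for an empty stream; normalize that to an empty array.
    return $row === false ? array() : $row;
}

print_r(csv_line_to_array('a,b,"c,d"'));
```

Unlike the regex version in the question, this does not produce a trailing empty element for "a,b,c".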
Dave,
I guess you're using PHP 5.2.x. If you need to read an entire CSV file into an array of associative arrays (one per row, keyed by the header), the function below works well:
function parse_csv_file($csvfile) {
$csv = Array();
$rowcount = 0;
if (($handle = fopen($csvfile, "r")) !== FALSE) {
$max_line_length = defined('MAX_LINE_LENGTH') ? MAX_LINE_LENGTH : 10000;
$header = fgetcsv($handle, $max_line_length);
$header_colcount = count($header);
while (($row = fgetcsv($handle, $max_line_length)) !== FALSE) {
$row_colcount = count($row);
if ($row_colcount == $header_colcount) {
$entry = array_combine($header, $row);
$csv[] = $entry;
}
else {
error_log("csvreader: Invalid number of columns at line " . ($rowcount + 2) . " (row " . ($rowcount + 1) . "). Expected=$header_colcount Got=$row_colcount");
return null;
}
$rowcount++;
}
//echo "Totally $rowcount rows found\n";
fclose($handle);
}
else {
error_log("csvreader: Could not read CSV \"$csvfile\"");
return null;
}
return $csv;
}
Look here for an answer; it may help you a lot:
alternate for str_getcsv

PHP loop over csv - start and end loop at Regex pattern

I need to print a CSV file as HTML or put its numeric data into a database.
But I need to start the loop at a specific position and break it at another specific position (matched by a regex).
So I need to print only the rows with numerical data, with all of their columns.
The following is pseudo-code and does not work properly:
<?php
$row = 1;
$handle = fopen("test.csv", "r");
while ($data = fgetcsv($handle, 1000, ","))
{
if (preg_match('/[Morning]/', $data[0]) === 1 // start at this row plus two lines down )
{
$num = count($data);
$row++;
for ($c=0; $c < $num; $c++)
{
for ($c=0; $c < $num; $c++)
{
echo $data[$c] . " ";
}
if (preg_match('/[Total Cash:]/', $data[0]) === 1)
{ break; row -1 }
}
echo "<br>";
}
}
fclose($handle); ?>
The CSV looks like this:
/--some lines--/
Date: 3/3/11,
Morning,
--blank line---
Customer No,Time,CheckNo,Total,
1234,12-45,01,20.00,
1236,1-00,03,30.00,
1240,2-00,06,30.00,
--more numerical rows of data at variable length that I need to loop over--
1500,4-00,07,22.00,
----,----,---,----,
Total Cash, , , ,120.00,
/--some other lines--and it goes/
Lunch Time,
---similar like Morning above ---
Any info on how to properly address this issue is appreciated. I can handle many loops and regexes, but with this one I need some more time and help. Thanks.
$lines = file('test.csv'); //read file into an array, one entry per line
$active = false; //keep track of what rows to parse
//loop one line at a time
for ($i = 0; $i < count($lines); $i++) {
$line = $lines[$i];
if (strpos($line, 'Morning') !== false) { //start parsing on the next row
$active = true;
$i += 2; //skip the blank line and header
continue;
}
if (strpos($line, '----,') !== false) { //stop parsing rows
$active = false;
}
if ($active) { //if parsing enabled, split the line on commas and do something with the values
$values = str_getcsv(trim($line));
foreach ($values as $value) {
echo $value . " "; //these are the numbers
}
}
}
$lines = file('test.csv');
$parsing = false;
foreach ($lines as $line)
{
    $parsing = ((strpos($line, 'Morning') !== false) || $parsing)
        && (strpos($line, 'Total Cash') === false);
    if (!$parsing)
        continue;
    $values = str_getcsv(trim($line));
    echo implode(' ', $values) . PHP_EOL;
}
Edit: Basically, it does the same as Dan Grossman's solution, but shorter ;-)
$lines = file('test.csv');
// Skip the unwanted lines
// Means: Every line until the line containing "Morning,"
do {
    $line = array_shift($lines);
} while ($lines && trim($line) !== 'Morning,'); // stop if the file runs out
// The data starts two lines below "Morning,": skip the blank line and the header
$lines = array_slice($lines, 2);
// Do something with the remaining lines, until
// a line _begins_ with "Total Cash" (or the file runs out)
while ($lines && strncmp('Total Cash', trim($line = array_shift($lines)), 10) !== 0) {
    echo implode(' ', str_getcsv(trim($line))) . PHP_EOL;
}
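A variant that returns the rows instead of echoing them, and keeps only rows whose first field is numeric, may be easier to feed into HTML or a database. This sketch assumes the layout shown in the question: a section starts at a line that reads exactly "<name>," and ends at the first "Total Cash" line. The function name and sample data are hypothetical.

```php
<?php
// Sketch: collect the numeric data rows of one section (e.g. "Morning").
function section_rows(array $lines, string $section): array
{
    $rows = [];
    $active = false;
    foreach ($lines as $line) {
        $line = trim($line);
        if (!$active) {
            $active = ($line === $section . ",");
            continue;
        }
        if (strncmp($line, "Total Cash", 10) === 0) {
            break; // end of the section
        }
        // Drop the trailing comma so there is no empty last field.
        $fields = str_getcsv(rtrim($line, ","), ",", '"', "\\");
        // Keep only data rows: the first field must be all digits.
        if (preg_match('/^[0-9]+$/', (string) $fields[0]) === 1) {
            $rows[] = $fields;
        }
    }
    return $rows;
}

$report = [
    "Date: 3/3/11,",
    "Morning,",
    "",
    "Customer No,Time,CheckNo,Total,",
    "1234,12-45,01,20.00,",
    "1236,1-00,03,30.00,",
    "----,----,---,----,",
    "Total Cash, , , ,120.00,",
    "Lunch Time,",
    "1300,1-30,08,15.00,",
];
print_r(section_rows($report, "Morning"));
```

The blank line, the header and the "----" separator are skipped by the numeric check rather than by counting lines, so the sketch does not depend on exactly two lines following "Morning,".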
