I've been using all sorts of hacks to generate file indexes from SMB shares, and it works fine for basic file path plus metadata indexing.
The next step I want to implement is an algorithm, combining some Unix-like utilities and PHP, that indexes specific context from within files.
The first step in this context generation is something like this:
while read p; do egrep -rH '^;|\(|^\(|\)$' "$p"; done <textual.txt > text_context_search.txt
This regex is specific to my purpose of indexing the contents of programs: it extracts lines that are whole comments, or that contain comments, from CNC program files.
The resulting output is something like:
file_path:regex_hit
Obviously most programs have more than one comment, so the file path is repeated on every hit, and an exhaustive context index is about a gigabyte in size.
I am now working towards a script that would compact the redundancy in a pattern like this:
file_path_1:regex_hit_1
file_path_1:regex_hit_2
file_path_1:regex_hit_3
...
would become:
file_path_1:regex_hit_1,regex_hit_2,regex_hit_3
If I manage to do this efficiently, all is well.
The question is whether I'm going about this the right way. Maybe I should be using different tools to generate such a context index in the first place?
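A minimal sketch of the compaction step described above, in Python rather than the shell/PHP mix of the question. It assumes the grep output is already grouped by file (which is how grep emits it), that file paths contain no colon, and that a comma is an acceptable separator even though a hit may itself contain commas. The name `compact_hits` is hypothetical.

```python
import io
from itertools import groupby


def compact_hits(lines, sep=","):
    """Yield one 'path:hit1,hit2,...' line per run of consecutive lines sharing a path."""
    def split(line):
        # Split on the first colon only; the hit itself may contain colons.
        path, _, hit = line.rstrip("\n").partition(":")
        return path, hit

    # Lines without any colon cannot be 'path:hit' pairs and are skipped.
    pairs = (split(line) for line in lines if ":" in line)
    for path, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{path}:{sep.join(hit for _, hit in group)}"


if __name__ == "__main__":
    sample = io.StringIO(
        "a/prog1.nc:(FACE MILL)\n"
        "a/prog1.nc:(DRILL 8MM)\n"
        "b/prog2.nc:; setup note\n"
    )
    for line in compact_hits(sample):
        print(line)
```

Because it only ever holds one group of hits in memory, this approach should cope with an index of about a gigabyte, as long as the input is read from a file handle instead of being loaded whole.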
EDIT
After further copying and pasting from Stack Overflow and thinking about it, I glued together a solution, using code that is not mine, that almost entirely solves the issue described above.
<?php
// https://stackoverflow.com/questions/26238299/merging-csv-lines-where-column-value-is-the-same
$rows = array_map('str_getcsv', file('text_context_search2.1.txt'));
//echo '<pre>';
print_r($rows);
//echo '</pre>';
// Array for output
$concatenated = array();
// Key to organize over
$sortKey = '0';
// Key to concatenate
$concatenateKey = '1';
// Separator string
$separator = ' ';
foreach($rows as $row) {
// Guard against invalid rows
if (!isset($row[$sortKey]) || !isset($row[$concatenateKey])) {
continue;
}
// Current identifier
$identifier = $row[$sortKey];
if (!isset($concatenated[$identifier])) {
// If no matching row has been found yet, create a new item in the
// concatenated output array
$concatenated[$identifier] = $row;
} else {
// An array has already been set, append the concatenate value
$concatenated[$identifier][$concatenateKey] .= $separator . $row[$concatenateKey];
}
}
// Do something useful with the output
//var_dump($concatenated);
//echo json_encode($concatenated)."\n";
$fp = fopen('exemplar.csv', 'w');
foreach ($concatenated as $fields) {
fputcsv($fp, $fields);
}
fclose($fp);
My text file is sample.txt. I want to skip the first row of the text file and store the other rows in a MySQL database.
ID Name EMail
1 Siva xyz@gmail.com
2 vinoth xxx@gmail.com
3 ashwin yyy@gmail.com
Now I want to read this data from the text file, except the first row (ID, Name, EMail), and store it in the MySQL database, because I have already created fields with the same names in the database.
I have tried
$handle = @fopen($filename, "r"); //read line one by one
while (!feof($handle)) // Loop till end of file.
{
$buffer = fgets($handle, 4096); // Read a line.
}
print_r($buffer); // It shows all the text.
Please let me know how to do this?
Thanks.
Regards,
Siva R
It's easier if you use file() since it will get all rows in an array instead:
// Get all rows in an array (and tell file() not to include the trailing newlines)
$rows = file($filename, FILE_IGNORE_NEW_LINES);
// Remove the first element (first row) from the array
array_shift($rows);
// Now do what you want with the rest
foreach ($rows as $lineNumber => $row) {
// do something cool with the row data
}
If you want to get it all as a string again, without the first row, just implode it with a new line as glue:
// The rows no longer contain line breaks (FILE_IGNORE_NEW_LINES), so glue them with "\n"
$content = implode("\n", $rows);
Note: As @Don'tPanic pointed out in his comment, using file() is simple and easy but not advisable if the original file is large, since it will read the whole thing into memory as an array (and arrays take more memory than strings). He also correctly recommended the FILE_IGNORE_NEW_LINES flag, just so you know :-)
You can just call fgets once before your while loop to get the header row out of the way.
$firstline = fgets($handle, 4096);
while (!feof($handle)) // Loop till end of file.
{ ...
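The same "read the header once, then loop" idea, sketched in Python rather than PHP for illustration. It assumes the fields are separated by whitespace and that no name contains a space; the function name `read_rows` is hypothetical, and the database insert itself is left out.

```python
import io


def read_rows(handle):
    """Yield (id, name, email) tuples, skipping the header line."""
    next(handle, None)  # consume the header row; no error on an empty file
    for line in handle:
        fields = line.split()
        if len(fields) == 3:  # ignore blank or malformed lines
            yield tuple(fields)


if __name__ == "__main__":
    sample = io.StringIO(
        "ID Name EMail\n"
        "1 Siva xyz@gmail.com\n"
        "2 vinoth xxx@gmail.com\n"
    )
    for row in read_rows(sample):
        print(row)
```

Since the file is read line by line, memory use stays flat regardless of file size, which is the advantage of this approach over loading everything with file().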
I have a huge issue: I can't find any way to sort array entries. My code:
<?php
error_reporting(0);
$lines=array();
$fp=fopen('file.txt', 'r');
$i=0;
while (!feof($fp))
{
$line=fgets($fp);
$line=trim($line);
$lines[]=$line;
$oneline = explode("|", $line);
if($i>30){
$fz=fopen('users.txt', 'r');
while (!feof($fz))
{
$linez=fgets($fz);
$linez=trim($linez);
$lineza[]=$linez;
$onematch = explode(",", $linez);
if (strpos($oneline[1], $onematch[1])){
echo $onematch[0],$oneline[4],'<br>';
}
else{
}
rewind($onematch);
}
}
$i++;
}
fclose($fp);
?>
The thing is, I want to sort the items that are being echoed by $oneline[4]. I tried the suggestions from several other Stack Overflow posts, but was not able to find a solution.
The answer to your question is that in order to sort $oneline[4], which seems to contain a string value, you need to apply the following steps:
split the string into an array ($oneline[4] = explode(',', $oneline[4]))
sort the resulting array (sort($oneline[4]))
combine the array into a string ($oneline[4] = implode(',', $oneline[4]))
As I got the impression variable naming is low on the list of priorities I'm re-using the $oneline[4] variable. Mostly to clarify which part of the code I am referring to.
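The three steps (split, sort, join) can be sketched in Python as a single helper. This is only an illustration of the technique, not the PHP from the answer; the name `sort_csv_field` is hypothetical, and it assumes the field holds a comma-separated string such as 'b,d,c,a'.

```python
def sort_csv_field(value, sep=","):
    """Split `value` on `sep`, sort the pieces, and join them back together."""
    return sep.join(sorted(value.split(sep)))


if __name__ == "__main__":
    print(sort_csv_field("b,d,c,a"))  # prints a,b,c,d
```

Note that the sort is a plain string sort, so numeric items would be ordered as text ("10" before "9") unless a numeric key is supplied.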
That being said, there are other improvements you should be making, if you want to be on speaking terms with your future self (in case you need to work on this code in a couple of months)
Choose a single coding style and stick to it, the original code looked like it was copy/pasted from at least 4 different sources (mostly inconsistent quote-marks and curly braces)
Try to limit repeating costly operations, such as opening files, whenever you can (to be fair, agents.data could contain only 31 lines, in which case users.txt would be opened only once, leaving me looking like a fool)
I have updated your code sample to try to show what I mean by the points above.
<?php
error_reporting(0);
$lines = array();
$users = false;
$fp = fopen('http://20.19.202.221/exports/agents.data', 'r');
while ($fp && !feof($fp)) {
$line = trim(fgets($fp));
$lines[] = $line;
$oneline = explode('|', $line);
// if we have $users (starts as false, is turned into an array
// inside this if-block) or if we have collected 30 or more
// lines (this condition is only checked while $users = false)
if ($users || count($lines) > 30) {
// your code sample implies the users.txt to be small enough
// to process several times consider using some form of
// caching like this
if (!$users) {
// always initialize what you intend to use
$users = [];
$fz = fopen('users.txt', 'r');
while ($fz && !feof($fz)) {
$users[] = explode(',', trim(fgets($fz)));
}
// always close whatever you open.
fclose($fz);
}
// walk through $users, which contains the exploded contents
// of each line in users.txt
foreach ($users as $onematch) {
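// strpos() returns 0 for a match at the start of the string, so
// compare against false explicitly
if (strpos($oneline[1], $onematch[1]) !== false) {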
// now, the actual question: how to sort $oneline[4]
// as the requested example was not available at the
// time of writing, I assume
// it to be a string like: 'b,d,c,a'
// first, explode it into an array
$oneline[4] = explode(',', $oneline[4]);
// now sort it using the sort function of your liking
sort($oneline[4]);
// and implode the sorted array back into a string
$oneline[4] = implode(',', $oneline[4]);
echo $onematch[0], $oneline[4], '<br>';
}
}
}
}
fclose($fp);
I hope this doesn't offend you too much, just trying to help and not just providing the solution to the question at hand.
I am building a small application that does some simple reporting based on CSV files, the CSV files are in the following format:
DATE+TIME,CLIENTNAME1,HAS REQUEST BLABLA1,UNIQUE ID
DATE+TIME,CLIENTNAME2,HAS REQUEST BLABLA2,UNIQUE ID
DATE+TIME,CLIENTNAME1,HAS REQUEST BLABLA1,UNIQUE ID
DATE+TIME,CLIENTNAME2,HAS REQUEST BLABLA2,UNIQUE ID
Now I am processing this using the following function:
function GetClientNames(){
$file = "backend/AllAlarms.csv";
$lines = file($file);
arsort($lines);
foreach ($lines as $line_num => $line) {
$line_as_array = explode(",", $line);
echo '<li><i class="icon-pencil"></i>' . $line_as_array[1] . '</li>';
}
}
I am trying to retrieve only the Clientname values, but I only want the unique values.
I have tried to create several different manners of approaching this, I understand I need to use the unique_array function, but I have no clue on exactly how to use this function.
I've tried this:
function GetClientNames(){
$file = "backend/AllAlarms.csv";
$lines = file($file);
arsort($lines);
foreach ($lines as $line_num => $line) {
$line_as_array = explode(",", $line);
$line_as_array[1] = unique_array($line_as_array[1]);
echo '<li><i class="icon-pencil"></i>' . $line_as_array[1] . '</li>';
}
}
But this gives me a very very dirty result with 100's of spaces instead of the correct data.
I would recommend using the fgetcsv() function when reading CSV files. In the wild, CSV files can be too complicated to handle with a naive explode() approach:
// this array will hold the results
$unique_ids = array();
// open the csv file for reading
$fd = fopen('t.csv', 'r');
// read the rows of the csv file, every row returned as an array
while ($row = fgetcsv($fd)) {
// change the 3 to the column you want
// use the array keys to make the final values unique, since PHP
// arrays can't contain duplicate keys
$unique_ids[$row[3]] = true;
}
var_dump(array_keys($unique_ids));
You can also collect values and use array_unique() on them later. You probably want to split the "reading in" and the "writing out" part of your code too.
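For comparison, here is the same "keys make values unique" idea as a minimal Python sketch, with the reading separated from the output as suggested. It assumes the client name is in the second column (index 1), as in the question's sample; `unique_column` is a hypothetical name.

```python
import csv
import io


def unique_column(handle, column=1):
    """Return the distinct values of one CSV column, in first-seen order."""
    seen = {}  # dict keys are unique and keep insertion order
    for row in csv.reader(handle):
        if len(row) > column:  # skip short or empty rows
            seen[row[column]] = True
    return list(seen)


if __name__ == "__main__":
    sample = io.StringIO(
        "DATE+TIME,CLIENTNAME1,HAS REQUEST BLABLA1,UNIQUE ID\n"
        "DATE+TIME,CLIENTNAME2,HAS REQUEST BLABLA2,UNIQUE ID\n"
        "DATE+TIME,CLIENTNAME1,HAS REQUEST BLABLA1,UNIQUE ID\n"
    )
    for name in unique_column(sample):
        print(name)
```

Collecting the names first and printing them afterwards keeps the parsing logic reusable, for example if the list later needs to be sorted or counted.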
Try using array_unique()
Docs:
http://php.net/manual/en/function.array-unique.php
I have a set of files I am trying to import into MySQL.
Each CSV file looks like this:
Header1;Header2;Header3;Header4;Header5
Data1;Data2;Data3;Data4;Data5;
Data1;Data2;Data3;Data4;Data5;
Data1;Data2;Data3;Data4;Data5;
Data1;Data2;Data3;Data4;Data5;
Data may contain spaces, periods or a full colon. They absolutely will not contain a semi-colon so that is a valid delimiter. They also will not contain \n or any other newline characters.
Example Data
2010.08.30 18:34:59
0.7508
String of characters with spaces in them
Each file has a unique name to it. The names all conform to the following pattern:
Token1_Token2_Token3.csv
I am interested in combining a lot of these CSV files (on the order of several hundred) into one CSV file. Files can range from 10KB to 400MB. Ultimately, I want to send it over to MySQL. Don't worry about getting rid of the individual header rows; I can do that in MySQL easily.
I would like the final CSV file to look like this:
Header1,Header2,Header3,Header4,Header5,FileName
Data1,Data2,Data3,Data4,Data5,Token1
Data1,Data2,Data3,Data4,Data5,Token1
Data1,Data2,Data3,Data4,Data5,Token1
Data1,Data2,Data3,Data4,Data5,Token1
Data1,Data2,Data3,Data4,Data5,Token1
I don't care about any of the other tokens. I can also live if the solution just dumps each csv filename into the Token1 field because, again, I can parse that in MySQL easily.
Please help me! I've spent over 10 hours on what should be a relatively easy problem.
Technologies available:
awk
windows batch
linux bash
powershell
perl
python
php
mysql-import
This is a server box so I won't be able to compile anything but if you give me a Java solution I will definitely try to run it on the box.
Using Text::CSV:
Program
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
use Text::CSV;
my $semi_colon_csv = Text::CSV->new( { 'sep_char' => ';', } );
my $comma_csv = Text::CSV->new( {
'sep_char' => ',',
'eol' => "\n",
} );
open my $fh_output, '>', 'output.csv' or die $!;
sub convert {
my $file_name = shift;
open my $fh_input, '<', $file_name or die $!;
# header
my $row = $semi_colon_csv->getline($fh_input);
$comma_csv->print( $fh_output, [ @$row, $file_name ] );
while ( $row = $semi_colon_csv->getline($fh_input) ) {
pop @$row unless $row->[-1]; # remove trailing semi-colon from input
my ($token) = ( $file_name =~ /^([^_]+)/ );
$comma_csv->print( $fh_output, [ @$row, $token ] );
}
}
sub wanted {
return unless -f;
convert($_);
}
my $path = 'csv'; # assuming that all your CSVs are in ./csv/
find( \&wanted, $path );
Output (output.csv)
Header1,Header2,Header3,Header4,Header5,Token1_Token2_Token3.csv
Data1,Data2,Data3,Data4,Data5,Token1
Data1,Data2,Data3,Data4,Data5,Token1
Data1,Data2,Data3,Data4,Data5,Token1
Data1,Data2,Data3,Data4,Data5,Token1
Believe it or not, it may be as simple as:
awk 'BEGIN{OFS = FS = ";"} {print $0, FILENAME}' *.csv > newfile.csv
If you want to change the field separator from semicolons to commas:
awk 'BEGIN{OFS = ","; FS = ";"} {$1 = $1; print $0, FILENAME}' *.csv > newfile.csv
To include only the first token:
awk 'BEGIN{OFS = ","; FS = ";"} {$1 = $1; split(FILENAME, a, "_"); print $0, a[1]}' *.csv > newfile.csv
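Since Python is among the available tools, here is a hedged sketch of the same streaming merge using the standard csv module. It is not a translation of the awk one-liners: unlike them, it drops the trailing empty field caused by the closing semicolon and writes the header only once. The names `trim`, `convert` and `merge` are hypothetical, and it assumes every file has a header row and that the first token is everything before the first underscore.

```python
import csv
import io
import sys
from pathlib import Path


def trim(row):
    """Drop the empty last field produced by a trailing ';'."""
    return row[:-1] if row and row[-1] == "" else row


def convert(src, dst, label, write_header):
    """Copy ';'-separated rows from src to dst as ','-separated rows plus a label column."""
    reader = csv.reader(src, delimiter=";")
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return  # empty file, nothing to copy
    if write_header:
        writer.writerow(trim(header) + ["FileName"])
    for row in reader:
        row = trim(row)
        if row:  # skip blank lines
            writer.writerow(row + [label])


def merge(directory, output_path):
    """Merge every *.csv in `directory` into one file, streaming row by row."""
    out_path = Path(output_path).resolve()
    with open(out_path, "w", newline="") as dst:
        first = True
        for path in sorted(Path(directory).glob("*.csv")):
            if path.resolve() == out_path:
                continue  # never read the file being written
            with open(path, newline="") as src:
                convert(src, dst, path.name.split("_")[0], first)
            first = False


if __name__ == "__main__":
    demo = io.StringIO("Header1;Header2\nData1;Data2;\n")
    convert(demo, sys.stdout, "Token1", write_header=True)
```

A call such as merge("csv", "newfile.csv") would process a directory of input files; rows are written as they are read, so the 400MB files never have to fit in memory.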
You might want to try this quick & dirty Perl hack to convert the data:
#!/usr/bin/perl
use strict;
use warnings;
# Open input file
my $inputfile = shift or die("Usage: $0 <filename>\n\n");
open F, $inputfile or die("Could not open input file ($!)\n\n");
# Split filename into an array
my @tokens = split("_", $inputfile);
my $isFirstline = 1;
# Iterate each line in the file
foreach my $line (<F>) {
my $addition;
chomp($line); # Remove newline
# Add the complete filename to the line at first line
if ($isFirstline) {
$isFirstline = 0;
$addition = ",$inputfile";
} else { # Add first token for the rest of the lines
$addition = ",$tokens[0]";
}
# Split the data into @elements array
my @elements = split(";", $line);
# Join it using comma and add filename/token & a new line
print join(",", @elements) . $addition . "\n";
}
close(F);
Perl's DBI module can cope with CSV files (DBD::CSV module required) and MySQL. Just put all your csv files in the same dir, and query them like this:
use DBI;
my $dbh = DBI->connect ("dbi:CSV:", "", "", { f_dir => "$DATABASEDIR", f_ext => ".csv", csv_sep_char => ";",});
my $sth = $dbh->prepare ("SELECT * FROM Token1_Token2_Token3");
$sth->execute;
while (my $hr = $sth->fetchrow_hashref) {
[...]
}
$sth->finish ();
You can query CSV files (including JOIN statements!) and insert data directly into MySQL.
This is one way to do it in PowerShell:
$res = 'result.csv'
'Header1,Header2,Header3,Header4,Header5,FileName' > $res
foreach ($file in dir *.csv)
{
if ($file -notmatch '(\w+)_\w+_\w+\.csv') { continue }
$csv = Import-Csv $file -Delimiter ';'
$csv | Foreach {"{0},{1},{2},{3},{4},{5}" -f `
$_.Header1,$_.Header2,$_.Header3,$_.Header4,$_.Header5,$matches[1]} >> $res
}
If the size of the files weren't so potentially large I would suggest going this route:
$csvAll = @()
foreach ($file in dir *.csv)
{
if ($file -notmatch '(\w+)_\w+_\w+\.csv') { continue }
$csv = Import-Csv $file -Delimiter ';'
$csv | Add-Member NoteProperty FileName $matches[1]
$csvAll += $csv
}
$csvAll | Export-Csv result.csv -NoTypeInformation
However, this holds the complete contents of all CSV files in memory until it is ready to export at the end. Not feasible unless you have 64-bit Windows with lots of memory. :-)