I am writing code that reads in .csv files and creates associative arrays in PHP. I want to organize the array such that each column (in order) before the last column is a level of an associative array. In the non-generalized case:
$data = array();
$file =fopen("data.csv", "r");
while (($line = fgetcsv($file)) !== FALSE) {
$var1 = $line[0];
$var2 = $line[1];
$value = $line[2];
$data[$var1][$var2] = $value;
}
I want to be able to do this regardless of the number of columns there are (ie, var1... varN). It will be organized such that variables (columns) 1-N uniquely identify each row, and the desired value is always the last columns.
Just put another while inside your while considering how much columns are in that line you took from the file. That should be enough.
I was able to do this by building the line of code and using the eval statement:
$str = '$allData["data"]';
for($x=0; $x<$numVars; $x++) {
$var = $line[$x];
if(($x+1) != $numVars) {
$str .= "['$var']";
}
}
$str .= "=";
$str .= $line[$numVars-1];
$str .= ";";
eval($str);
Related
I have two text files of data the first file has 30 lines of data and matches with 30 lines in the second text file, but in addition the first text file has two additional lines that are added as the operator uploads file to the directory I want to find the non matching lines and out put them to be used in the same script as a mailout.
I am trying to use this code, which outputs the contents of the two files to screen.
<?php
if ($file1 = fopen(".data1.txt", "r")) {
while(!feof($file1)) { $textperline = fgets($file1);
echo $textperline;
echo "<br>";}
if ($file2 = fopen(".data.txt", "r")) {
while(!feof($file2)) {$textperline1 = fgets($file2);
echo $textperline1;
echo "<br>";}
fclose($file1);
fclose($file2);
}}
?>
But it outputs the whole list of data, can anyone help listingout only NON matching lines?
attached output of the two files from my code
I want to output only lines that are in file2 but not in file1
My suggestion would be to read each file into an array (one line = one element) and then use array_diff to compare them. Unless you have millions of lines, this approach is the easiest.
To reuse your code, this is how you can read the 2 files into two arrays
$list1 = [];
$list2 = [];
if ($file1 = fopen(".data1.txt", "r")) {
while (!feof($file1)) {
$list1[] = trim(fgets($file1));
}
fclose($file1);
}
if ($file2 = fopen(".data.txt", "r")) {
while (!feof($file2)) {
$list2[] = trim(fgets($file2));
}
fclose($file2);
}
If the files are small and you can read them in one go, you can also use a simplified syntax.
$list1 = explode(PHP_EOL, file_get_contents(".data1.txt"));
$list2 = explode(PHP_EOL, file_get_contents(".data.txt"));
Then, no matter which method you chose, you can compare them as follows
$comparison = array_diff($list2, $list1);
foreach ($comparison as $line) {
echo $line."<br />";
}
This will only output the lines of the second array that are not present in the first one.
Make sure that the one with the additional lines is the first argument of array_diff
ASSUMPTION
Both files are not huge and you can read the whole content into the memory at once. According to this, you can put following code to the top:
$file1 = "./data1.txt";
$file2 = "./data2.txt";
$linesOfFile1 = file($file1);
$linesOfFile2 = file($file2);
$newLinesInFile2 = [];
There are a couple cases, which you did not mention in your question.
CASE 1
New lines are only appended to the secode file file2. The solution for this case is the easiest one:
$numberOfRowsFile1 = count($linesOfFile1);
$numberOfRowsFile2 = count($linesOfFile1);
if($numberOfRowsFile2 > $numberOfRowsFile1)
{
$newLinesInFile2 = array_slice($linesOfFile2, $numberOfRowsFile1);
}
CASE 2
The lines with the same content may have different position in each file. Duplicate lines within the same file are ignored.
Furthermore the case sensitivity may play a role. That's why the content of each line should be hashed to make a simpler comparison. For both case sensitive and insensitive comparison the following function is needed:
function buildHashedMap($array, &$hashedMap, $caseSensitive = true)
{
foreach($array as $line)
{
$line = !$caseSensitive ? strtolower($line) : $line;
$hash = md5($line);
$hashedMap[$hash] = $line;
}
}
Case sensitive comparison
$hashedLinesFile1 = [];
buildHashedMap($linesOfFile1, $hashedLinesFile1);
$hashedLinesFile2 = [];
buildHashedMap($linesOfFile2, $hashedLinesFile2);
$newLinesInFile2 = array_diff_key($hashedLinesFile2, $hashedLinesFile1);
Case INSENSITIVE comparison
$caseSensitive = false;
$hashedLinesFile1 = [];
buildHashedMap($linesOfFile1, $hashedLinesFile1, $caseSensitive);
$hashedLinesFile2 = [];
buildHashedMap($linesOfFile2, $hashedLinesFile2, $caseSensitive);
$newLinesInFile2 = array_diff_key($hashedLinesFile2, $hashedLinesFile1);
i am fairly new to PHP and tried several hours to get something going, sadly without a result. I hope you can point me into the right direction.
So what i got is a CSV file containing Articles. They are separated into diff columns and always the same structure, for example :
ArtNo, ArtName, ColorCode, Color, Size
When an article has different color codes in the CSV, the article is simply repeated with the same information except for the color code, see an example:
ABC237;Fingal Edition;48U;Nautical Blue;S - 5XL;
ABC237;Fingal Edition;540;Navy;S - 5XL;
My problem is, i want to display all the articles in a table, include an article image etc.. so far i got that working which is not a problem, but instead of showing the article twice for every different color code i want to create only one line per ArtNo (First CSV Line) but still read the second duplicate line to add the article color to the first one, like :
ABC237; Fingal Edition ;540;Nautical Blue, Navy;S - 5XL;
Is this even possible or am I going into a complete wrong direction here? My code looks like this
<?php
$csv = readCSV('filename.csv');
foreach ($csv as $c) {
$artNo = $c[0]; $artName = $c[1]; $colorCode = $c[2]; $color = $c[3]; $sizes = $c[4]; $catalogue = $c[5]; $GEP = $c[6]; $UVP = $c[7]; $flyerPrice = $c[8]; $artDesc = $c[9]; $size1 = $c[10]; $size2 = $c[11]; $size3 = $c[12]; $size4 = $c[13]; $size5 = $c[14]; $size6 = $c[15]; $size7 = $c[16]; $size8 = $c[17]; $picture = $c[0] . "-" . $c[2] . "-d.jpg";
// Echo HTML Stuff
}
?>
Read CSV Function
<?php
function readCSV($csvFile){
$file_handle = fopen($csvFile, 'r');
while (!feof($file_handle) )
{
$line_of_text[] = fgetcsv($file_handle, 0, ";");
}
fclose($file_handle);
return $line_of_text;
}
?>
I tried to get along with array_unique etc but couldn't find a proper solution.
Read all the data into an array, using the article number as the key....
while (!feof($file_handle) ) {
$values = fgetcsv($file_handle, 0, ";");
$artno = array_shift($values);
if (!isset($data[$artno])) $data[$artno]=array();
$data[$artno][]=$values;
}
And then output it:
foreach ($data as $artno=>$v) {
$first=each($v);
print $artno . "; " . each($first);
foreach ($v as $i) {
$discard=array_shift($i);
print implode(";", $i);
}
print "\n";
}
(code not tested, YMMV)
You need to know exactly how many items belong to each ArtNo group. This means a loop to group, and another loop to display.
When grouping, I steal the ArtNo from the row of data and use it as the grouping key. The remaining data in the row will be an indexed subarray of that group/ArtNo.
I am going to show you some printf() and sprintf() syntax to keep things clean. printf() will display the first parameter's content and using any subsequent values to replace the placeholders in the string. In this case, the 2nd parameter is a conditional expression. On the first iteration of the group, ($i = 0), we want to show the ArtNo as the first cell of the row and declare the number of rows that it should span. sprinf() is just like printf() except it produces a value (silently). Upon any subsequent iterations of the group, $i will be greater than zero and therefore an empty string is passed as the value.
Next, I'm going to use implode() which is beautifully flexible when you don't know exactly how many columns your table will have (or if the number of columns may change during the lifetime of your project).
Tested Code:
$csv = <<<CSV
ABC237;Fingal Edition;48U;Nautical Blue;S - 5XL
ABC236;Fingal Edition;540;Navy;S - 5XL
ABC237;Fingal Edition;49U;Sea Foam;L - XL
ABC237;Fingal Edition;540;Navy;S - 5XL
CSV;
$lines = explode(PHP_EOL, $csv);
foreach ($lines as $line) {
$row = str_getcsv($line, ';');
$grouped[array_shift($row)][] = $row;
}
echo '<table>';
foreach ($grouped as $artNo => $group) {
foreach ($group as $i => $values) {
printf(
'<tr>%s<td>%s</td></tr>',
(!$i ? sprintf('<td rowspan="%s">%s</td>', count($group), $artNo) : ''),
implode('</td><td>', $values)
);
}
}
echo '</table>';
Output:
I'm importing a CSV that has 3 columns, one of these columns could have duplicate records.
I have 2 things to check:
1. The field 'NAME' is not null and is a string
2. The field 'ID' is unique
So far, I'm parsing the CSV file, once and checking that 1. (NAME is valid), which if it fails, it simply breaks out of the while loop and stops.
I guess the question is, how I'd check that ID is unique?
I have fields like the following:
NAME, ID,
Bob, 1,
Tom, 2,
James, 1,
Terry, 3,
Joe, 4,
This would output something like `Duplicate ID on line 3'
Thanks
P.S this CSV file has more columns and can have around 100,000 records. I have simplified it for a specific reason to solve the duplicate column/field
Thanks
<?php
$cnt = 0;
$arr=array();
if (($handle = fopen("1.csv", "r")) !== FALSE) {
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
$num=count($data);
$cnt++;
for ($c=0; $c < $num; $c++) {
if(is_numeric($data[$c])){
if (array_key_exists($data[$c], $arr))
$arrdup[] = "duplicate value at ".($cnt-1);
else
$arr[$data[$c]] = $data[$c-1];
}
}
}
fclose($handle);
}
print_r($arrdup);
Give it a try:
$row = 1;
$totalIDs = array();
if (($handle = fopen('/tmp/test1.csv', "r")) !== FALSE)
{
while (($data = fgetcsv($handle)) !== FALSE)
{
$name = '';
if (isset($data[0]) && $data[0] != '')
{
$name = $data[0];
if (is_numeric($data[0]) || !is_string($data[0]))
echo "Name is not a string for row $row\n";
}
else
{
echo "Name not set for row $row\n";
}
$id = '';
if (isset($data[1]))
{
$id = $data[1];
}
else
{
echo "ID not set for row $row\n";
}
if (isset($totalIDs[$id])) {
echo "Duplicate ID on line $row\n";
}
else {
$totalIDs[$id] = 1;
}
$row++;
}
fclose($handle);
}
I went assuming a certain type of design, as stripped out the CSV part, but the idea will remain the same :
<?php
/* Let's make an array of 100,000 rows (Be careful, you might run into memory issues with this, issues you won't have with a CSV read line by line)*/
$arr = [];
for ($i = 0; $i < 100000; $i++)
$arr[] = [rand(0, 1000000), 'Hey'];
/* Now let's have fun */
$ids = [];
foreach ($arr as $line => $couple) {
if ($ids[$couple[0]])
echo "Id " . $couple[0] . " on line " . $line . " already used<br />";
else
$ids[$couple[0]] = true;
}
?>
100, 000 rows aren't that much, this will be enough. (It ran in 3 seconds at my place.)
EDIT: As pointed out, in_array is less efficient than key lookup. I've updated my code consequently.
Are the IDs sorted with possible duplicates in between or are they randomly distributed?
If they are sorted and there are no holes in the list (1,2,3,4 is OK; 1,3,4,7 is NOT OK) then just store the last ID you read and compare it with the current ID. If current is equal or less than last then it's a duplicate.
If the IDs are in random order then you'll have to store them in an array. You have multiple options here. If you have plenty of memory just store the ID as a key in a plain PHP array and check it:
$ids = array();
// ... read and parse CSV
if (isset($ids[$newId])) {
// you have a duplicate
} else {
$ids[$newId] = true; // new value, not a duplicate
}
PHP arrays are hash tables and have a very fast key lookup. Storing IDs as values and searching with in_array() will hurt performance a lot as the array grows.
If you have to save memory and you know the number of lines you going to read from the CSV you could use SplFixedArray instead of a plain PHP array. The duplicate check would be the same as above.
I am trying to parse a csv file into an array. Unfortunately one of the columns contains commas and quotes (Example below). Any suggestions how I can avoid breaking up the column in to multiple columns?
I have tried changing the deliminator in the fgetcsv function but that didn't work so I tried using str_replace to escape all the commas but that broke the script.
Example of CSV format
title,->link,->description,->id
Achillea,->http://www.example.com,->another,short example "Of the product",->346346
Seeds,->http://www.example.com,->"please see description for more info, thanks",->34643
Ageratum,->http://www.example.com,->this is, a brief description, of the product.,->213421
// Open the CSV
if (($handle = fopen($fileUrl, "r")) !==FALSE) {
// Set the parent array key to 0
$key = 0;
// While there is data available loop through unlimited times (0) using separator (,)
while (($data = fgetcsv($handle, 0, ",")) !==FALSE) {
// Count the total keys in each row
$c = count($data);
//Populate the array
for ($x = 0; $x < $c; $x++) {
$arrCSV[$key][$x] = $data[$x];
}
$key++;
} // end while
// Close the CSV file
fclose($handle);
}
Maybe you should think about using PHP's file()-function which reads you CSV-file into an array.
Depending on your delimiter you could use explode() then to split the lines into cells.
here an example:
$csv_file("test_file.csv");
foreach($csv_file as $line){
$cell = explode(",->", $line); // ==> if ",->" is your csv-delimiter!
$title[] = $cell[0];
$link[] = $cell[1];
$description = $cell[2];
$id[] = $cell[3];
}
I've got a list in a text file with the top 1000 words used in the english language. Each line has a list of up to 50 words, like this:
the,stuff,is,thing,hi,bye,hello,a,stuffs
cool,free,awesome,the,pray,is,crime
etc.
I need to write code using that file as input, to make an output file with the a list of pairs of words which appear together in at least fifty different lists. For example, in the above example, THE & IS appear together twice, but every other pair appears only once.
I can't store all possible pairs of words, so no brute force.
I'm trying to learn the language and I'm stuck on this exercise of the book. Please help. Any logic, guidance or code for this would help me.
This is what I have so far. It doesn't do what's intended but I'm stuck:
Code:
//open the file
$handle = fopen("list.txt", 'r');
$count = 0;
$is = 0;
while(!feof($handle)) {
$line = fgets($handle);
$words = explode(',', $line);
echo $count . "<br /><br />";
print_r($words);
foreach ($words as $word) {
if ($word == "is") {
$is++;
}
}
echo "<br /><br />";
$count++;
}
echo "Is count: $is";
//close the file
fclose($handle);
$fp = fopen('output.txt', 'w');
fwrite($fp, "is count: " . $is);
fclose($fp);
This is what I came up with but I think it's too bloated:
plan:
check the first value of the $words array
store the value into $cur_word
store $cur_word as a key in an array ($compare) and
store the counter (line number) as the value of that key
it'll be 1 at this point
see if $cur_word is on each line and if it is then
put the value into $compare with the key as $cur_word
if array has at least 50 values then continue
else go to the next value of the $words array
if it has 50 values then
go to the next value and do the same thing
compare both lists to see how many values match
if it's at least 50 then append
the words to the output file
repeat this process with every word
There are probably 100's of solutions to this problem. Here is one
$contents = file_get_contents("list.txt");
//assuming all words are separated by a , and converting new lines to word separators as well
$all_words = explode(",", str_replace("\n", ",", $contents));
$unique_words = array();
foreach ($all_words as $word) {
$unique_words[$word] = $word;
}
this will give you all the unique words in the file in an array.
You can also use the same technique to count the words
$word_counts = array();
foreach ($all_words as $word) {
if (array_key_exists($word, $word_counts)) {
$word_counts[$word]++;
} else {
$word_counts[$word] = 1;
}
}
then you can loop through and save the results
$fp = fopen("output.txt", "w");
foreach ($word_counts as $word => $count) {
fwrite($fp, $word . " occured " . $count . " times" . PHP_EOL);
}
fclose($fp);