Prevent duplicates in MYSQL database from uploaded CSV - php

I am using the following script to upload records to my MySQL database. The problem is that if a client record is uploaded and it already exists in the database, it ends up duplicated.
I have seen lots of posts on here asking how to remove duplicates from the CSV file itself on upload (e.g. if there are two instances of the name bob and the postcode lh456gl in the CSV, don't upload it), but what I want to know is whether it's possible to check the database for a record first, before adding it, so as not to insert a record that is already there.
So something like:
if exists namecolumn=$name_being_inserted and postcode=$postcode_being_inserted then
do not add that record.
Is this even possible to do?
<?php
//database connect info here
//check for file upload
if (isset($_FILES['csv_file']) && is_uploaded_file($_FILES['csv_file']['tmp_name'])) {
    //upload directory
    $upload_dir = "./csv";
    //create file name
    $file_path = $upload_dir . '/' . basename($_FILES['csv_file']['name']);
    //move uploaded file to upload dir
    if (!move_uploaded_file($_FILES['csv_file']['tmp_name'], $file_path)) {
        //error moving upload file
        die("Error moving file upload");
    }
    //open the csv file for reading
    $handle = fopen($file_path, 'r');
    while (($data = fgetcsv($handle, 1000, ',')) !== FALSE) {
        //access field data in the $data array
        $name = $data[0];
        $postcode = $data[1];
        //use data to insert into db
        $sql = sprintf("INSERT INTO test (name, postcode) VALUES ('%s','%s')",
            mysql_real_escape_string($name),
            mysql_real_escape_string($postcode)
        );
        mysql_query($sql) or (mysql_query("ROLLBACK") and die(mysql_error() . " - $sql"));
    }
    fclose($handle);
    //delete csv file
    unlink($file_path);
}
?>

There are two pure MySQL methods that I can think of that would deal with this issue: REPLACE INTO and INSERT IGNORE.
REPLACE INTO will overwrite the existing row whereas INSERT IGNORE will ignore errors triggered by duplicate keys being entered in the database.
This is described in the manual as:
If you use the IGNORE keyword, errors that occur while executing the
INSERT statement are treated as warnings instead. For example, without
IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY
value in the table causes a duplicate-key error and the statement is
aborted. With IGNORE, the row still is not inserted, but no error is
issued.
For INSERT IGNORE to work you will need to set up a UNIQUE key/index on one or more of the fields. Looking at your code sample, though, you do not have anything that could be considered unique in your insert query. What if there are two John Smiths in Wolverhampton? Ideally you would have something like an email address to define as unique.
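As a rough illustration, assuming you decide that name plus postcode is unique enough for your data and add a UNIQUE index covering both columns (see the next answer for the index itself), the insert inside your loop only needs the IGNORE keyword:

$sql = sprintf("INSERT IGNORE INTO test (name, postcode) VALUES ('%s','%s')",
    mysql_real_escape_string($name),
    mysql_real_escape_string($postcode)
);
// duplicate rows are silently skipped instead of raising a duplicate-key error
mysql_query($sql) or die(mysql_error() . " - $sql");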

Simply create a UNIQUE key over name and postcode; then a row cannot be inserted when a row with both of those values already exists.
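For example, on the test table from the question (the index name is just an illustration):

ALTER TABLE test ADD UNIQUE KEY uq_name_postcode (name, postcode);

With that index in place, a plain INSERT of an existing name/postcode pair fails with a duplicate-key error, and INSERT IGNORE or REPLACE INTO behave as described above.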

I would let the records be inserted into the database and then, after inserting them, just execute:
ALTER IGNORE TABLE dup_table ADD UNIQUE INDEX(a,b);
where a and b are the columns on which you don't want duplicates (the key columns; you can have more of them). You can wrap all of that in a transaction: start the transaction, insert all records (no matter whether they are duplicates), execute the command above, commit, and then you can remove that (a, b) unique index to prepare the table for the next import. Easy.
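A rough SQL sketch of that sequence, applied to the name/postcode table from the original question (note that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7, so this only works on older servers):

-- 1. load everything, duplicates included
INSERT INTO test (name, postcode) VALUES ('bob', 'lh456gl'), ('bob', 'lh456gl');

-- 2. adding the index with IGNORE silently drops the duplicate rows (MySQL <= 5.6)
ALTER IGNORE TABLE test ADD UNIQUE INDEX uq_name_postcode (name, postcode);

-- 3. optionally drop the index again before the next import
ALTER TABLE test DROP INDEX uq_name_postcode;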

Related

Truncate a MySQL table but exclude first column

I'm a little confused as to how I can do this.
I basically want to give my first column NOT NULL AUTO_INCREMENT and give each row its own id. The issue I am having is that the script I am using truncates the whole SQL table and reloads it from a CSV file that is cron'd daily to update the data.
I am currently using this script:
<?php
$databasehost = "localhost";
$databasename = "";
$databasetable = "";
$databaseusername="";
$databasepassword = "";
$fieldseparator = ",";
$lineseparator = "\n";
$enclosedbyquote = '"';
$csvfile = "db-core/feed/csv/csv.csv";
if (!file_exists($csvfile)) {
    die("File not found. Make sure you specified the correct path.");
}
try {
    $pdo = new PDO("mysql:host=$databasehost;dbname=$databasename",
        $databaseusername, $databasepassword,
        array(
            PDO::MYSQL_ATTR_LOCAL_INFILE => true,
            PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION
        )
    );
} catch (PDOException $e) {
    die("database connection failed: " . $e->getMessage());
}
$pdo->exec("TRUNCATE TABLE `$databasetable`");
$affectedRows = $pdo->exec("
    LOAD DATA LOCAL INFILE " . $pdo->quote($csvfile) . " REPLACE INTO TABLE `$databasetable`
    FIELDS OPTIONALLY ENCLOSED BY " . $pdo->quote($enclosedbyquote) . "
    TERMINATED BY " . $pdo->quote($fieldseparator) . "
    LINES TERMINATED BY " . $pdo->quote($lineseparator) . "
    IGNORE 1 LINES");
echo "Loaded a total of $affectedRows records from this csv file.\n";
?>
Is it possible to amend this script so that it ignores my first column and truncates all of the data in the table apart from that first column?
I could then give all of the rows in the first column their own IDs. Any idea how I could do this?
I am still very nooby so please go easy on me :)
From the database's point of view, your question makes no sense: to truncate a table means to completely remove all rows from that table, and the bulk insert creates a whole load of new rows in its place. There is no notion in SQL of "deleting a column", or of "inserting columns into existing rows".
In order to add or overwrite data in existing rows, you need to update those rows. If you are bulk inserting data, that means you need to somehow line up each new row with an existing row. What happens if the number of rows changes? And if you are only keeping the ID of the row, what is it you are actually trying to line up? It's also worth pointing out that rows in a table don't really have an order, so if your thought is to match the rows "in order", you still need something to order by...
I think you need to step back and consider what problem you're actually trying to solve (look up "the X/Y problem" for more on getting stuck thinking about a particular approach rather than the real problem).
Some possibilities:
You need to assign the new data IDs that reuse the same range of IDs as the old data, but with different content (see the sketch after this list).
You need to identify which imported rows are new, which are updates, and which existing rows should be deleted, based on some matching criteria.
You don't actually want to truncate the data at all, because it's referenced elsewhere so needs to be "soft deleted" (marked inactive) instead.
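If the first case is what you are after, a minimal sketch (the table name and feed column names are placeholders, not taken from your script): because the script runs TRUNCATE before every load, an AUTO_INCREMENT counter is reset each time, so every freshly loaded row simply gets a new id starting from 1.

-- one-off change: add an auto-assigned id as the first column
ALTER TABLE your_table
  ADD id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

The LOAD DATA statement then needs an explicit column list covering only the CSV columns, e.g. (col1, col2, col3), so that id is left for MySQL to fill in.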

Building an application to transform CSV files

I have a rough but complete, working CSV transformer. The way my current system works is that it imports the CSV file into an SQL database table with static column names, and exports only the specific (needed) columns. This system works great but is tied to one type of CSV file (because the column names are pre-defined). I'm wondering how I can make this universal. Instead of having it insert column1, column2, column3, I want it to insert Spreadsheet Column1, Spreadsheet Column2, Spreadsheet Column3, etc. How would I go about pulling the column names from the CSV file and creating a new table in the database whose column names are those from the first row of the CSV file?
The current system:
Client uploads CSV file.
A table is created with predefined column names (column 1, column 2, column 3)
Using LOAD DATA INFILE -> PHP scripts will insert the information from the CSV file into the recently created table.
The next query that is run simply takes specific columns out of the table and exports them to a final CSV file.
The system that would be ideal:
Client uploads CSV file.
PHP scripts read the CSV file and take only the first row (the column names); after taking these column names, they create a new table based on them.
PHP scripts now use LOAD DATA INFILE.
The rest is the same as current system.
Current code:
import.php
include("/inc/database.php");
include("/inc/functions.php");
include("/inc/data.php");
if($_SERVER['REQUEST_METHOD'] == 'POST'){
$string = random_string(7);
$new_file_name = 'report_'. $string .'.csv';
$themove = move_uploaded_file($_FILES['csv']['tmp_name'], 'C:/xampp/htdocs/uploads/'.$new_file_name);
mysql_query("CREATE TABLE report_". $string ."(". $colNames .")") or die(mysql_error());
$sql = "LOAD DATA INFILE '/xampp/htdocs/uploads/report_". $string .".csv'
INTO TABLE report_". $string ."
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(". $insertColNames .")";
$query = mysql_query($sql) or die(mysql_error());
header('Location: download.php?dlname='.$string.'');
}
data.php (I've shortened most of this; in reality there are about 200 columns going in and twenty to thirty coming out)
<?php
$colNames = "Web_Site_Member_ID text,
Master_Member_ID text,
API_GUID text,
Constituent_ID text";
$insertColNames = "Web_Site_Member_ID,
Master_Member_ID,
API_GUID,
Constituent_ID";
$exportNames = "Web_Site_Member_ID, Date_Membership_Expires, Membership, Member_Type_Code";
?>
functions.php just includes the block of code for generating a random string/file name.
For CSV file reading please look at the fgetcsv() function. You should easily be able to extract a row of data and access each individual field in the resulting array for your column header definitions.
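A minimal sketch along those lines (untested; the TEXT column type and the naive identifier clean-up are my assumptions, random_string() is the helper from your functions.php, and $new_file_name comes from your import.php):

// read only the header row of the uploaded CSV
$fh = fopen('C:/xampp/htdocs/uploads/' . $new_file_name, 'r');
if ($fh === false) {
    die('Could not open CSV file');
}
$headers = fgetcsv($fh);
fclose($fh);

// turn each header into a column definition
$columns = array();
foreach ($headers as $header) {
    // keep only characters that are safe in a column identifier
    $name = preg_replace('/[^A-Za-z0-9_]/', '_', trim($header));
    $columns[] = "`" . $name . "` text";
}

// create the table from the generated definitions
$table = 'report_' . random_string(7);
mysql_query("CREATE TABLE " . $table . " (" . implode(', ', $columns) . ")") or die(mysql_error());

The LOAD DATA INFILE part of your script can then stay as it is, with the column list built from the same sanitised header names.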

Importing CSV with odd rows into MySQL

I'm faced with a problematic CSV file that I have to import to MySQL.
I can do it either through PHP and INSERT commands, or straight through MySQL's LOAD DATA INFILE.
I have attached a partial screenshot of how the data within the file looks:
The values I need to insert are below "ACC1000", so I have to start at line 5 and work my way through the file of about 5500 lines.
It's not possible to simply step to the next line each time, because for some accounts there are multiple payment rows, as shown in the screenshot.
I have been trying to find the start of each account by scanning the rows for the occurrence of "ACC":
if (strpos($data[$c], 'ACC') !== FALSE){
echo "Yep ";
} else {
echo "Nope ";
}
I know it's crude, but I really don't know where to start.
If you have a (foreign key) constraint defined in your target table such that records with a blank value in the type column will be rejected, you could use MySQL's LOAD DATA INFILE to read the first column into a user variable (which is carried forward into subsequent records) and apply its IGNORE keyword to skip those "records" that fail the FK constraint:
LOAD DATA INFILE '/path/to/file.csv'
IGNORE
INTO TABLE my_table
CHARACTER SET utf8
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 4 LINES
(@a, type, date, terms, due_date, class, aging, balance)
SET account_no = @account_no := IF(@a='', @account_no, @a)
There are several approaches you could take.
1) You could go with @Jorge Campos' suggestion and read the file line by line, using PHP code to skip the lines you don't need and insert the ones you want into MySQL. A potential disadvantage of this approach, if you have a very large file, is that you will either have to run a bunch of little queries or build up one large one, and it could take some time to run.
2) You could pre-process the file and remove any rows/columns that you don't need, leaving the file in a format that can be loaded directly into MySQL via the command line or whatever.
Based on which approach you decide to take, either myself or the community can provide code samples if you need them.
This snippet should get you going in the right direction:
$file = '/path/to/something.csv';
if( ! $fh = fopen($file, 'r') ) { die('bad file'); }
if( ! $headers = fgetcsv($fh) ) { die('bad data'); }
while($line = fgetcsv($fh)) {
    echo var_export($line, true) . "\n";
    if( preg_match('/^ACC/', $line[0]) ) { echo "record begin\n"; }
}
fclose($fh);
http://php.net/manual/en/function.fgetcsv.php
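If you would rather do it in PHP, here is a rough sketch of the same carry-the-account-forward idea; the table name, the column positions and the four header lines are assumptions based on the description above, so adjust them to the real file:

// Remember the last account number seen and reuse it for the payment rows
// whose first column is blank. Table/column names are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass',
    array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));
$stmt = $pdo->prepare(
    'INSERT INTO my_table (account_no, type, date, balance) VALUES (?, ?, ?, ?)');

$fh = fopen('/path/to/file.csv', 'r');
for ($i = 0; $i < 4; $i++) {
    fgets($fh);                              // skip the 4 header lines
}

$account = null;
while (($line = fgetcsv($fh)) !== false) {
    if (isset($line[0]) && strpos($line[0], 'ACC') === 0) {
        $account = $line[0];                 // a new account block starts here
    }
    if ($account === null || count($line) < 8) {
        continue;                            // nothing usable on this line
    }
    $stmt->execute(array($account, $line[1], $line[2], $line[7]));
}
fclose($fh);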

csv data import into mysql database using php

Hi, I need to import a CSV file of 15000 lines.
I'm using the fgetcsv function and parsing each and every line,
but I get a timeout error every time.
The process is too slow and the data is only partially imported.
Is there any way to make the data import faster and more efficient?
<?php
if (isset($_POST['submit'])) {
    $fname = $_FILES['sel_file']['name'];
    $var = 'Invalid File';
    $chk_ext = explode(".", $fname);

    if (strtolower($chk_ext[1]) == "csv") {
        $filename = $_FILES['sel_file']['tmp_name'];
        $handle = fopen($filename, "r");

        $res = mysql_query("SELECT * FROM vpireport");
        $rows = mysql_num_rows($res);
        if ($rows >= 0) {
            mysql_query("DELETE FROM vpireport") or die(mysql_error());

            for ($i = 1; ($data = fgetcsv($handle, 10000, ",")) !== FALSE; $i++) {
                if ($i == 1)
                    continue;

                $sql = "INSERT into vpireport
                        (item_code,
                        company_id,
                        purchase,
                        purchase_value)
                        values
                        (" . $data[0] . ",
                        " . $data[1] . ",
                        " . $data[2] . ",
                        " . $data[3] . ")";
                //echo "$sql";
                mysql_query($sql) or die(mysql_error());
            }
        }
        fclose($handle);
?>
<script language="javascript">
    alert("Successfully Imported!");
</script>
<?php
    }
}
The problem is that every time it gets stuck partway through the import process and displays the following errors:
Error 1 :
Fatal Error: Maximum time limit of 30 seconds exceeded at line 175.
Error 2 :
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'S',0,0)' at line 1
This error I am not able to figure out...
The file is only ever partially imported: only around 200-300 lines out of 10000.
If you are doing a MySQL INSERT for each line, you can instead build a batch insert string for every 500 lines of CSV and execute it in one go. It'll be faster.
Another solution is to read the file with an offset (see the sketch after this list):
Read the first 500 lines,
insert them into the database,
redirect to csvimporter.php?offset=500,
return to step 1 and read the 500 lines starting at offset 500 this time.
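A rough sketch of that approach (csvimporter.php is the script name used above; the fixed upload path and the 500-line batch size are assumptions):

// Save the upload once, then import 500 lines per request and redirect to
// the same script with a bigger offset until the file is done.
$csvPath   = '/path/to/uploads/vpireport.csv';
$batchSize = 500;
$offset    = isset($_GET['offset']) ? (int)$_GET['offset'] : 0;

if ($offset === 0 && isset($_FILES['sel_file'])) {
    move_uploaded_file($_FILES['sel_file']['tmp_name'], $csvPath);
}

$handle = fopen($csvPath, 'r');
fgets($handle);                            // always skip the header line
for ($i = 0; $i < $offset; $i++) {         // skip the lines already imported
    fgets($handle);
}

$done = 0;
while ($done < $batchSize && ($data = fgetcsv($handle, 10000, ',')) !== FALSE) {
    // ...build and run the same INSERT as in the question for $data...
    $done++;
}
fclose($handle);

if ($done === $batchSize) {                // more lines may be left: next request
    header('Location: csvimporter.php?offset=' . ($offset + $batchSize));
    exit;
}
echo "Import finished.";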
Another solution would be setting the timeout limit to 0 with:
set_time_limit(0);
Set this at the top of the page:
set_time_limit(0);
It will let the page run without a time limit. That is not recommended, but if you have no other option it will get the job done.
You can consult the set_time_limit() documentation in the PHP manual.
To make it faster, you need to check the various SQL statements you are sending and see if you have proper indexes created.
If you are calling user-defined functions that refer to global variables, you can reduce the time taken even more by passing those variables to the functions as arguments; referring to global variables is slower than using local variables.
You can make use of LOAD DATA INFILE, which is a MySQL facility and much faster than fgetcsv.
More information is available at
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
Simply use this at the beginning of your PHP import page:
ini_set('max_execution_time',0);
PROBLEM:
There is a huge performance impact in the way you INSERT data into your table. For every one of your records you send an INSERT request to the server: 15000 INSERT requests is huge!
SOLUTION:
You should group your data the way mysqldump does. In your case you need just three INSERT statements, not 15000, as below:
Before the loop, write:
$q = "INSERT into vpireport(item_code,company_id,purchase,purchase_value)values";
And inside the loop, concatenate each record onto the query like this:
$q .= "($data[0],$data[1],$data[2],$data[3]),";
Also inside the loop, check whether the counter has reached 5000, 10000 or 15000; if so, insert the accumulated data into the vpireport table and set $q back to INSERT INTO... again (a sketch follows below).
Run the query and enjoy!
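A sketch of that batching, reusing the question's $handle, table and mysql_* calls; the flush size and the added quoting/escaping of values are my own choices rather than part of the answer:

// Group rows into multi-row INSERTs and flush every 5000 rows. The escaping
// and quoting of values is added here; it also avoids syntax errors when a
// field contains text such as 'S'.
$flushEvery = 5000;
$values = array();
$insert = "INSERT INTO vpireport (item_code, company_id, purchase, purchase_value) VALUES ";

for ($i = 1; ($data = fgetcsv($handle, 10000, ",")) !== FALSE; $i++) {
    if ($i == 1) {
        continue;                                // skip the header row
    }
    $values[] = sprintf("('%s','%s','%s','%s')",
        mysql_real_escape_string($data[0]),
        mysql_real_escape_string($data[1]),
        mysql_real_escape_string($data[2]),
        mysql_real_escape_string($data[3]));

    if (count($values) >= $flushEvery) {
        mysql_query($insert . implode(',', $values)) or die(mysql_error());
        $values = array();
    }
}
if ($values) {                                   // flush whatever is left over
    mysql_query($insert . implode(',', $values)) or die(mysql_error());
}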
If this is a one-time exercise, PHPMyAdmin supports Import via CSV.
import-a-csv-file-to-mysql-via-phpmyadmin
He also notes the use of MySQL's LOAD DATA LOCAL INFILE, which is a very fast way to import data into a database table (load-data MySQL docs link).
EDIT:
Here is some pseudo-code:
// perform the file upload
$absolute_file_location = upload_file();
// connect to your MySQL database as you would normally
your_mysql_connection();
// execute the query
$query = "LOAD DATA LOCAL INFILE '" . $absolute_file_location .
"' INTO TABLE `table_name`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(column1, column2, column3, etc)";
$result = mysql_query($query);
Obviously, you need to ensure good SQL practices to prevent injection, etc.

optimizing Code for inserting 27000*2 keys from plain text file to DB

I need to insert data from a plain text file, exploding each line into 2 parts and then inserting them into the database. I'm doing it the way shown below, but can this program be optimized for speed?
The file has around 27000 lines of entries.
DB structure [unique key (ext,info)]
ext [varchar]
info [varchar]
code:
$string = file_get_contents('list.txt');
$file_list = explode("\n", $string);
$entry = 0;

$db = new mysqli('localhost', 'root', '', 'file_type');
$sql = $db->prepare('INSERT INTO info (ext, info) VALUES (?, ?)');
$j = count($file_list);
for ($i = 0; $i < $j; $i++) {
    $data = explode(' ', $file_list[$i], 2);
    $sql->bind_param('ss', $data[0], $data[1]);
    $sql->execute();
    $entry++;
}
$sql->close();
echo $entry.' entry inserted !<hr>';
If you are sure that the file contains unique pairs of ext/info, you can try disabling the keys for the import:
ALTER TABLE `info` DISABLE KEYS;
And after import:
ALTER TABLE `info` ENABLE KEYS;
This way the unique index will be rebuilt once for all records, not every time something is inserted.
To increase speed even more, you should change the format of this file to be CSV-compatible and use MySQL's LOAD DATA to avoid parsing every line in PHP.
When there are multiple items to be inserted you usually put all the data in a CSV file, create a temporary table with columns matching the CSV, do a LOAD DATA [LOCAL] INFILE into it, and then move that data into the destination table. As far as I can see you don't need much additional processing, so you can even treat your input file as a CSV without any extra trouble.
$db->query('CREATE TEMPORARY TABLE _tmp_info (ext VARCHAR(255), info VARCHAR(255))');
$db->query("LOAD DATA LOCAL INFILE '{$filename}' INTO TABLE _tmp_info
            FIELDS TERMINATED BY ' '
            LINES TERMINATED BY '\n'"); // $filename = 'list.txt' in your case
$db->query('INSERT INTO info (ext, info) SELECT t.ext, t.info FROM _tmp_info t');
You can run a COUNT(*) on temp table after that to show how many records were there.
If you have a large file to read in, I would not use file_get_contents. It forces the interpreter to store the entire contents in memory all at once, which is a bit wasteful.
The following is a snippet taken from here:
$file_handle = fopen("myfile", "r");
while (!feof($file_handle)) {
$line = fgets($file_handle);
echo $line;
}
fclose($file_handle);
The difference is that the only thing kept in memory from the file at any one instant is a single line (not the entire contents of the file), which will probably lower the run-time memory footprint of your script. In your case, you can use the same loop to perform your INSERT operation.
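Applied to your script, that could look roughly like this (a sketch keeping your table, columns and mysqli prepared statement):

$db  = new mysqli('localhost', 'root', '', 'file_type');
$sql = $db->prepare('INSERT INTO info (ext, info) VALUES (?, ?)');

$fh = fopen('list.txt', 'r');
$entry = 0;
while (($line = fgets($fh)) !== false) {
    $line = rtrim($line, "\r\n");
    $data = explode(' ', $line, 2);
    if ($line === '' || count($data) < 2) {
        continue;                    // skip blank or malformed lines
    }
    $sql->bind_param('ss', $data[0], $data[1]);
    $sql->execute();
    $entry++;
}
fclose($fh);
$sql->close();
echo $entry . ' entries inserted!<hr>';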
You could also use something like Talend. It's an ETL program, simple and free (it has a paid version).
Here is the magic solution [3 seconds vs 240 seconds]
$db->query('ALTER TABLE info DISABLE KEYS');
$db->autocommit(FALSE);
// ... run the prepared-statement insert loop here ...
$db->commit();
$db->query('ALTER TABLE info ENABLE KEYS');
