update table while preserving the value of a column - php

I have a scheduled job that will parse the following CSV file and, if the server is already in the table, execute an update, or an insert if not. (The server name is the key.)
The problem is that the CSV may contain lines with the same server but with a different application. In those cases, I must do an update, but keep the previous value of the app.
How can I keep the previous value before updating?
server_name;ip_address;domain;application
s1;10.10.10.4;dom1;app1
s1;10.10.10.4;dom1;app2
s2;10.15.69.8;dom5;app10
s3;10.15.69.39;dom7;app5
My code looks like this ($tab contains the server names that are already in the table):
while (($line = fgetcsv($lines, 0, ';')) !== false) {
    if (in_array($line[0], $tab)) {
        // update query
    } else {
        // insert query
    }
}

You say this is a scheduled job. Because this is a repeating activity it would be worth your while to build some infrastructure: specifically an external table to provide a SQL interface for the CSV file.
External tables are very neat: they are defined with CREATE TABLE, but instead of a storage clause we specify a source file (pretty similar to SQL*Loader control file syntax). Then we can query the data in the CSV file without loading it into a staging table. Find out more.
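A minimal sketch of such an external table over the CSV above (the directory object name data_dir and the file name servers.csv are assumptions, not part of the question):
-- CREATE DIRECTORY data_dir AS '/path/to/csv/folder';  -- directory object assumed to exist
CREATE TABLE ext_servers (
  server_name  VARCHAR2(30),
  ip_address   VARCHAR2(15),
  domain       VARCHAR2(30),
  application  VARCHAR2(30)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    SKIP 1
    FIELDS TERMINATED BY ';'
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('servers.csv')
)
REJECT LIMIT UNLIMITED;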
As for the problem of inserting and updating, you should check out the MERGE statement. This handles INSERT and UPDATE in a single statement. Note that you can drive the MERGE using an external table:
merge into your_table
using ( select * from external_table ) ext
on ( your_table.server_name = ext.server_name )
when not matched
then insert values ( ext.server_name, ....)
when matched
then update set your_table.ip_address = ext.ip_address ....
We can apply conditional filters to the INSERT and UPDATE clauses. Find out more.
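For instance, a sketch of the MERGE spelled out for the columns in the sample CSV, with a conditional filter so a row is only updated when something actually changed, and with application deliberately left out of the UPDATE so its previous value is kept (your_table is an assumed target name, and ext_servers is the external table sketched earlier):
merge into your_table t
using (
  -- collapse duplicate server lines (e.g. s1 appears once per application)
  select server_name,
         min(ip_address)  as ip_address,
         min(domain)      as domain,
         min(application) as application
  from   ext_servers
  group  by server_name
) ext
on ( t.server_name = ext.server_name )
when matched then
  update set t.ip_address = ext.ip_address,
             t.domain     = ext.domain
  where t.ip_address <> ext.ip_address
     or t.domain     <> ext.domain
when not matched then
  insert ( server_name, ip_address, domain, application )
  values ( ext.server_name, ext.ip_address, ext.domain, ext.application );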

Import your CSV rows into a temporary parking table and then use an INSERT ... SELECT for the rows that are not yet in your final destination table:
insert into your_final_dest (server_name,ip_address,domain,application)
select server_name,ip_address,domain,application
from your_csv_parking
where server_name not in ( select server_name from your_final_dest )
and use an UPDATE for the others:
UPDATE your_final_dest
SET application = (SELECT application
                   FROM your_csv_parking
                   WHERE your_csv_parking.server_name = your_final_dest.server_name)
WHERE EXISTS (SELECT your_csv_parking.application
              FROM your_csv_parking
              WHERE your_csv_parking.server_name = your_final_dest.server_name);

Related

Query is taking 8 hours for checking and inserting against 3 million rows of data

I get new data in 10,000s of XML files every day, and I always have to run a query to see if there is any new data in those XML files; if it doesn't exist in our database, then I insert that data into our table.
Here is the code:
Here is the code
if (!Dictionary::where('word_code', '=', $dic->word_code)->exists()) {
    // then insert into the database.
}
Here, $dic->word_code comes from thousands of XML files. Every time, the script opens a new XML file, checks whether each record exists, inserts it if it doesn't, then moves on to the next file and repeats the same procedure for all 10,000 XML files.
Each XML file is about 40 to 80 MB and contains a lot of data.
I already have 2,981,293 rows so far, and checking my XML data against 2,981,293 rows and then inserting the new rows is a really time-consuming and resource-hungry task.
word_code is already indexed.
The current method takes about 8 hours to finish the procedure.
By the way, I must mention that after running this huge 8-hour procedure, it only adds about 1,500 to 2,000 rows of data per day.
Comparing the file to the database line by line is the core issue. Both the filesystem and databases support comparing millions of rows very quickly.
You have two options.
Option 1:
Keep a backup of the file from the previous run and do a filesystem compare against the new file to find the differences.
Option 2:
Load the XML file into a MySQL table using LOAD DATA INFILE. Then run a query on all rows to find both new and changed rows. Be sure to index the table with a well defined unique key to keep this query efficient.
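A sketch of what that comparison query could look like, assuming the XML rows were loaded into a staging table called xml_staging with the same columns as the main table (dictionary stands in for whatever table backs the Dictionary model, and definition is a placeholder column):
-- new rows: present in the staging table but not yet in the dictionary table
SELECT s.*
FROM xml_staging s
LEFT JOIN dictionary d ON d.word_code = s.word_code
WHERE d.word_code IS NULL;

-- changed rows: same word_code but different content
SELECT s.*
FROM xml_staging s
JOIN dictionary d ON d.word_code = s.word_code
WHERE d.definition <> s.definition;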
I would split this job into two tasks:
Use your PHP script to load the XML data unconditionally in a temporary table that has no constraints, no primary key, no indexes. Make sure to truncate that table before loading the data.
Perform one single INSERT statement, to merge records from that temporary table into your main table, possibly with an ON DUPLICATE KEY UPDATE or IGNORE option, or otherwise a negative join clause. See INSERT IGNORE vs INSERT … ON DUPLICATE KEY UPDATE
For the second point, you could for instance do this:
INSERT IGNORE
INTO main
SELECT *
FROM temp;
If the field to compare is not a primary key in the main table, or is not uniquely indexed, then you might need a statement like this:
INSERT INTO main
SELECT temp.*
FROM temp
LEFT JOIN main m2
ON m2.word_code = temp.word_code
WHERE m2.word_code is NULL;
But this will be a bit slower than a primary-key based solution.
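If matching rows should be refreshed instead of skipped, the ON DUPLICATE KEY UPDATE option mentioned above looks roughly like this, assuming word_code is a primary or unique key in main and using definition as a placeholder for the remaining columns:
INSERT INTO main (word_code, definition)
SELECT t.word_code, t.definition
FROM temp t
ON DUPLICATE KEY UPDATE definition = t.definition;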

better way to mass import unique contacts into sql (php, mysql)

I need to import a very large contact list (name & email in csv format, PHP -> MySQL). I want to skip existing email. My current method is very slow in a production DB, with a lot of data.
Assuming 100 contacts (maybe 10,000 contacts).
Original Steps
get the input data
check each contact in the table for an existing email
(100 SELECT queries)
mass insert into the table with insert into ... values (), (), ()
(1 INSERT query)
This is slow.
I want to improve the process and time.
I have thought of 2 ways.
Method 1
create a max_addressbook_temp (same structure as max_addressbook) for temporary space
clear/delete all records for the user in max_addressbook_temp
insert all records in max_addressbook_temp
create a list of duplicated records (for the front end)
insert unique records from max_addressbook_temp into max_addressbook
advantage
can get a list of duplicated records to display in front end
very fast - to import 100 records you always need only 2 SQL calls: 1 INSERT INTO ... VALUES and 1 INSERT INTO ... SELECT
disadvantage
needs a separate table
Method 2
create a unique index on (book_user_name_id, book_email)
for each record, use insert ignore into ... (this will ignore duplicated book_user_name_id, book_email)
advantage
less code
disadvantage
can't display the contacts that are not imported
slower - to import 100 records you need to call 100 INSERTs
Any feedback? What is the most common and efficient way to import a lot of addresses into the DB?
=====
Here is more detail for method 1. Do you think it is a good idea?
There are 4 steps.
clear the temp data for the user
insert the import data, without checking for duplicates
select the duplicated data for display or count
insert the data that are not duplicated
// clear the temp data for the user
delete from max_addressbook_temp where book_user_id = ...
// insert the import data, without checking for duplicates
insert into max_addressbook_temp values (), (), ()....
// select the duplicated data for display or count
select * from max_addressbook_temp t1, max_addressbook t2
where t1.book_user_id = t2.book_user_id
and t1.book_email = t2.book_email
// insert the data that are not duplicated
insert into max_addressbook
select t2.* from max_addressbook_temp t2
where not exists (
    select 1 from max_addressbook t1
    where t1.book_user_id = t2.book_user_id
    and t1.book_email = t2.book_email
)
Q: Why not use MySQL BULK INSERT?
EXAMPLE:
LOAD DATA INFILE 'C:\MyTextFile'
INTO TABLE myDatabase.MyTable
FIELDS TERMINATED BY ','
ADDENDUM:
It sounds like you're actually asking two separate questions:
Q1: How do I read a .csv file into a MySQL database?
A: I'd urge you to consider LOAD DATA INFILE
Q2: How do I "diff" the data in the .csv vs. the data already in MySQL (either the intersection of rows in both, or the rows in one but not the other)?
A: There is no "efficient" method. Any way you do it, you're probably going to be doing a full-table scan.
I would suggest the following:
Load your .csv data into a temp table
Do an INTERSECT of the two tables:
SELECT tableA.id
FROM tableA
WHERE tableA.id IN (SELECT id FROM tableB);
Save the results of your "intersect" query
Load the .csv data into your actual table
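For the other half of Q2 (the rows that exist in one table but not the other), the complementary query is an anti-join, mirroring the intersect above:
-- rows of tableA whose id has no counterpart in tableB
-- (if id can be NULL in tableB, prefer a LEFT JOIN ... IS NULL over NOT IN)
SELECT tableA.id
FROM tableA
WHERE tableA.id NOT IN (SELECT id FROM tableB);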

CSV file upload to handle status update & inserting new records

While working on a project, hosted locally, I'm stuck at managing CSV uploads. One of the tasks requires me to upload data on a daily basis that has either new entries or updated statuses for existing entries. There is also a probability that some of the entries (that exist in the database) have no updated status.
Problem statement;
I've created a CSV upload feature that uploads the CSV file to a particular location and imports the information in assigned TABLE.
I want to know the best way to verify the database records when I do the CSV upload.
It should ideally work as following;
if the entry doesn't exist (INSERT a new entry based on data from the CSV file)
if the entry exists and has the SAME status as in the newly uploaded CSV file (IGNORE & do nothing)
if the entry exists and has a DIFFERENT status than the one in the newly uploaded CSV file (UPDATE the status to what is mentioned in the CSV file)
Database / CSV file structure
tracking_id (auto increment)
odanumber (uploaded through CSV & can have duplicate entries)
airwaybill (uploaded through CSV & UNIQUE)
courierful (uploaded through CSV & can have duplicate entries)
delstatus (uploaded through CSV & is what gets updated mostly)
deliverydate (uploaded through CSV & gets updated with each delivery)
Of the above, delstatus is what gets updated almost every time a new CSV is uploaded (for existing entries) and hence needs to be checked.
I assume that we can pick 'airwaybill' to check whether an entry exists and, if it does, check whether delstatus is the same as in the CSV file or needs an update. If 'airwaybill' doesn't exist, then a new record must be added to the database. That would save me from inserting all records into the database unnecessarily. Or maybe it can be done in a better way (that I'm yet to explore).
What's happening right now;
I'm able to upload the complete CSV file, creating new entries in the database, through the following code.
<?php
if (isset($_POST['csv'])) {
    $sqlname  = 'localhost';
    $username = 'root';
    $table    = 'tracking';
    $password = '';
    $db       = 'aatrack';
    $file     = $_POST['csv'];

    $cons = mysqli_connect($sqlname, $username, $password, $db) or die(mysqli_connect_error());

    // count the rows before the import
    $result1 = mysqli_query($cons, "select count(*) count from $table");
    $r1      = mysqli_fetch_array($result1);
    $count1  = (int)$r1['count'];

    mysqli_query($cons, '
        LOAD DATA LOCAL INFILE "'.$file.'"
        INTO TABLE '.$table.'
        FIELDS TERMINATED BY \',\'
        LINES TERMINATED BY \'\n\'
        IGNORE 1 LINES
    ') or die(mysqli_error($cons));

    // count the rows after the import to report how many were added
    $result2 = mysqli_query($cons, "select count(*) count from $table");
    $r2      = mysqli_fetch_array($result2);
    $count2  = (int)$r2['count'];

    $count = $count2 - $count1;
    if ($count > 0) {
        header("location:success.php?id=$count");
    }
}
?>
Can you please help by guiding me to the best possible way to achieve this? I understand that it can be done by first uploading the information to a temp_table and comparing it before entries are updated in the LIVE table.
Please suggest an optimum way to achieve the results.
Thank you for reading this far.
Best regards,
Amit Agnihotri
How LOAD DATA INFILE works
Based on a UNIQUE index, LOAD DATA INFILE inserts a new record or updates an existing one (only if the REPLACE option is active).
(1) Regarding insert:
If the csv input value for the UNIQUE index column is NOT found in the db table, then a new record is added, with the (defined) input values from csv file.
(2) Regarding update:
If the csv input value for the UNIQUE index column is found in the db table, then the LOAD DATA INFILE query performs the following operations (in this order!):
It inserts the new csv values as a new record with a new PRIMARY KEY id;
It deletes the old record from the db.
NB: In the rest of my answer I will speak only about the update part (2).
BEFORE INSERT-TRIGGER as solution for conditional updates
Since LOAD DATA INFILE runs an insert operation before a delete one, you can make use of the fact that the old db record still exists when the new record with the csv values is inserted. So, you can customize your new input values based on the values contained in the old record. The really cool part of this is: you can even maintain the old value of the PRIMARY KEY field.
The key is to define a BEFORE INSERT-TRIGGER in which all the needed customizations, validations and assignments reside:
Fetch the old record's values by running a SELECT sql statement;
Store the fetched values into prior defined user variables;
Use the user variables to compare the old values with the csv input values;
Based on these comparisons: assign the old value of the PRIMARY KEY field as the new one, and change the new csv values to the old ones (or to other values) if needed.
Then perform the LOAD DATA INFILE query from PHP.
The codes
Create table syntax:
CREATE TABLE `tracking` (
`tracking_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`odanumber` int(11) DEFAULT NULL,
`airwaybill` int(11) DEFAULT NULL,
`courierful` varchar(100) DEFAULT NULL,
`delstatus` tinyint(1) DEFAULT NULL,
`deliverydate` varchar(19) DEFAULT NULL,
PRIMARY KEY (`tracking_id`),
UNIQUE KEY `uni_airwaybill` (`airwaybill`)
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8;
BEFORE INSERT-TRIGGER:
USE `tests`;
DELIMITER $$

DROP TRIGGER IF EXISTS tests.tracking_BEFORE_INSERT$$

USE `tests`$$
CREATE DEFINER = CURRENT_USER TRIGGER `tests`.`tracking_BEFORE_INSERT` BEFORE INSERT ON `tracking` FOR EACH ROW
BEGIN
    /* Define vars to store the old record values. */
    SET @old_tracking_id = NULL;
    SET @old_odanumber = NULL;
    SET @old_courierful = NULL;
    SET @old_delstatus = NULL;
    SET @old_deliverydate = NULL;

    /*
      Fetch the existing record, if any, and pass
      its values into the corresponding vars.
    */
    SELECT
        tracking_id,
        odanumber,
        courierful,
        delstatus,
        deliverydate
    INTO
        @old_tracking_id,
        @old_odanumber,
        @old_courierful,
        @old_delstatus,
        @old_deliverydate
    FROM tracking
    WHERE airwaybill = NEW.airwaybill
    LIMIT 1;

    /* If an old record was found... */
    IF @old_tracking_id IS NOT NULL THEN
        /* ...keep its tracking_id for the new record. */
        SET NEW.tracking_id = @old_tracking_id;

        /* ...and if delstatus is the same... */
        IF NEW.delstatus = @old_delstatus THEN
            /* ...maintain the old record values. */
            SET NEW.odanumber = @old_odanumber;
            SET NEW.courierful = @old_courierful;
            SET NEW.deliverydate = @old_deliverydate;
        END IF;
    END IF;
END$$
DELIMITER ;
CSV file (tracking.csv)
odanumber,airwaybill,"courierful",delstatus,"deliverydate"
19,1,abc,0,2017-04-31
25,2,def,1,2017-05-31
103,3,ghi,1,2017-06-31
324,4,jkl,1,2017-07-31
564,5,mno,0,2017-08-31
LOAD DATA INFILE statement (called from PHP)
LOAD DATA INFILE "<PATH-TO>/tracking.csv"
REPLACE
INTO TABLE tests.tracking
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(odanumber, airwaybill, courierful, delstatus, deliverydate);
Notes:
*) In regard to LOAD DATA INFILE, you may run into this error:
ERROR 1290 (HY000): The MySQL server is running with the
--secure-file-priv option so it cannot execute this statement
It means that LOAD DATA INFILE has no permission to read the csv file, so you must set secure-file-priv in the configuration file of your database server (my.cnf or my.ini) yourself, like this:
[mysqld]
secure-file-priv = "<PATH-TO-FOLDER-CONTAINING-THE-CSV-FILES>/"
*) You can NOT define a stored procedure from which to run the LOAD DATA INFILE.
In the end, there are also other solutions involving temporary tables, which, no doubt, can work perfectly. One of them is presented in this great article. So, the trigger solution is just another approach.
Good luck!
There are two scenarios here:
The table's columns exactly match the csv columns: in that case REPLACE is the answer - it's a keyword of LOAD DATA INFILE, see the doc entry.
The table's columns don't match the csv columns: REPLACE would cause conflicting records to be removed and re-inserted, effectively removing the additional data. In that case LOAD DATA INFILE is not effective by itself; you need another approach, either filtering your file beforehand, doing updates via PHP, or some other method.
In any case, if you want to add more "logic" to the import process, maybe LOAD DATA INFILE isn't really the right approach on its own, but using temp tables may very well let you benefit from all the goodness databases provide.
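To make that concrete, here is a rough sketch of the temp-table route for the tracking table defined in the other answer (the staging table name tracking_temp is an assumption):
-- 1. empty staging copy with the same columns and keys
CREATE TABLE tracking_temp LIKE tracking;

LOAD DATA INFILE '<PATH-TO>/tracking.csv'
INTO TABLE tracking_temp
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(odanumber, airwaybill, courierful, delstatus, deliverydate);

-- 2. insert new airwaybills and refresh status/date for existing ones;
--    rows whose delstatus is unchanged are effectively left as they are
INSERT INTO tracking (odanumber, airwaybill, courierful, delstatus, deliverydate)
SELECT t.odanumber, t.airwaybill, t.courierful, t.delstatus, t.deliverydate
FROM tracking_temp t
ON DUPLICATE KEY UPDATE
  delstatus    = t.delstatus,
  deliverydate = t.deliverydate;

DROP TABLE tracking_temp;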

check if row is in table in php

I have to write a script that updates some mysql tables. For this purpose I am provided with a .dbf file that contains the up-to-date data. So what I am doing is:
Convert .dbf file to .sql file (using this script by xtranophilist )
Extract mysql statements from .sql file and execute them (Creating mysql table "temp" and filling it with data)
Get data from freshly created table ("temp") where column tablenr = '1' and check for each row if row exists in other table ("data_side1")
1. and 2. are working so far; now I am wondering how to do 3:
How do you check if some row exists in some table in MySQL via PHP? And what is the best way to do so?
You can use this:
SELECT EXISTS(SELECT 1 FROM data_side1 WHERE ...)
For more details please check http://dev.mysql.com/doc/refman/5.7/en/exists-and-not-exists-subqueries.html
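If the check has to run for many rows from temp, it can also be done in one set-based query instead of one EXISTS query per row; a sketch, assuming the two tables share a key column (called id here purely as a placeholder):
SELECT t.*,
       EXISTS(SELECT 1 FROM data_side1 d WHERE d.id = t.id) AS already_present
FROM temp t
WHERE t.tablenr = '1';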

Big Data : Handling SQL Insert/Update or Merge best line by line or by CSV?

So basically I have a bunch of 1 Gig data files (compressed) with just text files containing JSON data with timestamps and other stuff.
I will be using PHP code to insert this data into MYSQL database.
I will not be able to store these text files in memory! Therefore I have to process each data file line by line. To do this I am using stream_get_line().
Some of the data contained will be updates, some will be inserts.
Question
Would it be faster to use Insert / Select / Update statements, or create a CSV file and import it that way?
Create a file that's a bulk operation and then execute it from SQL?
I basically need to insert data when the primary key doesn't exist, and update fields when the primary key does exist. But I will be doing this in LARGE quantities.
Performance is always an issue.
Update
The table has 22,000 columns, and only around 10-20 of them do not contain 0.
I would load all of the data into a temporary table and let MySQL do the heavy lifting.
Create the temporary table by doing create table temp_table as select * from live_table where 1=0;
Read the file and create a data product that is compatible for loading with load data infile.
Load the data into the temporary table and add an index for your primary key.
Next, isolate your updates by doing an inner join between the live table and the temporary table, then walk through and do your updates.
Remove all of the updated rows from the temporary table (again using an inner join between it and the live table).
Process all of the inserts with a simple insert into live_table select * from temp_table.
Drop the temporary table, go home and have a frosty beverage.
This may be oversimplified for your use case, but with a little tweaking it should work a treat.
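A rough sketch of those steps in MySQL, with placeholder names (pk for the primary key column, col1 standing in for the payload columns, and the file path assumed):
-- 1. empty copy of the live table
CREATE TABLE temp_table AS SELECT * FROM live_table WHERE 1 = 0;

-- 2. + 3. load the prepared file and index the key
LOAD DATA INFILE '/path/to/prepared_file.csv' INTO TABLE temp_table
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
ALTER TABLE temp_table ADD INDEX idx_pk (pk);

-- 4. apply the updates with a join against the live table
UPDATE live_table l
JOIN temp_table t ON t.pk = l.pk
SET l.col1 = t.col1;

-- 5. remove the rows that were updates, leaving only the new rows
DELETE t FROM temp_table t
JOIN live_table l ON l.pk = t.pk;

-- 6. whatever is left in the temp table is new: insert it
INSERT INTO live_table SELECT * FROM temp_table;

-- 7. clean up
DROP TABLE temp_table;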
