I'm a little confused as to how I can do this.
I basically want to give my first column a 'NOT NULL AUTO_INCREMENT' and give each row its own 'id'. The issue I am having is that the script I am using truncates the whole SQL table and then reloads it from a CSV file via a daily cron job.
I am currently using this script:
<?php
$databasehost = "localhost";
$databasename = "";
$databasetable = "";
$databaseusername = "";
$databasepassword = "";
$fieldseparator = ",";
$lineseparator = "\n";
$enclosedbyquote = '"';
$csvfile = "db-core/feed/csv/csv.csv";

if (!file_exists($csvfile)) {
    die("File not found. Make sure you specified the correct path.");
}

try {
    $pdo = new PDO("mysql:host=$databasehost;dbname=$databasename",
        $databaseusername, $databasepassword,
        array(
            PDO::MYSQL_ATTR_LOCAL_INFILE => true,
            PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION
        )
    );
} catch (PDOException $e) {
    die("database connection failed: " . $e->getMessage());
}

$pdo->exec("TRUNCATE TABLE `$databasetable`");

$affectedRows = $pdo->exec("
    LOAD DATA LOCAL INFILE " . $pdo->quote($csvfile) . " REPLACE INTO TABLE `$databasetable`
    FIELDS OPTIONALLY ENCLOSED BY " . $pdo->quote($enclosedbyquote) . "
    TERMINATED BY " . $pdo->quote($fieldseparator) . "
    LINES TERMINATED BY " . $pdo->quote($lineseparator) . "
    IGNORE 1 LINES");

echo "Loaded a total of $affectedRows records from this csv file.\n";
?>
Is it possible to amend this script so that it ignores my first column and truncates all of the data in the table apart from that first column?
I could then give all of the rows in the first column their own IDs. Any idea how I could do this?
I am still very new to this, so please go easy on me :)
From the database's point of view, your question makes no sense: to truncate a table means to completely remove all rows from that table, and the bulk insert creates a whole load of new rows in its place. There is no notion in SQL of "deleting a column", or of "inserting columns into existing rows".
In order to add or overwrite data in existing rows, you need to update those rows. If you are bulk inserting data, that means you need to somehow line up each new row with an existing row. What happens if the number of rows changes? And if you are only keeping the ID of the row, what is it you are actually trying to line up? It's also worth pointing out that rows in a table don't really have an order, so if your thought is to match the rows "in order", you still need something to order by...
I think you need to step back and consider what problem you're actually trying to solve (look up "the X/Y problem" for more on getting stuck thinking about a particular approach rather than the real problem).
Some possibilities:
You need to assign the new data IDs which reuse the same range of IDs as the old data, but with different content (see the sketch after this list).
You need to identify which imported rows are new, which are updates, and which existing rows to delete, based on some matching criteria.
You don't actually want to truncate the data at all, because it's referenced elsewhere and so needs to be "soft deleted" (marked inactive) instead.
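If the first of those is all that's actually needed (every freshly loaded row just gets its own id), here is a minimal sketch of one way to do it. It assumes the table does not yet have a key column, and the column names col1, col2, col3 are placeholders for whatever your CSV really contains: add an AUTO_INCREMENT primary key once, then list the CSV columns explicitly in the LOAD DATA statement so MySQL fills in the id by itself on every reload.

// One-off schema change: a surrogate key that MySQL maintains for you.
$pdo->exec("ALTER TABLE `$databasetable`
            ADD COLUMN `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST");

// Daily import: name the CSV columns explicitly so the loader never touches `id`;
// MySQL assigns a fresh value for every inserted row.
$affectedRows = $pdo->exec("
    LOAD DATA LOCAL INFILE " . $pdo->quote($csvfile) . " REPLACE INTO TABLE `$databasetable`
    FIELDS TERMINATED BY " . $pdo->quote($fieldseparator) . "
    OPTIONALLY ENCLOSED BY " . $pdo->quote($enclosedbyquote) . "
    LINES TERMINATED BY " . $pdo->quote($lineseparator) . "
    IGNORE 1 LINES
    (`col1`, `col2`, `col3`)");

Note that TRUNCATE TABLE resets the AUTO_INCREMENT counter, so the ids start again at 1 after every reload; if the ids have to stay stable across imports, you are back to the matching problem described above.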
I am updating this question; please do not mind the comments below, as instead of deleting this question I reworked it so that it makes sense.
A form on a PHP page lets me create a CSV file. To name this file I need to run a SELECT on the database: if the name does not exist, my query must create it; if the name exists, it must update it.
The problem is that two or more users can push the submit button at the same time.
This will cause the query to return the same value to all of them, therefore creating or updating the file in an uncontrolled way.
I need to create a system that will LOCK the table for INSERT/UPDATE and, if another connection appears in the meantime, the column on the database that names the file must be incremented by 1.
$date = date("Ymd");
// braces so PHP does not read "$user_", "$date_", "$id_" as variable names
$csv = fopen("/tmp/{$user}_{$date}_{$id}_{$reference}.csv", 'w+');
where "reference" is a progressive number in the format "Axxxx"; the x's are digits.
The SELECT would be:
// "user" is quoted because it is a reserved word in Postgres;
// pg_query_params avoids interpolating $_POST straight into the SQL
$sql = pg_query_params($conn,
    'SELECT "user", identification, reference FROM orders
     WHERE identification = $1 ORDER BY date DESC LIMIT 1',
    array($_POST['id_order']));

while ($row = pg_fetch_row($sql)) {
    $user      = $row[0];
    $id        = $row[1];
    $reference = $row[2];
}
I need to create a function, like the one below, where users can both INSERT and UPDATE, and in the case of concurrent connections, the ones that are not first will have "reference" incremented by 1.
CREATE OR REPLACE FUNCTION upsert_identification( in_identification TEXT, in_user TEXT ) RETURNS void AS $$
BEGIN
    UPDATE table SET identification = in_identification, user = in_user, reference = in_reference
        WHERE identification = in_identification;
    IF FOUND THEN
        RETURN;
    END IF;
    BEGIN
        INSERT INTO table ( identification, user, reference ) VALUES ( in_identification, in_user, in_reference );
    EXCEPTION WHEN OTHERS THEN
        -- Should the increment be here?
    END;
    RETURN;
END;
$$ LANGUAGE plpgsql;
I hope what I'm asking is clear; I have re-read it and it makes sense to me. Please comment below with any questions you might have.
I really hope someone can help me!
I was looking for some clues in the Postgres manual and found this link about locking, but I am not so sure it is what I need: LINK
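To make the locking idea concrete, here is a rough sketch (not a full solution) of how the increment can be serialized without locking the whole table: inside a transaction, SELECT ... FOR UPDATE locks just the row for this identification, so a second connection running the same statement blocks until the first one commits. The helper next_reference() is hypothetical and is assumed to turn 'A0042' into 'A0043'; for two connections racing on a brand-new identification you would still want a UNIQUE constraint on identification so the loser fails cleanly.

pg_query($conn, "BEGIN");

// Lock the row for this identification; concurrent submitters queue up here.
$res = pg_query_params($conn,
    'SELECT reference FROM orders WHERE identification = $1 FOR UPDATE',
    array($_POST['id_order']));

if ($row = pg_fetch_row($res)) {
    // Row exists: bump the reference while the lock is held.
    $reference = next_reference($row[0]);   // hypothetical helper: 'A0042' -> 'A0043'
    pg_query_params($conn,
        'UPDATE orders SET reference = $1 WHERE identification = $2',
        array($reference, $_POST['id_order']));
} else {
    // First submission for this identification.
    $reference = 'A0001';
    pg_query_params($conn,
        'INSERT INTO orders (identification, reference) VALUES ($1, $2)',
        array($_POST['id_order'], $reference));
}

pg_query($conn, "COMMIT");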
I have two externally hosted third-party .txt files that are updated on an irregular basis by someone other than myself. I have written a script that pulls this information in, manipulates it, and creates a merged array of data suitable for use in a database. I'm not looking for exact code but rather a description of a good process that will work efficiently in inserting a new row from this array if it doesn't already exist, updating a row in the table if any values have changed, or deleting a row in the table if it no longer exists in the array of data.
The data is rather simple and has the following structure:
map (string) | route (string) | time (decimal) | player (string) | country (string)
where a map and route combination must be unique.
Is there any way to do all needed actions without having to loop through all of the external data and all of the data from the table in my database? If not, what would be the most efficient method?
Below is what I have written. It takes care of all but the delete part:
require_once('includes/db.php');
require_once('includes/helpers.php');
$data = array_merge(
    custom_parse_func('http://example1.com/ex.txt'),
    custom_parse_func('http://example2.com/ex.txt')
);
try {
    $dsn = "mysql:host=$dbhost;dbname=mydb";
    $dbh = new PDO($dsn, $dbuser, $dbpass);
    $dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    foreach ($data as $value) {
        $s = $dbh->prepare('INSERT INTO table SET map=:map, route=:route, time=:time, player=:player, country=:country
                            ON DUPLICATE KEY UPDATE map=:map2, route=:route2, time=:time2, player=:player2, country=:country2');
        $s->execute(array(
            ':map'      => $value['map'],
            ':route'    => $value['route'],
            ':time'     => $value['time'],
            ':player'   => $value['player'],
            ':country'  => $value['country'],
            ':map2'     => $value['map'],
            ':route2'   => $value['route'],
            ':time2'    => $value['time'],
            ':player2'  => $value['player'],
            ':country2' => $value['country']
        ));
    }
} catch (PDOException $e) {
    echo $e;
}
You mention that you're using MySQL, which has a handy INSERT ... ON DUPLICATE KEY UPDATE ... statement (documentation here). You will have to iterate over your collection of data (but not the existing table). I would handle it a little differently than @Tim B does...
Create a temporary table to hold the new data.
Loop through your new data and insert it into the temporary table.
Run an INSERT ... ON DUPLICATE KEY UPDATE ... statement inserting from the temporary table into the existing table; that takes care of both inserting new records and updating changed records.
Run a DELETE t1 FROM [existing table] t1 LEFT JOIN [temporary table] t2 ON [whatever key(s) you have] WHERE t2.[key] IS NULL; this will delete everything from the existing table that does not appear in the temporary table. (A code sketch of these four steps follows below.)
The nice thing about temporary tables is that they are automatically dropped when the connection closes (as well has having some other nice features like being invisible to other connections).
The other nice thing about this method is that you can do some (or all) of your data manipulation in the database after you insert it into a table in step 1. It is often faster and simpler to do this kind of thing through SQL instead of looping through and changing values in your array.
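A rough sketch of those four steps, reusing the question's $dbh and assuming (map, route) is the unique key; the table names records and incoming are illustrative, not something from your schema:

// 1) Temporary table with the same shape (and indexes) as the live table.
$dbh->exec('CREATE TEMPORARY TABLE incoming LIKE records');

// 2) Bulk-insert the freshly parsed data into the temporary table.
$ins = $dbh->prepare('INSERT INTO incoming (map, route, time, player, country)
                      VALUES (:map, :route, :time, :player, :country)');
foreach ($data as $value) {
    $ins->execute(array(
        ':map'     => $value['map'],
        ':route'   => $value['route'],
        ':time'    => $value['time'],
        ':player'  => $value['player'],
        ':country' => $value['country'],
    ));
}

// 3) Upsert from the temporary table into the live table.
$dbh->exec('INSERT INTO records (map, route, time, player, country)
            SELECT map, route, time, player, country FROM incoming
            ON DUPLICATE KEY UPDATE time = VALUES(time),
                                    player = VALUES(player),
                                    country = VALUES(country)');

// 4) Remove live rows whose (map, route) no longer appears in the feed.
$dbh->exec('DELETE r FROM records r
            LEFT JOIN incoming i ON i.map = r.map AND i.route = r.route
            WHERE i.map IS NULL');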
The simplest way would be to truncate the table and then insert all the values. This will handle all of your requirements.
Assuming that is not viable, though, you need to remember which rows have been modified; that can be done using a flag, a version number, or a timestamp. For example (sketched in code after these steps):
Update the table, set the "updated" flag to 0 on every row
Loop through doing an upsert for every item (http://dev.mysql.com/doc/refman/5.6/en/insert-on-duplicate.html). Set the flag to 1 in each upsert.
Delete every entry from the database with the flag set to 0.
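With the question's PDO handle, an added "updated" flag column, and the illustrative table name records, those three steps might look roughly like this:

// Mark everything as "not seen in this import".
$dbh->exec('UPDATE records SET updated = 0');

// Upsert each incoming row and flag it as seen.
$up = $dbh->prepare('INSERT INTO records (map, route, time, player, country, updated)
                     VALUES (:map, :route, :time, :player, :country, 1)
                     ON DUPLICATE KEY UPDATE time = VALUES(time),
                                             player = VALUES(player),
                                             country = VALUES(country),
                                             updated = 1');
foreach ($data as $value) {
    $up->execute(array(
        ':map'     => $value['map'],
        ':route'   => $value['route'],
        ':time'    => $value['time'],
        ':player'  => $value['player'],
        ':country' => $value['country'],
    ));
}

// Anything still flagged 0 was not in the feed, so it gets deleted.
$dbh->exec('DELETE FROM records WHERE updated = 0');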
I'm faced with a problematic CSV file that I have to import to MySQL, either through the use of PHP and then INSERT commands, or straight through MySQL's LOAD DATA INFILE.
I have attached a partial screenshot of how the data within the file looks:
The values I need to insert are below "ACC1000", so I have to start at line 5 and work my way through the file of about 5500 lines.
It's not possible to simply skip to each next line, because for some accounts there are multiple payments, as shown below.
I have been trying to get to the next row by scanning the rows for the occurrence of "ACC":
if (strpos($data[$c], 'ACC') !== FALSE) {
    echo "Yep ";
} else {
    echo "Nope ";
}
I know it's crude, but I really don't know where to start.
If you have a (foreign key) constraint defined in your target table such that records with a blank value in the type column will be rejected, you could use MySQL's LOAD DATA INFILE to read the first column into a user variable (which is carried forward into subsequent records) and apply its IGNORE keyword to skip those "records" that fail the FK constraint:
LOAD DATA INFILE '/path/to/file.csv'
IGNORE
INTO TABLE my_table
CHARACTER SET utf8
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 4 LINES
(@a, type, date, terms, due_date, class, aging, balance)
SET account_no = @account_no := IF(@a = '', @account_no, @a)
There are several approaches you could take.
1) You could go with @Jorge Campos' suggestion and read the file line by line, using PHP code to skip the lines you don't need and insert the ones you want into MySQL. A potential disadvantage of this approach with a very large file is that you will either have to run a bunch of little queries or build up one larger query, and it could take some time to run.
2) You could process the file and remove any rows/columns that you don't need, leaving the file in a format that can be inserted directly into mysql via command line or whatever.
Based on which approach you decide to take, either myself or the community can provide code samples if you need them.
This snippet should get you going in the right direction:
$file = '/path/to/something.csv';
if( ! $fh = fopen($file, 'r') ) { die('bad file'); }
if( ! $headers = fgetcsv($fh) ) { die('bad data'); }
while($line = fgetcsv($fh)) {
    echo var_export($line, true) . "\n";
    if( preg_match('/^ACC/', $line[0]) ) { echo "record begin\n"; }
}
fclose($fh);
http://php.net/manual/en/function.fgetcsv.php
I need to insert data from a plain text file, exploding each line into 2 parts and then inserting them into the database. I'm doing it this way, but can this program be optimized for speed?
The file has around 27000 lines of entries.
DB structure [unique key (ext,info)]
ext [varchar]
info [varchar]
code:
$string = file_get_contents('list.txt');
$file_list=explode("\n",$string);
$entry=0;
$db = new mysqli('localhost', 'root', '', 'file_type');
$sql = $db->prepare('INSERT INTO info (ext,info) VALUES(?, ?)');
$j=count($file_list);
for ($i = 0; $i < $j; $i++)
{
    $data = explode(' ', $file_list[$i], 2);
    $sql->bind_param('ss', $data[0], $data[1]);
    $sql->execute();
    $entry++;
}
$sql->close();
echo $entry.' entry inserted !<hr>';
If you are sure that the file contains unique pairs of ext/info, you can try disabling the keys for the import:
ALTER TABLE `info` DISABLE KEYS;
And after import:
ALTER TABLE `info` ENABLE KEYS;
This way the unique index will be rebuilt once for all records, not every time something is inserted.
To increase speed even more, you should change the format of this file to be CSV compatible and use MySQL's LOAD DATA to avoid parsing every line in PHP.
When there are multiple items to be inserted, you usually put all the data in a CSV file, create a temporary table with columns matching the CSV, do a LOAD DATA [LOCAL] INFILE, and then move that data into the destination table. As far as I can see you don't need much additional processing, so you can even treat your input file as a CSV without any extra trouble.
$db->query('CREATE TEMPORARY TABLE _tmp_info (ext VARCHAR(255), info VARCHAR(255))');
$db->query("LOAD DATA LOCAL INFILE '{$filename}' INTO TABLE _tmp_info
            FIELDS TERMINATED BY ' '
            LINES TERMINATED BY '\n'"); // $filename = 'list.txt' in your case
$db->query('INSERT INTO info (ext, info) SELECT t.ext, t.info FROM _tmp_info t');
You can run a COUNT(*) on the temporary table after that to show how many records were there.
If you have a large file that you want to read in I would not use file_get_contents. By using it you force the interpreter to store the entire contents in memory all at once, which is a bit wasteful.
The following is a snippet taken from here:
$file_handle = fopen("myfile", "r");
while (!feof($file_handle)) {
    $line = fgets($file_handle);
    echo $line;
}
fclose($file_handle);
This is different in that all you are keeping in memory from the file at a single instance in time is a single line (not the entire contents of the file), which in your case will probably lower the run-time memory footprint of your script. In your case, you can use the same loop to perform your INSERT operation.
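For illustration, the same loop combined with the question's prepared statement might look roughly like this (it keeps the space-separated list.txt format and the mysqli handle, and skips blank or malformed lines):

$db  = new mysqli('localhost', 'root', '', 'file_type');
$sql = $db->prepare('INSERT INTO info (ext, info) VALUES (?, ?)');

$entry = 0;
$fh = fopen('list.txt', 'r');
while (($line = fgets($fh)) !== false) {
    $line = rtrim($line, "\r\n");
    $data = explode(' ', $line, 2);          // "ext info..." -> two parts
    if ($line === '' || count($data) < 2) {
        continue;                            // skip blank or malformed lines
    }
    $sql->bind_param('ss', $data[0], $data[1]);
    $sql->execute();
    $entry++;
}
fclose($fh);
$sql->close();

echo $entry . ' entries inserted!';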
If you can, use something like Talend. It's an ETL program, simple and free (it also has a paid version).
Here is the magic solution [3 seconds vs 240 seconds]
$db->query('ALTER TABLE info DISABLE KEYS');
$db->autocommit(FALSE);
// insert
$db->commit();
$db->query('ALTER TABLE info ENABLE KEYS');
I am using the following script to upload records to my MySQL database. The problem is that if a client record is uploaded and it already exists in the database, it gets duplicated.
I have seen lots of posts on here asking how to remove duplicates from the CSV file itself on upload (e.g. if there are two instances of the name bob and the postcode lh456gl in the CSV, don't upload it), but what I want to know is whether it's possible to check the database for a record first, before adding it, so as not to insert a record that is already there.
So something like :
if exist namecolumn=$name_being_inserted and postcode=postcode_being_inserted then
do not add that record.
Is this even possible to do?
<?php
//database connect info here

//check for file upload
if (isset($_FILES['csv_file']) && is_uploaded_file($_FILES['csv_file']['tmp_name'])) {
    //upload directory
    $upload_dir = "./csv";
    //create file name (note the slash between the directory and the file name)
    $file_path = $upload_dir . '/' . $_FILES['csv_file']['name'];
    //move uploaded file to upload dir
    if (!move_uploaded_file($_FILES['csv_file']['tmp_name'], $file_path)) {
        //error moving upload file
        echo "Error moving file upload";
    }
    //open the csv file for reading
    $handle = fopen($file_path, 'r');
    while (($data = fgetcsv($handle, 1000, ',')) !== FALSE) {
        //Access field data in $data array ex.
        $name = $data[0];
        $postcode = $data[1];
        //Use data to insert into db
        $sql = sprintf("INSERT INTO test (name, postcode) VALUES ('%s','%s')",
            mysql_real_escape_string($name),
            mysql_real_escape_string($postcode)
        );
        mysql_query($sql) or (mysql_query("ROLLBACK") and die(mysql_error() . " - $sql"));
    }
    //delete csv file
    unlink($file_path);
}
?>
There are two pure MySQL methods that I can think of that would deal with this issue. REPLACE INTO and INSERT IGNORE.
REPLACE INTO will overwrite the existing row whereas INSERT IGNORE will ignore errors triggered by duplicate keys being entered in the database.
This is described in the manual as:
If you use the IGNORE keyword, errors that occur while executing the INSERT statement are treated as warnings instead. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row still is not inserted, but no error is issued.
For INSERT IGNORE to work you will need to set up a UNIQUE key/index on one or more of the fields. Looking at your code sample, though, you do not have anything that could be considered unique in your insert query. What if there are two John Smiths in Wolverhampton? Ideally you would have something like an email address to define as unique.
Simply create a UNIQUE key over name and postcode; then a row cannot be inserted when a row with the same values in those fields already exists.
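For example (a sketch, assuming name plus postcode really is unique enough for your data; the index name is arbitrary): add the composite UNIQUE key once, then switch the import query to INSERT IGNORE so rows that already exist are silently skipped.

// One-off: enforce uniqueness on the pair of columns.
mysql_query("ALTER TABLE test ADD UNIQUE KEY uniq_name_postcode (name, postcode)");

// Inside the import loop: IGNORE downgrades the duplicate-key error to a warning,
// so existing (name, postcode) pairs are skipped instead of duplicated.
$sql = sprintf("INSERT IGNORE INTO test (name, postcode) VALUES ('%s','%s')",
    mysql_real_escape_string($name),
    mysql_real_escape_string($postcode)
);
mysql_query($sql);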
I would let the records be inserted into the database and then, after inserting those records, just execute:
ALTER IGNORE TABLE dup_table ADD UNIQUE INDEX(a,b);
where a and b are the columns in which you don't want duplicates (key columns; you can have more of them). You can wrap all of that in a transaction: start the transaction, insert all records (no matter whether they are duplicates), execute the command I wrote, commit the transaction, and then you can remove that (a, b) unique index to prepare for the next import. Easy.