CSV and ID problems


I have a database with employees in it.
Since my employer finds it easy to input the data in a CSV file, I wrote a program that truncates my database and inserts the CSV data in my DB.
Employee: [ID, LAST_NAME, NAME, EMAIL, REMARKS, ...]
I use the ID field (an auto_increment value) to make each employee unique. This works fine, but recently my employer has asked me to also include functionality to mark favorites.
The only thing that makes my employees unique is the ID key, so when I import a new CSV file the IDs shift, because I had to truncate the table, and the favorites no longer match up.
An example of what I mean (CSV file):
0, Carlton, John, john@gmail.com, "Great worker",
1, Awsome, Dude, awsomeDud@aol.com, "Not so great",
2, Random, Randy, rr@hotmail.com, "idk"
Suppose somebody deletes the record with ID 1, and my favorite was 1.
The CSV file will now look like this:
0, Carlton, John, john@gmail.com, "Great worker",
1, Random, Randy, rr@hotmail.com, "idk"
It points to the wrong person.
Keep in mind that the IDs I wrote are not part of the CSV file itself; they are the auto_increment values.
I have given this problem a lot of thought and I cannot seem to find a simple way to accomplish this.
Any help would be appreciated.
Notes:
Emails are not unique, nor required.
The only real unique field is the ID field.

Solution 1 (easiest)
Have an int is_favorite column in your database containing 1 or 0, with a default value of 0 (meaning is not a favorite). Then ask your client to slightly change the format of the csv file as follows:
Employee: [ID, LAST_NAME, NAME, EMAIL, REMARKS, FAVORITE, ...]
Example CSV:
0, Carlton, John, john@gmail.com, "Great worker", 1
1, Awsome, Dude, awsomeDud@aol.com, "Not so great", 0
2, Random, Randy, rr@hotmail.com, "idk", 0
When you process the CSV file, just set the database value from the FAVORITE column. This will eliminate the problem with the mismatched favorites. Unfortunately, if in the near future the client requires new features which depend on the favorites, you might have the same issue again.
Solution 2 (best)
Discuss a more mature solution with your client, pointing out that the current CSV solution is no longer a valid option because of the problem of matching the CSV users with the appropriate sub-features (i.e. favorites).

A possible solution would be to never truncate your table. Ever.
Find out what makes the employees unique. E.g. EMAIL.
Then when you parse the next CSVs, you don't simply INSERT the employees. You update the current ones and insert the new ones.
This way, your IDs always stay the same (which they should).
I would have used something like this:
IF EXISTS (SELECT 1 FROM [User] WHERE [Email] = @UsersEmail)
BEGIN
    UPDATE [User]
    SET [Name] = @NewName
    WHERE [Email] = @UsersEmail
END
ELSE
BEGIN
    INSERT INTO [User] ([Email], [Name]) VALUES
    (@UsersEmail, @NewName)
END
But since you've tagged it PHP, I'm guessing you're using MySQL, which can do it differently:
INSERT INTO subs
(subs_name, subs_email, subs_birthday)
VALUES
(?, ?, ?)
ON DUPLICATE KEY UPDATE
subs_name = VALUES(subs_name),
subs_birthday = VALUES(subs_birthday)
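To make the update-or-insert idea concrete, here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for MySQL. It assumes a hypothetical employee table with a UNIQUE email column, which the question says may not hold for the real data, so treat the key choice as an assumption. SQLite spells the upsert as ON CONFLICT ... DO UPDATE with excluded.col, where MySQL uses ON DUPLICATE KEY UPDATE with VALUES(col).

```python
import sqlite3


def upsert_employees(conn, rows):
    """Insert new employees and update existing ones, matched on email."""
    # Requires SQLite 3.24+ for the ON CONFLICT ... DO UPDATE clause.
    conn.executemany(
        """
        INSERT INTO employee (email, name) VALUES (?, ?)
        ON CONFLICT(email) DO UPDATE SET name = excluded.name
        """,
        rows,
    )
    conn.commit()


# Hypothetical schema: email is assumed to be the unique natural key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " email TEXT NOT NULL UNIQUE,"
    " name TEXT NOT NULL)"
)

# First import.
upsert_employees(conn, [("john@gmail.com", "John"), ("rr@hotmail.com", "Randy")])
# Second import: John is renamed and one new employee appears.
upsert_employees(conn, [("john@gmail.com", "Johnny"), ("new@example.com", "Newt")])

rows = conn.execute("SELECT id, email, name FROM employee ORDER BY id").fetchall()
```

After the second import the existing rows keep their original ids, so anything that points at those ids (such as favorites) still points at the same people.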

I would not truncate the table. I would then upload the csv into a temporary table. If the same ID is in both tables, do an update. If it is only in the old version, delete it (deleting out favorites as well for that ID), or, perhaps better, have a flag on the employees table that deactivates the row. If it is only in the new version, insert everything except the ID (which will probably be an empty string anyway). Then you can delete the temporary table.
If you want to be paranoid, you can double check names or emails and if you find a mismatch, flag them without updating. That would cause a manual operation if someone changed their name, but it would also save you the trouble if someone messed up your id numbers.
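The staging-table approach described above could look roughly like the following sketch. It is written in Python with sqlite3 standing in for MySQL; the table names (employee, staging), the active flag column and the sample rows are all assumptions made for illustration. Rows missing from the new file are deactivated rather than deleted.

```python
import sqlite3


def sync_from_staging(conn):
    """Apply the staging table to employee without truncating it."""
    cur = conn.cursor()
    # 1. ID present in both tables: update the existing row in place.
    cur.execute(
        """
        UPDATE employee
        SET last_name = (SELECT s.last_name FROM staging s WHERE s.id = employee.id),
            name      = (SELECT s.name FROM staging s WHERE s.id = employee.id),
            active    = 1
        WHERE id IN (SELECT id FROM staging WHERE id IS NOT NULL)
        """
    )
    # 2. ID only in the old data: deactivate instead of deleting.
    cur.execute(
        "UPDATE employee SET active = 0 "
        "WHERE id NOT IN (SELECT id FROM staging WHERE id IS NOT NULL)"
    )
    # 3. No ID in the new data: insert everything except the ID.
    cur.execute(
        "INSERT INTO employee (last_name, name) "
        "SELECT last_name, name FROM staging WHERE id IS NULL"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        last_name TEXT, name TEXT,
        active INTEGER NOT NULL DEFAULT 1
    );
    CREATE TEMP TABLE staging (id INTEGER, last_name TEXT, name TEXT);
    INSERT INTO employee (last_name, name)
        VALUES ('Carlton', 'John'), ('Awsome', 'Dude'), ('Random', 'Randy');
    -- The uploaded CSV: ID 2 is gone, ID 1 was edited, one row is new.
    INSERT INTO staging
        VALUES (1, 'Carlton', 'Johnny'), (3, 'Random', 'Randy'), (NULL, 'Fresh', 'Fran');
    """
)
sync_from_staging(conn)
result = conn.execute(
    "SELECT id, last_name, name, active FROM employee ORDER BY id"
).fetchall()
conn.execute("DROP TABLE staging")
```

Step 2 runs before step 3 on purpose, so that the freshly inserted rows are not immediately deactivated.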

The simple and clean way to solve this would be to find a way to recognise unique employees on the flat data.
Is there no other unique identifier that could be added to the csv file? For example, a windows login name? A company employee No? Something that would be static.
That way it's simple:
1. Don't truncate.
2. If Windows LoginID / EmpNo exists, update.
3. If not, add.
Also I'm concerned that your "favourites" table is clearly not using referential integrity. It should have a FK pointing to your Employee.ID; preventing you accidentally deleting an employee that was marked as a favourite, amongst other things.
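As a rough illustration of what the foreign key buys you, the sketch below uses Python's sqlite3 module (standing in for MySQL) with hypothetical employee and favorite tables. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; in MySQL the equivalent protection needs an engine that enforces them, such as InnoDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE favorite (
        employee_id INTEGER NOT NULL REFERENCES employee(id)
    );
    INSERT INTO employee (id, name) VALUES (1, 'John'), (2, 'Randy');
    INSERT INTO favorite (employee_id) VALUES (2);
    """
)


def try_delete(conn, employee_id):
    """Return True if the employee was deleted, False if a favorite blocks it."""
    try:
        conn.execute("DELETE FROM employee WHERE id = ?", (employee_id,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


deleted_favorite = try_delete(conn, 2)  # blocked: a favorite still references it
deleted_other = try_delete(conn, 1)     # allowed: nothing references it
```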
A messier, much less bulletproof way would be to mark your favourites based on your employee names rather than IDs. There are obvious drawbacks to this approach, so use it only as a last resort.

You should never use the ID to identify a given user for the reasons you described in the question.
You could create a new reference ID field based on what you already have and create a unique identifier by chaining the required fields as a single string and then calculating the MD5 hash for example.
I have a question (sorry, but I can't comment yet because of rep): does your employer only add new employees via the CSV file, or also edit existing ones?
If only new employees are added you don't need to rebuild the table from scratch and you can make sure that your program generates a unique reference ID (that will remain unchanged) before inserting data into the db. Also your program can handle the editing of the employee, instead of changing data from CSV, leaving reference ID untouched.
This way all the fields like name, email, etc. can be edited and the link to favourites will stay correct. In that case the reference ID can also be calculated using not only the data in the CSV but also other data, such as a creation timestamp.

You could create an MD5 hash from the name, email and comment, then save it and use it as the unique identifier.
Make sure you store the MD5 hash as binary.
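A minimal sketch of the hashing idea, in Python with the standard hashlib module. The function name reference_id and the choice of separator are assumptions for illustration. The digest is 16 raw bytes, which would fit a BINARY(16) column in MySQL.

```python
import hashlib


def reference_id(*fields):
    """Return a 16-byte MD5 digest of the given fields, for use as a reference ID."""
    # The unit separator keeps ("ab", "c") and ("a", "bc") from hashing alike.
    payload = "\x1f".join(fields)
    return hashlib.md5(payload.encode("utf-8")).digest()


ref = reference_id("Carlton", "John", "john@gmail.com", "Great worker")
ref_hex = ref.hex()  # 32 hex characters, if a text column is preferred
```

Keep in mind that the hash changes whenever any input field changes, and that two employees with identical fields get the same hash, so this only works if the value is computed once at creation time and stored, not recomputed on every import.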

Can you modify the database? If you can, add another field that you can call favourite. Set it to a simple enum (1,0), with 1 for favourites and 0 for others. So when you truncate the database, you'll still have your favourites in those fields. Of course, if you have multi-level favourites, don't make the field an enum; use something more suitable for you.

One solution is for the database to become the de facto 'source' for IDs.
After the initial import, the next time your boss wants to update the file, create a CSV FROM the database (with the IDs intact) and ask your boss to update that and return it.
You could ask him to add new rows to the bottom of the file and leave out the ID.
Any row in the new spreadsheet without an ID is a new record. An extra field at the end of the row could be used by the boss to indicate rows to be deleted.
Repeat this process the next time the boss wants to update the file.
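The round trip described above could be sketched as follows, in Python with the csv and sqlite3 standard-library modules (SQLite standing in for MySQL). The column names, the DELETE marker column and the helper names export_csv and import_csv are assumptions for illustration.

```python
import csv
import io
import sqlite3


def export_csv(conn):
    """Dump the employee table, IDs included, as CSV text for the boss to edit."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["ID", "LAST_NAME", "NAME", "DELETE"])
    for row in conn.execute("SELECT id, last_name, name FROM employee ORDER BY id"):
        writer.writerow([*row, ""])
    return buf.getvalue()


def import_csv(conn, text):
    """Apply an edited export: no ID means new, a DELETE mark means remove."""
    for rec in csv.DictReader(io.StringIO(text)):
        emp_id = (rec.get("ID") or "").strip()
        if not emp_id:
            conn.execute(
                "INSERT INTO employee (last_name, name) VALUES (?, ?)",
                (rec["LAST_NAME"], rec["NAME"]),
            )
        elif (rec.get("DELETE") or "").strip():
            conn.execute("DELETE FROM employee WHERE id = ?", (int(emp_id),))
        else:
            conn.execute(
                "UPDATE employee SET last_name = ?, name = ? WHERE id = ?",
                (rec["LAST_NAME"], rec["NAME"], int(emp_id)),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY AUTOINCREMENT, last_name TEXT, name TEXT
    );
    INSERT INTO employee (last_name, name)
        VALUES ('Carlton', 'John'), ('Awsome', 'Dude'), ('Random', 'Randy');
    """
)
exported = export_csv(conn)

# What the boss sends back: row 1 edited, row 2 marked for deletion, one new row.
edited = (
    "ID,LAST_NAME,NAME,DELETE\n"
    "1,Carlton,Johnny,\n"
    "2,Awsome,Dude,x\n"
    "3,Random,Randy,\n"
    ",Fresh,Fran,\n"
)
import_csv(conn, edited)
final = conn.execute("SELECT id, last_name, name FROM employee ORDER BY id").fetchall()
```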

Add an extra field to your database table as well as to the CSV file named something like "EmployeeID" which should be unique for all employees.

Related

Is it possible to add a record in between the others in MySQL?

I have a MySQL database with a table called Animals, and I use this condition to add new records.
public function create()
{
    $animals = Animals::all();
    $last_animal_id = collect($animals)->last();
    if ($last_animal_id->id == $last_animal_id->id) {
        $last_animal_id->id = $last_animal_id->id + 1;
    } else {
        return false;
    }
    return view('animal.create-animals')->with('last_animal_id', $last_animal_id);
}
I work in Laravel and PHP, and this is my controller, 'AnimalsController'. The condition adds 1 to the last id registered in the table.
For example, if I have 4 records and delete the last one, then without my condition the next record I add will take the value 6.
That is why I add new records manually: the condition finds the last id and adds 1 to it, instead of the 2 I would get without the condition. It does not do this directly; I pass the value to an input and then submit the form in my view.
Is it possible to reuse an id in the table if I delete a record in the middle, or one before the last record? The following example explains what I mean:
Table Animals
/*NOTE: The field 'id' has the following attributes:
AUTO_INCREMENT, 'NOT NULL', 'PRIMARY KEY', and its type is 'INT'*/
id|name |class
1 |Dog |Mammal
2 |Cat |Mammal
3 |Sparrow|Bird
4 |Whale |Mammal
5 |Frog |Amphibian
6 |Snake |Reptile
Then I delete the records with ids 2 and 3.
In addition to the condition that already exists, I would like to create another condition that allows new records to be added in between the others, but only where records are missing.
Using the previous example:
I said that I would delete ids 2 and 3, right? The new condition must allow me to create records with ids 2 and 3 again, between the records with ids 1 and 4.
If I delete another record, the condition must do the same thing, replacing the previously deleted records with new ones that have the corresponding ids.
For more details: I use a form to add new animals to the Animals table. In the example I deleted the records with ids 2 and 3, so if the condition in my controller and the form in my view work properly, I can add an animal with id 2 again, and then, in a new form, an animal with id 3.
If my question was unclear, or you thought that my function should add the records simultaneously, that is not what I am trying to do.

One thing to keep in mind when working with relational databases is that the id column is usually used to relate this data and as such it can and will appear in other tables. If you arbitrarily renumber things here, you're damaging those links and potentially scrambling up your data.
If ordering is important, create a column for that purpose, for example one called position or something similar. This one you can manipulate freely without concern about altering relations.
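A small sketch of the separate-column idea, in Python with sqlite3 standing in for MySQL. The animals table layout and the helper names renumber_positions and add_animal are assumptions for illustration. The position column is renumbered freely while the id values are never touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE animals (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        position INTEGER NOT NULL
    );
    INSERT INTO animals (name, position)
        VALUES ('Dog', 1), ('Cat', 2), ('Sparrow', 3), ('Whale', 4);
    DELETE FROM animals WHERE name IN ('Cat', 'Sparrow');
    """
)


def renumber_positions(conn):
    """Close the gaps in position while leaving every id untouched."""
    ids = [r[0] for r in conn.execute("SELECT id FROM animals ORDER BY position, id")]
    conn.executemany(
        "UPDATE animals SET position = ? WHERE id = ?",
        [(pos, id_) for pos, id_ in enumerate(ids, start=1)],
    )
    conn.commit()


def add_animal(conn, name):
    """Append an animal at the next position; the database assigns the id."""
    cur = conn.execute(
        "INSERT INTO animals (name, position) "
        "SELECT ?, COALESCE(MAX(position), 0) + 1 FROM animals",
        (name,),
    )
    conn.commit()
    return cur.lastrowid


renumber_positions(conn)
frog_id = add_animal(conn, "Frog")
animals = conn.execute("SELECT id, name, position FROM animals ORDER BY position").fetchall()
```

The positions end up contiguous (1, 2, 3) even though the ids keep their gaps, which is the behaviour the question is really after.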
Generally your id value should be:
- Always populated (e.g. NOT NULL)
- Integer (e.g. INT or BIGINT)
- Set as your primary key (e.g. PRIMARY KEY)
- Generated automatically (e.g. AUTO_INCREMENT)
- Never changed; it's permanently assigned
- Never recycled and used for another record
Recycling id values is how you create enormous security problems. It's all too easy for a user to "inherit" all the data that came with an old user ID value you've recycled. The safest thing is to never, ever re-use these values.
They're just IDs. Forget about holes or lack of ordering. Any production database will end up with lots of interesting patterns there that are unavoidable, but it doesn't matter.
One exception to this is when creating seed databases. Here you can fuss over the ordering to get things arranged as you want because this is before you insert the data into the database.
At the end of the day you'll want to ensure that:
- These numbers don't overflow (e.g. INT keyed table at 2.1 billion)
- These numbers aren't exposed to users in a way that makes it possible to enumerate your table (e.g. ID value in a URL)
Just think of them as internal identifiers, like a serial number, and you'll be fine. In fact, MySQL now supports SERIAL as a datatype for this reason, that's an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE which is a good default for systems designed in 2018.

There is a really great answer from Tadman about the implications of your solution.
To give you an alternative to your own solution, you can do something like this....
First, create an order column, an int.
Then, instead of looking at the latest id value, do this...
$highestOrder = Animal::max('order');
And then 'up it'... :-) Just an idea.
BTW: to give you more options, you can look directly in a table as well:
DB::table('animals')->max('order');
... but I would not do that in this case. The model class is the best 'gateway' to this information, not the DB facade directly.

Better approach for updating multiple data

I have this MySQL table, where the contact_id column is unique for each user_id.
history:
- hist_id: int(11) auto_increment primary key
- user_id: int(11)
- contact_id: int(11)
- name: varchar(50)
- phone: varchar(30)
From time to time, the server will receive a new list of contacts for a specific user_id and needs to update this table, inserting, deleting or updating whatever differs from the previous information.
For example, the current data is:
Then the server receives this data:
And the new data is:
As you can see, the first row (John) was updated, the second row (Mary) was deleted and another row (Jeniffer) was added.
Today what I am doing is deleting all rows with a specific user_id, and inserting the new data. But the autoincrement field (hist_id) is getting bigger and bigger...
Note: The table has about 80 thousand records, and this update will occur 30 times a day or more.
I have some (related) questions:
1. In this scenario, do you think deleting all records from a specific user_id and inserting updated data is a good approach?
2. What about removing the autoincrement field? I don't need it, but I think it is not a good idea to have a table without a primary key.
3. Or maybe the better approach is to loop over the new data, selecting each user_id / contact_id to compare values and update?
PS. By better approach I mean the most efficient way.
Thank you so much for any help!

In this scenario, do you think deleting all records from a specific user_id and inserting updated data is a good approach?
Short Answer
No. You should be taking advantage of an 'upsert', which is short for 'insert on duplicate key update'. What this means is that if the key pair you're inserting already exists, the specified columns are updated with the specified data. You then shorten your logic and reduce increments. Here's an example using your table structure that should work. It assumes that you have a unique index covering the user_id and contact_id pair.
INSERT INTO history (user_id, contact_id, name, phone)
VALUES
(1, 23, 'James Jr.', '(619)-543-6222')
ON DUPLICATE KEY UPDATE
name=VALUES(name),
phone=VALUES(phone);
This query should retain the contact_id but overwrite the pre-existing data with the new data.
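The upsert handles inserts and updates, but the question also needs contacts that vanished from the new list to be removed. One possible way to combine both is sketched below in Python, with sqlite3 standing in for MySQL (SQLite writes the upsert as ON CONFLICT ... DO UPDATE). The helper name replace_contacts and the sample rows are assumptions for illustration.

```python
import sqlite3


def replace_contacts(conn, user_id, contacts):
    """Sync one user's contacts; contacts is a list of (contact_id, name, phone)."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            """
            INSERT INTO history (user_id, contact_id, name, phone)
            VALUES (?, ?, ?, ?)
            ON CONFLICT(user_id, contact_id)
            DO UPDATE SET name = excluded.name, phone = excluded.phone
            """,
            [(user_id, cid, name, phone) for cid, name, phone in contacts],
        )
        keep = [cid for cid, _, _ in contacts]
        if keep:
            marks = ",".join("?" * len(keep))
            conn.execute(
                f"DELETE FROM history WHERE user_id = ? AND contact_id NOT IN ({marks})",
                [user_id, *keep],
            )
        else:
            # An empty list means the user has no contacts left.
            conn.execute("DELETE FROM history WHERE user_id = ?", (user_id,))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history ("
    " hist_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_id INTEGER NOT NULL,"
    " contact_id INTEGER NOT NULL,"
    " name TEXT, phone TEXT,"
    " UNIQUE (user_id, contact_id))"
)
replace_contacts(conn, 1, [(23, "James", "(619)-543-6222"), (24, "Mary", "555-0100")])
# New list: contact 23 is edited, 24 is gone, 25 is new.
replace_contacts(conn, 1, [(23, "James Jr.", "(619)-543-6222"), (25, "Jeniffer", "555-0101")])
contacts = conn.execute(
    "SELECT hist_id, contact_id, name FROM history WHERE user_id = 1 ORDER BY contact_id"
).fetchall()
```

Unchanged and edited contacts keep their hist_id, so the counter only advances for rows that are really new.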
What about removing the autoincrement field? I don't need it, but I think it is not a good idea to have a table without a primary key.
Primary keys do not imply auto incremented values. I could have a varchar field as the primary key containing names of fruits and vegetables. Is this optimized for performance? Probably not. There are many situations that might call for auto increment and there are definite reasons to avoid it. It all depends on how you wish to access the data and how this can impact future expansion. In your situation, I would start over on the table structure and re-think how you wish to store and access the data. Do you want to write more logic to control the data OR do you want the data to flow naturally by itself? You've made a history table that is functioning more like a hybrid many-to-one crosswalk at first glance. Without looking at the remaining table structure, I can't necessarily say on a whim that it's not a good idea. What I can say is that I would do this a bit differently. I will answer this more specifically in the next question.
Or maybe the better approach is to loop new data, selecting each user_id / contact_id for comparing values to update?
I would avoid looping through the data in order to update it. That is a job for SQL and it does this job well. Sometimes we might find ourselves in a situation where we must do this, either to extract data in a specific format or to repair data in some way. However, avoid doing this for inserting or updating the data. It can negatively impact performance and you will likely paint yourself into a corner.
Back to what I said toward the end of your second question, which will help you see what I am talking about. I am going to assume that user_id is a primary key that is auto-incremented in your user table. I will do some guesstimation here and show you an example of how you can redesign your user, contact and phone number structure. The following is a quick model I threw together that shows the foreign key relationship between the tables.
Note: The column names and overall data arrangement could be done differently but I did this quickly to give you a decent example of a normalized database structure. All of the foreign keys have a structural layout which separates your data in a way that enables you to control the flow of data as it enters and leaves your system. Here's the screenshot of the database model I threw together using MySQL Workbench.
(source: xonos.net)
Here's the SQL so that you can look at it more closely.
You'll notice that the "person" table is extracted from users but shares data with contacts. This enables you to store all "people" in one place, all "users" in another and all "contacts" in another. Now, why would we do this? The number one reason can be explained in two scenarios.
1.) Say we have someone, in this example I'll call him "Jim Bean". "Jim Bean" works for the company, so he is a user of the system. But, "Jim Bean" happens to own a side business and does contact work for the company at the same time. So, he is both a contact and a user of the system. In a more "flat table" environment, we would have two records for Jim Bean that contain the same data which could become outdated or incorrect, quickly.
2.) Let's say that Jim did some bad things and the company wants nothing to do with him anymore. They don't want any record of him - as if he never existed. All that we have to do is delete Jim Bean from the Person table. That's it. Since the foreign relationship has "CASCADE" on update/delete, this automatically propagates and clears out the other tables related to him.
I highly recommend that you do some reading on normalized data structure. It has saved me many hours once I got the hang of it and I will never go back.

Find the size of an SQL column with PHP?

I'm trying to build a very simple login system for my site (just for practice for a project I'm working on). The way I've decided to implement it is to use a table with fields for ID, Name, Password, and username, and to search for the entered information in the existing table.
For registration, it simply inserts the supplied information into the table, and I would like to assign a customer ID number. My idea for assigning an ID number is to simply find the size of the ID column (which will contain the IDs 1, 2, 3, etc. up to the end) and assign the new registration to the length + 1. For this purpose I'll need a way to get the size of the column, but I'm just learning PHP and SQL so I'm not sure what the syntax would be.
TLDR: is there a function in SQL that I can use in PHP to get the length of a particular column (i.e. the number of entries stored in that column)?

Set the ID column to Primary and Auto increment.
You don't include it in your insert query; the value is created on its own.

You'd probably be better off just using an IDENTITY or AUTO_INCREMENT column. The problem with checking for the "size of the column" (by which I assume you mean the count of rows in that column) is that you could end up inserting duplicate IDs, for example:
ID | ...
---------
1
2
4
So if you did a SELECT COUNT(ID)+1 FROM MyTable, it would return 4, and you have an ID collision.
You could do something like SELECT MAX(ID)+1 FROM MyTable, but even then there could be concurrency problems (process A and process B both try to run that query at the same time, before either has a chance to insert the new ID of 5). You're really best off just letting your RDBMS take care of it.
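The collision is easy to reproduce. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with a hypothetical customer table; it compares the COUNT(ID)+1 guess with the id the database hands out itself (read back through lastrowid, the counterpart of MySQL's LAST_INSERT_ID()).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        username TEXT NOT NULL
    );
    INSERT INTO customer (username) VALUES ('a'), ('b'), ('c'), ('d');
    DELETE FROM customer WHERE id = 3;  -- leaves ids 1, 2, 4
    """
)

# The "size of the column" approach: three rows, so it proposes id 4,
# which is already taken.
count_plus_one = conn.execute("SELECT COUNT(id) + 1 FROM customer").fetchone()[0]
existing_ids = [r[0] for r in conn.execute("SELECT id FROM customer ORDER BY id")]

# Letting the database assign the id avoids the collision.
cur = conn.execute("INSERT INTO customer (username) VALUES (?)", ("e",))
conn.commit()
new_id = cur.lastrowid
```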

How can I correct MySQL table id's?

I have a table called "posts" and it contains 500 posts, but the ids are not sequential,
like:
1
3
9
22
446
....
etc.
That's because I deleted some of the posts from the table.
So how can I re-correct the ids?

Primary Key IDs are not supposed to be changed, especially when they are referenced in other tables.
If you need a property that is like a row number, you can add another field for that.
For example invoices are numbered, but the invoice number should not be the primary key, since you want the freedom to re-number one of them without losing other connected information, such as invoice details in other tables.

The easiest way to fix it is to create a quick script that loops through the table and updates the id column, and then run this on your database: ALTER TABLE tbl AUTO_INCREMENT = 100;

NEVER EVER CHANGE THE ID!
An ID is something a record is born with and dies with. That's why it's called an ID: it is an IDENTITY!
Just as you cannot change the identity of things in real life, you shouldn't do it in a database.
It is a very bad idea from a philosophical perspective, and it also results in practical problems. Even if you renumbered the IDs in all the tables in your database, the old IDs might still survive somewhere (and make a big mess later):
- in URLs all over the internet
- in your logs
- in your backups
- in other database copies.
Also, ID must serve only for identification and nothing else. For example: you use IDs to define order of some dictionary, which you normally present sorted. Then you need to add a new item, which must be presented between items with id 20 and 21. The BAD solution would be to change ID for records with ID >= 21. The GOOD solution is to add a new column Order, which defines the order of items and can be changed whenever needed.
Remember:
ID must serve only for identification and nothing else!
NEVER CHANGE THE ID!

How can we re-use the deleted id from any MySQL-DB table?

How can we re-use the deleted id from any MySQL-DB table?
If I want to roll back the deleted ID, can we do it somehow?

It may be possible by finding the lowest unused ID and forcing it, but it's terribly bad practice, mainly because of referential integrity: It could be, for example, that relationships from other tables point to a deleted record, which would not be recognizable as "deleted" any more if IDs were reused.
Bottom line: Don't do it. It's a really bad idea.
Related reading: Using auto_increment in the mySQL manual
Re your update: Even if you have a legitimate reason to do this, I don't think there is an automatic way to re-use values in an auto_increment field. If anything, you would have to find the lowest unused value (maybe using a stored procedure or an external script) and force that as the ID (if that's even possible).

You shouldn't do it.
Don't think of it as a number at all.
It is not a number. It's a unique identifier. Think of this word - unique. No two records should ever be identified by the same id.

1.
As per the explanation you provided ("@Pekka, I am tracking the INsert Update and delete query..."), I assume you somehow just want to put your old data back under the same ID.
In that case you may consider using a delete-flag column in your table.
If the delete-flag is set for a row, your program should treat it as deleted. You can later make the row available again by setting the delete-flag to false.
A similar approach is to move the whole row to a temporary table, from which you can bring it back when required with the same data and ID.
The previous idea is better, though.
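A minimal sketch of the delete-flag idea, in Python with sqlite3 standing in for MySQL. The posts table, the is_deleted column and the helper names are assumptions for illustration. The row never leaves the table, so its ID can never be handed to another record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        is_deleted INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO posts (title) VALUES ('first'), ('second'), ('third');
    """
)


def soft_delete(conn, post_id):
    """Mark a row as deleted without removing it."""
    conn.execute("UPDATE posts SET is_deleted = 1 WHERE id = ?", (post_id,))
    conn.commit()


def restore(conn, post_id):
    """Bring a soft-deleted row back with the same data and ID."""
    conn.execute("UPDATE posts SET is_deleted = 0 WHERE id = ?", (post_id,))
    conn.commit()


def visible_posts(conn):
    """Rows the application should treat as existing."""
    return conn.execute(
        "SELECT id, title FROM posts WHERE is_deleted = 0 ORDER BY id"
    ).fetchall()


soft_delete(conn, 2)
after_delete = visible_posts(conn)
restore(conn, 2)
after_restore = visible_posts(conn)
```

The cost is that every query reading the table has to filter on the flag, as visible_posts does here.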
2.
If this is not what you meant by your explanation, and you want to delete rows and still use up all the (auto-generated) ID values, I have a few ideas you could implement:
- Create a table (IDSTORE) for storing Deleted IDs.
- Create a trigger activated on row delete which will note the ID and store it to the table.
- While inserting take minimum ID from IDSTORE and insert it with that value. If IDSTORE is empty you can pass NULL ID to generate Auto Incremented number.
Of course if you have references / relations (FK) implemented, you manually have to look after it, as your requirement is so.
Further Read:
http://www.databasejournal.com/features/mysql/article.php/10897_2201621_3/Deleting-Duplicate-Rows-in-a-MySQL-Database.htm

Here is my case for a MySQL DB:
I had a menu table, and the menu id was being used in the content table as a foreign key. But there was no declared relation between the tables (bad table design, I know, but the project was done by another developer and my client later approached me to handle it). One day my client realised that some of the content was not showing up. I looked into the problem and found that one of the menus had been deleted from the menu table, but luckily the menu id still existed in the content table. I took the deleted menu id from the content table and ran a normal insert query on the menu table with the same menu id along with the other fields (id is the primary key), and it worked.
insert into tbl_menu(id, col1, col2, ...) values(12, val1, val2, ...)
