Avoiding duplicate data in CodeIgniter DataMapper (PHP)

I am inserting the Food (table) entries into the DB and I get duplicate keys even though I am not supposed to, based on the manual. I am really confused and stuck!
The relationship is as follows: every user is related to many different food types. Whenever I read the user's $data['food'] again, it creates a duplicate entry. Meaning that the next time the user logs in, instead of knowing that the food already exists, it increments the primary key and does not understand that the key exists, contrary to what the manual suggests (that save is smart enough to know that). So my problem is that I want to have only one copy of every entry, but I end up with more entries. How can I avoid having duplicate entries?
for ($i = 0; $i < sizeof($data['food']); $i++) {
    $f = new Food();
    $f->food_id = $data['food'][$i]['id'];
    $f->name = $data['food'][$i]['name'];
    $f->user_id = $food_id;

    $u = new User();
    $u->where('user_id', $food_id)->get();

    // save food and the relationship
    $f->save();
}

Why did you set your user_id to be the same as food_id?
The code you wrote to get the user details doesn't seem to be doing anything either.
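A minimal sketch of one way to avoid the duplicates, assuming 'name' uniquely identifies a food and using only the stock DataMapper where()/get()/exists()/save() methods (variable names are borrowed from the question and may need adjusting):

foreach ($data['food'] as $item) {
    // Look the food up by a column that should be unique, so an existing
    // row is re-used instead of inserted again.
    $f = new Food();
    $f->where('name', $item['name'])->get(1);
    if (!$f->exists()) {
        $f->name = $item['name'];
    }

    // Fetch the user the food belongs to.
    $u = new User();
    $u->where('id', $user_id)->get(1);

    // save() inserts or updates the food and records the relationship to $u.
    $f->save($u);
}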

Related

Better approach for updating multiple data

I have this MySQL table, where contact_id is unique for each user_id.
history:
- hist_id: int(11) auto_increment primary key
- user_id: int(11)
- contact_id: int(11)
- name: varchar(50)
- phone: varchar(30)
From time to time, the server will receive a new list of contacts for a specific user_id and needs to update this table, inserting, deleting or updating whatever differs from the previous information.
For example, the current data is:
Then the server receives this data:
And the new data is:
As you can see, the first row (John) was updated, the second row (Mary) was deleted and another row (Jeniffer) was added.
Today what I am doing is deleting all rows with a specific user_id and inserting the new data. But the auto-increment field (hist_id) keeps getting bigger and bigger...
Note: the table has about 80 thousand records, and this update will occur 30 times a day or more.
I have some (related) questions:
1. In this scenario, do you think deleting all records from a specific user_id and inserting updated data is a good approach?
2. What about removing the autoincrement field? I don't need it, but I think it is not a good idea to have a table without a primary key.
3. Or maybe the better approach is to loop new data, selecting each user_id / contact_id for comparing values to update?
PS: By "better approach" I mean the most efficient way.
Thank you so much for any help!
In this scenario, do you think deleting all records from a specific user_id and inserting updated data is a good approach?
Short Answer
No. You should be taking advantage of an 'upsert', which is shorthand for INSERT ... ON DUPLICATE KEY UPDATE. What this means is that if the key pair you're inserting already exists, the specified columns are updated with the specified data. You shorten your logic and stop burning auto-increment values. Here's an example, using your table structure, that should work. This assumes you have defined a unique key over the user_id and contact_id columns.
INSERT INTO history (user_id, contact_id, name, phone)
VALUES
(1, 23, 'James Jr.', '(619)-543-6222')
ON DUPLICATE KEY UPDATE
name=VALUES(name),
phone=VALUES(phone);
This query should retain the contact_id but overwrite the pre-existing data with the new data.
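For completeness, the composite unique key that ON DUPLICATE KEY UPDATE relies on could be added like this (a sketch using PDO against the history table from the question; the index name is arbitrary):

$pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass');
// Without a unique key over (user_id, contact_id) the statement above
// would simply insert new rows every time.
$pdo->exec('ALTER TABLE history ADD UNIQUE KEY uq_user_contact (user_id, contact_id)');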
What about removing the autoincrement field? I don't need it, but I think it is not a good idea to have a table without a primary key.
Primary keys do not imply auto-incremented values. I could have a varchar field as the primary key containing names of fruits and vegetables. Is this optimized for performance? Probably not. There are many situations that might call for auto-increment and there are definite reasons to avoid it. It all depends on how you wish to access the data and how this can impact future expansion. In your situation, I would start over on the table structure and re-think how you wish to store and access the data. Do you want to write more logic to control the data, or do you want the data to flow naturally by itself? You've made a history table that, at first glance, is functioning more like a hybrid many-to-one crosswalk. Without looking at the remaining table structure, I can't necessarily say on a whim that it's not a good idea. What I can say is that I would do this a bit differently. I will answer this more specifically in the next question.
Or maybe the better approach is to loop new data, selecting each user_id / contact_id for comparing values to update?
I would avoid looping through the data in order to update it. That is a job for SQL and it does this job well. Sometimes we might find ourselves in a situation where we must do this, either to extract data in a specific format or to repair data in some way; however, avoid doing it for inserting or updating the data. It can negatively impact performance, and you will likely paint yourself into a corner.
Back to what I said toward the end of your second question, which will help you see what I am talking about. I am going to assume that user_id is an auto-incremented primary key in your user table. I will do some guesstimation here and show you an example of how you could redesign your user, contact and phone number structure. The following is a quick model I threw together that shows the foreign key relationships between the tables.
Note: The column names and overall data arrangement could be done differently but I did this quickly to give you a decent example of a normalized database structure. All of the foreign keys have a structural layout which separates your data in a way that enables you to control the flow of data as it enters and leaves your system. Here's the screenshot of the database model I threw together using MySQL Workbench.
(screenshot of the database model not reproduced here; source: xonos.net)
Here's the SQL so that you can look at it more closely.
You'll notice that the "person" table is extracted from users but shares data with contacts. This enables you to store all "people" in one place, all "users" in another and all "contacts" in another. Now, why would we do this? The number one reason can be explained in two scenarios.
1.) Say we have someone; in this example I'll call him "Jim Bean". "Jim Bean" works for the company, so he is a user of the system. But "Jim Bean" happens to own a side business and does contract work for the company at the same time. So he is both a contact and a user of the system. In a more "flat table" environment, we would have two records for Jim Bean that contain the same data, which could quickly become outdated or incorrect.
2.) Let's say that Jim did some bad things and the company wants nothing to do with him anymore. They don't want any record of him, as if he never existed. All we have to do is delete Jim Bean from the person table. That's it. Since the foreign key relationships have CASCADE on update/delete, this automatically propagates and clears out the rows related to him in the other tables.
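The original SQL export isn't reproduced above, but as a rough, hypothetical sketch of the kind of layout being described (every table and column name below is an illustrative assumption, not the author's actual schema), the cascading relationships might look like this with PDO:

$pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass');

// One row per human being, whether they are a user, a contact, or both.
$pdo->exec("CREATE TABLE person (
    person_id INT AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50)
) ENGINE=InnoDB");

// System users point back at a person.
$pdo->exec("CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    person_id INT NOT NULL,
    FOREIGN KEY (person_id) REFERENCES person (person_id)
        ON UPDATE CASCADE ON DELETE CASCADE
) ENGINE=InnoDB");

// Contacts point back at a person as well.
$pdo->exec("CREATE TABLE contacts (
    contact_id INT AUTO_INCREMENT PRIMARY KEY,
    person_id INT NOT NULL,
    phone VARCHAR(30),
    FOREIGN KEY (person_id) REFERENCES person (person_id)
        ON UPDATE CASCADE ON DELETE CASCADE
) ENGINE=InnoDB");

// Deleting Jim Bean from person now cascades and removes his user and
// contact rows automatically, which is the behaviour described above.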
I highly recommend that you do some reading on normalized data structure. It has saved me many hours once I got the hang of it and I will never go back.

Generating my own Eloquent model insert IDs - How to avoid PK Collisions?

Maybe this is a stupid question because I should defer PK increments to MySQL itself, but I'm in a weird situation.
Basically, to handle versioning and approvals in my system, I have a revision_batch table, which is a collection of things in a submission that a user wishes to insert or update in the database. It has columns like batch_id, the user_id of the submitter, and an approved value.
It is also parent to a collection of items in the revisions table. The revisions table has things like table_name, key, old_value, and new_value. I use this to store the changes someone wishes to make that may not be approved automatically.
When someone who doesn't have permission to edit, say, a "task" table changes the name of a task, a new revision_batch will be created, and a new revision will be created with table_name="tasks", key=[whatever the task's ID is], old_value="my old task name", new_value="my new task name".
When an approver approves of this batch, my code will rocket through the revisions in the batch and perform the update or inserts to the database.
My problem is when performing parent-child relationships within the same batch. If I'm creating a new task and want to assign a task_item to it, in the same batch, then I need to know what PK the task is getting so that I can give the task_item a "task_id".
If I'm handling the creation of a new revision for a task, I might do something like a
select max(id)+1 as newId from tasks
to inject as the new id. But since I might already have a pending task insert revision with that ID or higher, I also check
select max(`key`) + 1 as newId
from revisions
inner join revision_batches on revisions.batch_id = revision_batches.id
where table_name='tasks' and approved = 'P'
for a higher id to assign. That way, if I have ids 1-9 in a tasks table and 10-12 pending in the revisions table, any new direct insert using Laravel's Eloquent model class is overridden to check both tasks and revisions and will insert with id 13. This avoids collisions between actual cemented rows and possible revision rows. It also allows me to create a parent and many layers of children within a single batch, because I determine their IDs as I go along.
This all works fine.
My problem is that if I have two revision creations happening at the exact same time (like, within a millisecond), they'll asynchronously both fetch the same next ID to use, both create revisions where key = the same number, and then only one will get through and the other fails on a PK collision.
My question is: is there a way to force this to be thread safe or to be done synchronously, to avoid two instances of the same controller method executing at the same time and both fetching the same ID to use? Can I lock a method down to a single instance at a time? If not, is there a better way I could be handling PK generation? The only reason I do this is to know beforehand the key to insert. But since custom code in the framework is handling PK generation and not the database, it's causing me this major issue. Happens sporadically, but only when I force the same method to execute maybe 4 times at the same time.
I know that I could avoid the majority of cases where I have many things being inserted at the exact same time, but that doesn't mean that two users won't randomly hit enter at the same time in the future and recreate this issue.
Any ideas?
Thanks!
For this type of issue I use UUID v4 (universally unique identifier). My case is a little bit different because I have a system in 74 different locations, but I need to extract all the transaction records and integrate them into a consolidation system, so my PKs need to be unique across all servers to avoid collisions.
In Laravel I use this excellent package to generate the UUIDs.
I hope this works for you.
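For illustration, one widely used package is ramsey/uuid; the original link wasn't preserved, so this may or may not be the package the answer refers to:

// composer require ramsey/uuid
require 'vendor/autoload.php';

use Ramsey\Uuid\Uuid;

// Generate a version 4 (random) UUID and use it as the primary key value
// instead of an auto-increment; collisions across servers are practically impossible.
$id = Uuid::uuid4()->toString(); // e.g. "1ee9aa1b-6510-4105-92b9-7171bb2f3089"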
Use queues for saving your revisions.
A single queue worker processes jobs one at a time, in order, so the key collision will never occur.
Source: http://laravel.com/docs/4.2/queues
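In Laravel 4.2 syntax that could look roughly like this; the job class name and payload are hypothetical, and the collision is only avoided as long as a single worker is draining the queue:

// Somewhere in the controller: push the revision save onto the queue
// instead of performing it inline.
Queue::push('SaveRevisionBatch', array('batch_id' => $batchId));

// The handler class; fire($job, $data) is the signature Laravel 4.2 expects.
class SaveRevisionBatch {
    public function fire($job, $data)
    {
        // ... compute the next id and insert the revisions for $data['batch_id'] ...
        $job->delete();
    }
}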

Check for existing entries in Database or recreate table?

I've got a PHP script pulling a file from a server and plugging the values in it into a Database every 4 hours.
This file can, and most likely will, change within the 4 hours (or whatever timeframe I finally choose). It's a list of properties and their owners.
Would it be better to check the file and compare it to each DB entry and update any if they need it, or create a temp table and then compare the two using an SQL query?
Neither.
What I'd personally do is run the INSERT command using ON DUPLICATE KEY UPDATE (assuming your table is properly designed and that you are using at least one piece of information from your file as a UNIQUE key, which you should be, based on your comment).
Reasons
Creating a temp table is a hassle.
Comparing is a hassle too. You need to select a record, compare it, update it if it's not equal, and so on; it's just a giant waste of time to compare each piece of info, and there's a better way to do it.
It is so much easier to just insert everything you find; if a clash occurs, that means the record exists and most likely needs updating.
That way you take care of everything with one query, your data integrity is preserved, and you can just keep filling your table or updating it with new records.
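A rough sketch of that single-query approach with PDO; the table and column names below are made up, and property_ref is assumed to carry a UNIQUE index:

$pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass');

$stmt = $pdo->prepare(
    'INSERT INTO properties (property_ref, owner_name)
     VALUES (:ref, :owner)
     ON DUPLICATE KEY UPDATE owner_name = VALUES(owner_name)'
);

// $rows would come from parsing the file pulled from the server.
foreach ($rows as $row) {
    $stmt->execute(array(':ref' => $row['ref'], ':owner' => $row['owner']));
}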
I think it would be best to download the file and update the existing table, maybe using REPLACE or REPLACE INTO. "REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted." http://dev.mysql.com/doc/refman/5.0/en/replace.html
Presumably you have a list of columns that will have to match in order for you to decide that the two things match.
If you create a UNIQUE index over those columns then you can use either INSERT ... ON DUPLICATE KEY UPDATE (manual) or REPLACE INTO ... (manual)
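The REPLACE variant reads much the same (again with made-up column names); keep in mind that REPLACE deletes the old row and inserts a new one, so any auto-increment id on that row changes:

$stmt = $pdo->prepare(
    'REPLACE INTO properties (property_ref, owner_name) VALUES (:ref, :owner)'
);
$stmt->execute(array(':ref' => 'LOT-42', ':owner' => 'New Owner Ltd'));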

How can I update multiple tables while guaranteeing no duplicate ids?

I'm used to building websites with user accounts, so I can simply auto-increment the user id, then let them log in while I identify that user by user id internally. What I need to do in this case is a bit different. I need to anonymously collect a few rows of data from people, and tie those rows together so I can easily discern which data rows belong to which user.
The difficulty I'm having is in generating the id to tie the data rows together. My first thought was to poll the database for the highest user ID in existence, and write to the database with user ID +1. This will fail, however, if two submissions poll the database before either of them writes to it - they will each share the same user ID.
Another thought I had was to create a separate user ID table that would be set to auto-increment, and simply generate a new row, then poll that table for the id of the last row created. That also fails for the same reason as above - if two submissions create a row before either of them polls for the latest user ID, then they'll end up sharing an ID.
Any ideas? I get the impression I'm missing something obvious.
I think I'm understanding you right; I was having a similar issue. There's a super handy PHP function, though. After you query the database to insert a new row, auto-incrementing the user ID, do:
$user_id = mysql_insert_id();
That just returns the auto-increment value from the previous query on the current mysql connection. You can read more about it here if you need to.
You can then use this to populate the second table's data, being sure nobody will get a duplicate ID from the first one.
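A sketch of that flow using the old mysql_* API the answer mentions (table and column names are invented; with mysqli or PDO you would use $mysqli->insert_id or $pdo->lastInsertId() instead):

// Insert the parent row; user_id is auto-incremented by MySQL.
mysql_query("INSERT INTO anon_users (created_at) VALUES (NOW())");
$user_id = mysql_insert_id(); // id generated by the previous query on this connection

// Use that id as the foreign key on the data rows that belong to this user.
mysql_query("INSERT INTO answers (user_id, answer) VALUES ($user_id, 'first row')");
mysql_query("INSERT INTO answers (user_id, answer) VALUES ($user_id, 'second row')");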
You need to insert the user, get the auto-generated id, and then use that id as a foreign key in the couple of rows you need to associate with the parent record. The hat rack must exist before you can hang hats on it.
This is a common issue, and to solve it, you would use a transaction. This gives you atomicity: being able to do more than one thing, but have it all tied to either success or failure as a package. It's an advanced DB feature, and it does require awareness of some more advanced programming in order to implement it in as fault-tolerant a manner as possible.
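A minimal sketch of that with PDO (table names are illustrative): either both inserts commit together or neither does.

$pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

try {
    $pdo->beginTransaction();

    $pdo->exec("INSERT INTO anon_users (created_at) VALUES (NOW())");
    $userId = $pdo->lastInsertId();

    $stmt = $pdo->prepare("INSERT INTO answers (user_id, answer) VALUES (?, ?)");
    $stmt->execute(array($userId, 'first row'));
    $stmt->execute(array($userId, 'second row'));

    $pdo->commit();
} catch (Exception $e) {
    // Any failure rolls back the whole package.
    $pdo->rollBack();
    throw $e;
}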

Ids from mysql massive insert from simultaneous sources

I've got an application in PHP & MySQL where users write to and read from a particular table. One of the write modes is a batch, doing only one query with multiple values. The table has an ID which auto-increments.
The idea is that for each row in the table that is inserted, a copy is inserted in a separate table, as a history log, including the ID that was generated.
The problem is that multiple users can do this at once, and I need to be sure that the ID loaded is the correct one.
Can I be sure that if I do for example:
INSERT INTO table1 VALUES ('','test1'),('','test2')
that the ids generated are sequential?
How can I get the IDs that were just loaded, and be sure that those are the ones that were just loaded?
I've thought of LOCK TABLES, but the users shouldn't notice this.
Hope I made myself clear...
Building an application that requires generated IDs to be sequential usually means you're taking the wrong approach. What happens when you have to delete a value some day; are you going to re-sequence the entire table? Much better to just let the values fall as they may, using a primary key to prevent duplication.
Based on the current implementation of MyISAM and InnoDB, yes. However, this is not guaranteed to be so in the future, so I would not rely on it.
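If you do rely on it, MySQL's LAST_INSERT_ID() returns the id of the first row generated by a multi-row insert, so the range can be reconstructed; a sketch with PDO (the column name and the history-table name are assumptions, and consecutive ids within one statement depend on the default innodb_autoinc_lock_mode):

$pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass');

$stmt = $pdo->prepare("INSERT INTO table1 (name) VALUES (?), (?)");
$stmt->execute(array('test1', 'test2'));

$firstId = (int) $pdo->lastInsertId();       // id of the first row just inserted
$lastId  = $firstId + $stmt->rowCount() - 1; // ids are consecutive within this statement

// Copy the newly inserted rows into the history log using that range.
$copy = $pdo->prepare("INSERT INTO history_log SELECT * FROM table1 WHERE id BETWEEN ? AND ?");
$copy->execute(array($firstId, $lastId));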
