How can I update multiple tables while guaranteeing no duplicate ids?

I'm used to building websites with user accounts, so I can simply auto-increment the user id, then let them log in while I identify that user by user id internally. What I need to do in this case is a bit different. I need to anonymously collect a few rows of data from people, and tie those rows together so I can easily discern which data rows belong to which user.
The difficulty I'm having is in generating the id to tie the data rows together. My first thought was to poll the database for the highest user ID in existence, and write to the database with user ID +1. This will fail, however, if two submissions poll the database before either of them writes to it - they will each share the same user ID.
Another thought I had was to create a separate user ID table that would be set to auto-increment, and simply generate a new row, then poll that table for the id of the last row created. That also fails for the same reason as above - if two submissions create a row before either of them polls for the latest user ID, then they'll end up sharing an ID.
Any ideas? I get the impression I'm missing something obvious.

I think I'm understanding you right; I was having a similar issue. There's a super handy PHP function, though. After you run the query that inserts a new row and auto-increments the user ID, do:
$user_id = mysql_insert_id();
That just returns the auto-increment value generated by the previous query on the current MySQL connection. (The old mysql_* extension was removed in PHP 7; the equivalent calls today are mysqli_insert_id() and PDO::lastInsertId().)
You can then use this to populate the second table's data, being sure nobody will get a duplicate ID from the first one.
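Here is a minimal sketch of that flow, written in Python with an in-memory SQLite database as a stand-in for PHP and MySQL. The table names (users, answers) and the function create_submission are made up for illustration; cursor.lastrowid plays the role of mysql_insert_id().

```python
import sqlite3

# Hypothetical schema: one parent row per anonymous submitter,
# plus the data rows that belong to that submitter.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT)")
conn.execute(
    "CREATE TABLE answers ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " user_id INTEGER NOT NULL REFERENCES users(id),"
    " value TEXT)"
)


def create_submission(conn, values):
    """Insert the parent row, read back its generated id, then tie the data rows to it."""
    cur = conn.cursor()
    cur.execute("INSERT INTO users DEFAULT VALUES")
    user_id = cur.lastrowid  # id generated by this cursor's INSERT
    cur.executemany(
        "INSERT INTO answers (user_id, value) VALUES (?, ?)",
        [(user_id, value) for value in values],
    )
    conn.commit()
    return user_id


first = create_submission(conn, ["a", "b"])
second = create_submission(conn, ["c"])
```

The database hands out the id, so there is no separate "poll for the highest id" step that two submissions could race on.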

You need to insert the user, get the auto-generated id, and then use that id as a foreign key in the couple of rows you need to associate with the parent record. The hat rack must exist before you can hang hats on it.

This is a common issue, and to solve it you would use a transaction. A transaction gives you atomicity: you can do more than one thing and have the whole set either succeed or fail as a package. It's a more advanced database feature, and implementing it in as fault-tolerant a manner as possible does require some awareness of more advanced programming.
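As a rough illustration of the all-or-nothing behaviour, here is a sketch in Python with SQLite standing in for MySQL. The schema and the name save_all_or_nothing are assumptions for the example; the point is only that a failure in any child insert also undoes the parent insert.

```python
import sqlite3

# Hypothetical schema; value is NOT NULL so that a bad row can trigger a failure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT)")
conn.execute(
    "CREATE TABLE answers ("
    " user_id INTEGER NOT NULL REFERENCES users(id),"
    " value TEXT NOT NULL)"
)


def save_all_or_nothing(conn, values):
    """Insert a parent row and its children in one transaction.

    Returns the new parent id, or None if anything failed and was rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute("INSERT INTO users DEFAULT VALUES")
            user_id = cur.lastrowid
            for value in values:
                conn.execute(
                    "INSERT INTO answers (user_id, value) VALUES (?, ?)",
                    (user_id, value),
                )
        return user_id
    except sqlite3.IntegrityError:
        return None


ok = save_all_or_nothing(conn, ["x", "y"])
bad = save_all_or_nothing(conn, ["x", None])  # second row violates NOT NULL
```

After the failed call, neither the orphaned parent row nor the first child row from that attempt is left behind.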

Related

Generating my own Eloquent model insert IDs - How to avoid PK Collisions?

Maybe this is a stupid question because I should defer PK increments to MySql itself, but I'm in a weird situation.
Basically, to handle versioning and approvals in my system, I have a revision_batch table, which is a collection of things in a submission that a user wishes to insert or update in the database. It has columns like batch_id, the user_id of the submitter, and an approved value.
It also is parent to a collection of items in the revisions table. The revisions table has things like table_name, key, old_value, and new_value. I use this to store the changes someone wishes to make that may not be approved automatically.
When someone who doesn't have permission to, say, a "task" table changes the name of a task, a new revision_batch will be created, along with a new revision with table_name="tasks", key=[whatever the task's ID is], old_value="my old task name", new_value="my new task name".
When an approver approves of this batch, my code will rocket through the revisions in the batch and perform the update or inserts to the database.
My problem is when performing parent-child relationships within the same batch. If I'm creating a new task and want to assign a task_item to it, in the same batch, then I need to know what PK the task is getting so that I can give the task_item a "task_id".
If I'm handling the creation of a new revision for a task, I might do something like a
select max(id)+1 as newId from tasks
to inject as the new id. But since I might already have a pending task insert revision with that ID or higher, I also check
select max(key) + 1 as newId
from revisions
inner join revision_batches on revisions.batch_id = revision_batches.id
where table_name='tasks' and approved = 'P'
for a higher id to assign. That way, if I have ids 1-9 in the tasks table and 10-12 pending in the revisions table, any new direct insert using Laravel's Eloquent model class is overridden to check both tasks and revisions and will insert with id 13. This avoids collisions between actual cemented rows and possible revision rows. It also allows me to create a parent and many layers of children within a single batch because I determine their ID as I go along.
This all works fine.
My problem is that if I have two revision creations happening at the exact same time (like, within a millisecond), they'll asynchronously both fetch the same next ID to use, both create revisions where key = the same number, and then only one will get through and the other fails on a PK collision.
My question is: is there a way to force this to be thread safe or to be done synchronously, to avoid two instances of the same controller method executing at the same time and both fetching the same ID to use? Can I lock a method down to a single instance at a time? If not, is there a better way I could be handling PK generation? The only reason I do this is to know beforehand the key to insert. But since custom code in the framework is handling PK generation and not the database, it's causing me this major issue. Happens sporadically, but only when I force the same method to execute maybe 4 times at the same time.
I know that I could avoid the majority of cases where I have many things being inserted at the exact same time, but that doesn't mean that two users won't randomly hit enter at the same time in the future and recreate this issue.
Any ideas?
Thanks!

For this type of issue I use UUID version 4 (universally unique identifier). My case is a little bit different because I have a system in 74 different locations but need to extract all the transaction records and integrate them into a consolidation system, so my PKs need to be unique across all servers to avoid collisions.
In Laravel I use this excellent package to generate the UUID.
I hope this works for you.
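To show the idea outside Laravel, here is a small sketch in Python using the standard uuid module, with SQLite standing in for MySQL. The tasks and task_items tables and the new_key helper are assumptions for the example. Because the key is generated in application code, the parent's id is known before anything is inserted, so a child in the same batch can reference it.

```python
import sqlite3
import uuid


def new_key():
    """Return a random (version 4) UUID as a 36-character string."""
    return str(uuid.uuid4())


# Hypothetical schema with text primary keys instead of auto-increment ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE task_items ("
    " id TEXT PRIMARY KEY,"
    " task_id TEXT NOT NULL REFERENCES tasks(id),"
    " label TEXT)"
)

# Both keys exist before any row is written.
task_id = new_key()
item_id = new_key()

with conn:
    conn.execute("INSERT INTO tasks (id, name) VALUES (?, ?)", (task_id, "my new task"))
    conn.execute(
        "INSERT INTO task_items (id, task_id, label) VALUES (?, ?, ?)",
        (item_id, task_id, "first item"),
    )
```

The trade-off is a wider key than an integer, and collisions are extremely unlikely rather than strictly impossible, so keeping the primary key constraint in place is still worthwhile.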

Use queues for saving your revisions.
A queue that is drained by a single worker processes jobs one at a time, so two revisions can never fetch the same next ID and the key collision will not occur.
Source: http://laravel.com/docs/4.2/queues

Best way to store data about data that is already stored in a database?

I am working on creating a favorites section on my website where users can simply store certain items for easy access. Each of the items is already well-defined and has multiple attributes. So my question is: let's say I had 10,000 users and I would like to implement a 'favorites' system, what would be the best way to keep track of which favorite items have been added by each user?
I was thinking of implementing this the following way: link each favorited item id to a username, and then, when the user with a particular username is logged in, run a query to retrieve all the items favorited by that username.
I appreciate any help with figuring out a good way to do this. My goal is to store the data in a way that makes it easy to retrieve and use later, and to minimize redundant information.

It's pretty easy, you need to create a new table with 3 fields:
id
favoriteID
userID
Every time a user adds a new favourite, it adds a new record to this table, storing both the ID of the favorite, and the ID of the user. There is no redundant information and it's easy to retrieve the details of either the favorite or the user by implementing a join query. This is what relational databases are for.

Within an RDBMS you would probably have a many to many table with the user id and article id. You do not need an independent id column:
create table favourites (user_id int, article_id int);
These of course reference your user table and articles table. (Or whatever you have in place of articles.)
You would then need to retrieve all rows for a single user when wanting to show that user's favourites. You might also want to make a combined UNIQUE index on the columns to prevent duplicates.
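A quick sketch of that table, in Python with SQLite standing in for whatever RDBMS you use. The helper names add_favourite and favourites_of are made up, and the composite primary key here does the job of the combined UNIQUE index. INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No independent id column: the (user_id, article_id) pair is the key,
# which also prevents the same favourite being stored twice.
conn.execute(
    "CREATE TABLE favourites ("
    " user_id INTEGER NOT NULL,"
    " article_id INTEGER NOT NULL,"
    " PRIMARY KEY (user_id, article_id))"
)


def add_favourite(conn, user_id, article_id):
    """Record a favourite; adding the same pair again is silently ignored."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO favourites (user_id, article_id) VALUES (?, ?)",
            (user_id, article_id),
        )


def favourites_of(conn, user_id):
    """Return the article ids favourited by one user, in ascending order."""
    rows = conn.execute(
        "SELECT article_id FROM favourites WHERE user_id = ? ORDER BY article_id",
        (user_id,),
    )
    return [row[0] for row in rows]


add_favourite(conn, 1, 10)
add_favourite(conn, 1, 10)  # duplicate, ignored
add_favourite(conn, 1, 7)
add_favourite(conn, 2, 10)
```

In a real schema both columns would also be foreign keys to the user and article tables, and you would join against the article table to show the details.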
You may get a faster response with something like Cassandra, where you can simply retrieve based on the key of the user_id and get all their favourites in one easy spot. But then you're dealing with multiple systems.
I've heard, but haven't had a chance to look into, that MySQL can now support a Key-Value system similar to Cassandra and that may be your best bet.

Assigning keys from rows of one table to rows to another concurrency concerns

In the users table I am adding new rows, and each of these rows needs to contain a reference to the id of a unique row in another "access keys" table.
Some of these users rows may be added simultaneously from many threads. I know the database will block on the writes, so they are not truly simultaneous. So maybe this is not a concern?
So I have a php script that creates the new user row, and I have the access keys table populated with many rows. How do I generate the reference to the id of a unique row in the "access keys" table and how do I know it is unique?

"What I mean is how do I get the next "access keys" id that is not currently assigned to any users row. How do I keep track of which have not yet been used in order to know which one to write in the users table."
If you want this to work with an absolute guarantee of things never going wrong, then all your "add user" transactions should be fully serialized. I.e. they cannot possibly execute concurrently.
It is impossible to get a definite answer to the question "which access keys are currently not in use" while any transaction is going on that might be in the very process of changing the answer to that very question.
Instead of retrieving prepared access keys from a pool, can you generate access keys by hashing user ids or user names or something like that? If so, I'd do that.
If not, you'll have to either accept the full serialization (i.e. transaction delay and possibly the occasional transaction timeout), or else you'll have to accept the fact that things might go wrong.
Note that "accepting full serialization" need not be problematic. If you keep your transactions short-lived (i.e. you commit them fast), then chances are you won't even notice the serialization.
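For what the fully serialized version might look like, here is a sketch in Python with SQLite standing in for MySQL. The schema and the add_user function are assumptions. BEGIN IMMEDIATE takes SQLite's write lock before the lookup, so no other writer can change the answer to "which keys are unused" until this transaction ends; in MySQL you would more likely reach for SELECT ... FOR UPDATE inside a transaction.

```python
import sqlite3

# isolation_level=None: we issue BEGIN / COMMIT / ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE access_keys (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT,"
    # UNIQUE is a safety net: a key can never be handed out twice.
    " access_key_id INTEGER NOT NULL UNIQUE REFERENCES access_keys(id))"
)
conn.executemany("INSERT INTO access_keys (id) VALUES (?)", [(101,), (102,)])


def add_user(conn, name):
    """Claim the lowest unused access key and create the user in one short transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before looking for a free key
    try:
        row = conn.execute(
            "SELECT id FROM access_keys"
            " WHERE id NOT IN (SELECT access_key_id FROM users)"
            " ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            raise LookupError("no unused access keys left")
        cur = conn.execute(
            "INSERT INTO users (name, access_key_id) VALUES (?, ?)",
            (name, row[0]),
        )
        conn.execute("COMMIT")
        return cur.lastrowid, row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise


ann = add_user(conn, "ann")
bob = add_user(conn, "bob")
```

The transaction only covers one lookup and one insert, which is what keeps the serialization cheap.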

You do it sequentially from whatever thread(s) are doing the insertion:
1) add row to the users table
2) retrieve ID of this new row
3) insert into access keys table using the ID retrieved in #2
4) go to 1) until completed
MySQL can safely return the last ID it created as part of an insert query for each connection: an insert done by some OTHER thread will not overwrite another thread's "last insert id", because the last insert id is kept on a per-connection basis.
With this structure, you can have as many threads as you want doing inserts, and none of them will step on each other's toes, as they're all getting their own distinct "last id".

What is an elegant / efficient way of storing the status of 100 lessons for multiple users?

I'm working on an app in JavaScript, jQuery, PHP & MySQL that consists of ~100 lessons. I am trying to think of an efficient way to store the status of each user's progress through the lessons, without having to query the MySQL database too much.
Right now, I am thinking the easiest implementation is to create a table for each user, and then store each lesson's status in that table. The only problem with that is if I add new lessons, I would have to update every user's table.
The second implementation I considered would be to store each lesson as a table, and record the user ID for each user that completed that lesson there - but then generating a status report (what lessons a user completed, how well they did, etc.) would mean pulling data from 100 tables.
Is there an obvious solution I am missing? How would you store your users' progress through 100 lessons, so it's quick and simple to generate a status report showing their progress?
Cheers!

The table structure I would recommend would be to keep a single table with non-unique fields userid and lessonid, as well as the relevant progress fields. When you want the progress of user x on lesson y, you would do this:
SELECT * FROM lessonProgress WHERE userid=x AND lessonid=y LIMIT 1;
You don't need to worry about performance unless you see that it's actually an issue. Having a table for each user or a table for each lesson are both bad solutions because a database isn't meant to have a dynamic number of tables.
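A sketch of the single-table layout, in Python with SQLite standing in for MySQL. The status and score columns and the helper names are assumptions. Neither userid nor lessonid is unique on its own, but in this sketch the pair is made the primary key so each user has at most one row per lesson. INSERT OR REPLACE is SQLite syntax; in MySQL you might use INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lessonProgress ("
    " userid INTEGER NOT NULL,"
    " lessonid INTEGER NOT NULL,"
    " status TEXT NOT NULL,"
    " score INTEGER,"
    " PRIMARY KEY (userid, lessonid))"
)


def record_progress(conn, userid, lessonid, status, score=None):
    """Create or overwrite the single row for this user and lesson."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO lessonProgress (userid, lessonid, status, score)"
            " VALUES (?, ?, ?, ?)",
            (userid, lessonid, status, score),
        )


def report(conn, userid):
    """Return (lessonid, status, score) for every lesson the user has touched."""
    return conn.execute(
        "SELECT lessonid, status, score FROM lessonProgress"
        " WHERE userid = ? ORDER BY lessonid",
        (userid,),
    ).fetchall()


record_progress(conn, 1, 1, "started")
record_progress(conn, 1, 1, "done", 90)  # replaces the earlier row
record_progress(conn, 1, 2, "started")
```

A whole status report is one query against one table, and adding lesson 101 needs no schema change at all.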

If reporting is restricted to one user at a time - that is, when generating a report, it's for a specific user and not a large clump of users - why not consider JSON (JavaScript Object Notation) stored in a file? If extensibility is key, it would make adding lessons a simple matter.
Obviously, if you're going to run reports against an arbitrarily large number of users at once, separate data files would become inefficient.
Setting the efficiency argument aside, JSON would also give you a very human-readable and interchangeable format.
Lastly, if the security of the report output isn't a big sticking point, you'd also gain the ability to easily offload view rendering onto the client.

Use a relation between 2 tables. One is for users, with user-specific columns like ID, username, email, and whatever else you want to store about them.
Then add a status table that has a UID foreign key, with columns such as ID, UID, Status, etc.
It's good to keep datecreated and dateupdated on tables as well.
Then just join the tables ON status.UID = users.ID

A good option would be to create one table with a user_ID as primary key and a status (int), where each row of the table represents a user. Accessing their progress would be fast and simple since you have an index on user IDs.
This way, adding new lessons would not require changing the DB.

Ids from mysql massive insert from simultaneous sources

I've got an application in PHP & MySQL where users write to and read from a particular table. One of the write modes is a batch, doing only one query with multiple values. The table has an ID which auto-increments.
The idea is that for each row in the table that is inserted, a copy is inserted in a separate table, as a history log, including the ID that was generated.
The problem is that multiple users can do this at once, and I need to be sure that the ID I log is the correct one.
Can I be sure that if I do for example:
INSERT INTO table1 VALUES ('','test1'),('','test2')
that the ids generated are sequential?
How can I get the Id's that were just loaded, and be sure that those are the ones that were just loaded?
I've thought of using LOCK TABLE, but the users shouldn't notice it.
Hope I made myself clear...

Building an application that requires generated IDs to be sequential usually means you're taking a wrong approach - what happens when you have to delete a value some day, are you going to re-sequence the entire table? Much better to just let the values fall as they may, using a primary key to prevent duplication.

Based on the current implementation of MyISAM and InnoDB, yes. However, this is not guaranteed to be so in the future, so I would not rely on it.
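One way to avoid depending on the ids being sequential is to insert the batch row by row inside a single transaction and read each generated id back as you go. Here is a sketch in Python with SQLite standing in for MySQL; table1_history and insert_batch are hypothetical names, and this swaps the single multi-row INSERT for several single-row ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.execute("CREATE TABLE table1_history (table1_id INTEGER NOT NULL, name TEXT)")


def insert_batch(conn, names):
    """Insert each row and its history copy, using the id reported for that exact insert."""
    ids = []
    with conn:  # one transaction: the whole batch and its log commit together
        for name in names:
            cur = conn.execute("INSERT INTO table1 (name) VALUES (?)", (name,))
            new_id = cur.lastrowid  # id generated by this insert on this connection
            ids.append(new_id)
            conn.execute(
                "INSERT INTO table1_history (table1_id, name) VALUES (?, ?)",
                (new_id, name),
            )
    return ids


ids = insert_batch(conn, ["test1", "test2"])
```

Each history row is written with the id that the database reported for its own insert, so it does not matter whether another user's rows land in between, and no table lock is needed.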
