Avoiding collisions on primary keys from separate MySQL databases - php

I have several servers running their own instance of a particular MySQL database which unfortunately cannot be set up as a replication cluster. Each server inserts data into several user-related tables which have foreign key constraints between them (e.g. user, user_vote). Here is how the process goes:
all the servers start with the same data
each server grows its own set of data independently from the other servers
periodically, the data from all the servers is merged manually together and applied back to each server (the process therefore repeats itself from step 1).
This is made possible because, in addition to its primary key, the user table contains a unique email field, which makes it possible to identify which users already exist in each database and to merge in the new ones while changing their primary and foreign keys to avoid collisions and keep the foreign key constraints consistent. It works, but it takes quite some effort because primary and foreign keys have to be changed to avoid collisions, hence my question:
Is there a way to have each server use primary keys that don't collide with other servers to facilitate the merging?
I initially wanted to use a composite primary key (e.g. server_id, id) but I am using Doctrine which doesn't support primary keys composed of multiple foreign keys so I would have problems with my foreign key constraints.
I thought about using a VARCHAR as an id and using part of the string as a prefix (SERVER1-1, SERVER1-2, SERVER2-1, SERVER2-2...), but I'm thinking it will make the DB slower as I will have to do some manipulation on the ids (e.g. on insert, I have to parse existing ids, extract the highest, increment it, and concatenate it with the server id...).
PS: Another option would be to implement replication with read from slaves and write to master but this option was discarded because of issues such as replication lag and single point of failure on the master which can't be solved for now.

You can make each server use a different auto-increment step, and a different start offset:
Change the step that AUTO_INCREMENT fields increment by
(assuming you are using auto-increments)
I've only ever used this across two servers, so my set-up had one with even ids and one with odd.
When they are merged back together nothing will collide, as long as you make sure all tables follow the above idea.
To implement this for 4 servers, you would set up the following offsets:
Server 1 = 1
Server 2 = 2
Server 3 = 3
Server 4 = 4
You would set your increment step like this (I've used 10 to leave room for extra servers); a snippet showing how to set both values follows the list:
Server 1 = 10
Server 2 = 10
Server 3 = 10
Server 4 = 10
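In MySQL these are the auto_increment_offset and auto_increment_increment system variables. A minimal sketch for Server 1 (use the offsets from the list above on the other servers); in practice you would also put these in each server's my.cnf so they survive a restart:
-- Server 1: shared step of 10, offset of 1 (offsets 2, 3, 4 on the other servers)
SET GLOBAL auto_increment_increment = 10;
SET GLOBAL auto_increment_offset    = 1;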
And then after you have merged, before copying back to each server, you would just need to update the auto-increment value for each table to have the correct offset again. Imagine each server had created 100 rows; the next auto-increment values would be:
Server 1 = 1001
Server 2 = 1002
Server 3 = 1003
Server 4 = 1004
This is where it does get tricky with four servers, because certain tables may not have had any rows inserted from a particular server. So you could end up with some tables whose last auto-increment id came not from server 4 but from server 2 instead. This would make it very tricky to work out what the next auto-increment value should be for any particular table.
For this reason it is probably best to also include a column in each of your tables that records the server number when any rows are inserted.
id | field1 | field2 | ... | server
That way you can easily find out what the last auto-increment value should be for a particular server by running the following against any of your tables:
SELECT MAX(id) FROM `table` WHERE `server` = 4;
Using this value you can reset the next autoinc value you need for each table on each server, before rolling the merged dataset out to the server in question.
SET @next := (SELECT MAX(id) FROM `table` WHERE `server` = s) + n;
SET @sql  := CONCAT('ALTER TABLE `table` AUTO_INCREMENT = ', @next);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
Where s is the server number and n is the increment step, so in my example it would be 10. (Note that information_schema is read-only, so the value has to be applied with ALTER TABLE; and because ALTER TABLE does not accept an expression, the value is computed first and applied via a prepared statement.)

Prefixing the ID would do the trick. As for the DB being slower, it depends on how much traffic it serves. You can also have the "prefixed id" split into two columns, "prefix" and "id", and these can be of any type. It would require some logic to cope with it in queries, but may be worth evaluating.


SQL - auto increment within group inside one table [duplicate]

I have got a table which has an id (primary key with auto increment), a uid (a key referring to a user's id, for example) and something else which doesn't matter for my question.
I want to have, let's call it, a separate auto-increment sequence on id for each uid value.
So, I will add an entry with uid 10, and the id field for this entry will be 1 because there were no previous entries with a value of 10 in uid. I will add a new one with uid 4 and its id will be 3 because there were already two entries with uid 4.
...A very obvious explanation, but I am trying to be as explanatory and clear as I can to demonstrate the idea... clearly.
What SQL engine can provide such a functionality natively? (non Microsoft/Oracle based)
If there is none, how could I best replicate it? Triggers perhaps?
Does this functionality have a more suitable name?
In case you know about a non-SQL database engine providing such functionality, name it anyway, I am curious.
Thanks.
MySQL's MyISAM engine can do this. See their manual, in section Using AUTO_INCREMENT:
For MyISAM tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index. In this case, the generated value for the AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is useful when you want to put data into ordered groups.
The docs go on after that paragraph, showing an example.
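Adapted to the uid/id columns from the question, a minimal sketch of that behaviour might look like this (the table name votes is an assumption):
CREATE TABLE votes (
  uid INT NOT NULL,
  id  INT NOT NULL AUTO_INCREMENT,
  other_data VARCHAR(100),
  PRIMARY KEY (uid, id)
) ENGINE=MyISAM;

-- id restarts at 1 for each distinct uid:
INSERT INTO votes (uid, other_data) VALUES (10, 'a'), (4, 'b'), (4, 'c');
-- rows end up as (10, 1), (4, 1), (4, 2)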
The InnoDB engine in MySQL does not support this feature, which is unfortunate because it's better to use InnoDB in almost all cases.
You can't emulate this behavior using triggers (or any SQL statements limited to transaction scope) without locking tables on INSERT. Consider this sequence of actions:
Mario starts transaction and inserts a new row for user 4.
Bill starts transaction and inserts a new row for user 4.
Mario's session fires a trigger to compute MAX(id)+1 for user 4. It gets 3.
Bill's session fires a trigger to compute MAX(id)+1 for user 4. It also gets 3.
Bill's session finishes his INSERT and commits.
Mario's session tries to finish his INSERT, but the row with (userid=4, id=3) now exists, so Mario gets a primary key conflict.
In general, you can't control the order of execution of these steps without some kind of synchronization.
The solutions to this are either:
Get an exclusive table lock. Before trying an INSERT, lock the table (see the sketch after this list). This is necessary to prevent concurrent INSERTs from creating a race condition like in the example above. It's necessary to lock the whole table: since you're trying to restrict INSERT, there's no specific row to lock (if you were trying to govern access to a given row with UPDATE, you could lock just that row). But locking the table makes access to it serial, which limits your throughput.
Do it outside transaction scope. Generate the id number in a way that won't be hidden from two concurrent transactions. By the way, this is what AUTO_INCREMENT does. Two concurrent sessions will each get a unique id value, regardless of their order of execution or order of commit. But tracking the last generated id per userid requires access to the database, or a duplicate data store. For example, a memcached key per userid, which can be incremented atomically.
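A minimal sketch of the table-lock approach (option 1), assuming the same votes (uid, id) table as in the earlier sketch; the id is supplied explicitly here rather than by AUTO_INCREMENT, and the lock keeps concurrent inserts out while the next per-user value is computed:
LOCK TABLES votes WRITE;
INSERT INTO votes (uid, id, other_data)
SELECT 4, COALESCE(MAX(id), 0) + 1, 'new row'
FROM votes
WHERE uid = 4;
UNLOCK TABLES;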
It's relatively easy to ensure that inserts get unique values. But it's hard to ensure they will get consecutive ordinal values. Also consider:
What happens if you INSERT in a transaction but then roll back? You've allocated id value 3 in that transaction, and then I allocated value 4, so if you roll back and I commit, now there's a gap.
What happens if an INSERT fails because of other constraints on the table (e.g. another column is NOT NULL)? You could get gaps this way too.
If you ever DELETE a row, do you need to renumber all the following rows for the same userid? What does that do to your memcached entries if you use that solution?
SQL Server should allow you to do this. If you can't implement this using a computed column (probably not - there are some restrictions), surely you can implement it in a trigger.
MySQL also would allow you to implement this via triggers.
In a comment you ask the question about efficiency. Unless you are dealing with extreme volumes, storing an 8 byte DATETIME isn't much of an overhead compared to using, for example, a 4 byte INT.
It also massively simplifies your data inserts, as well as being able to cope with records being deleted without creating 'holes' in your sequence.
If you DO need this, be careful with the field names. If you have uid and id in a table, I'd expect id to be unique in that table, and uid to refer to something else. Perhaps, instead, use the field names property_id and amendment_id.
In terms of implementation, there are generally two options.
1). A trigger
Implementations vary, but the logic remains the same. As you don't specify an RDBMS (other than not MS/Oracle), the general logic is simple; a hedged MySQL sketch follows the list of caveats below...
Start a transaction (often this is implicitly already started inside triggers)
Find the MAX(amendment_id) for the property_id being inserted
Update the newly inserted value with MAX(amendment_id) + 1
Commit the transaction
Things to be aware of are...
- multiple records being inserted at the same time
- records being inserted with amendment_id being already populated
- updates altering existing records
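For MySQL, a hedged sketch of that trigger logic might look like the following; the table and column names (property, property_id, amendment_id) are just the illustrative ones suggested above, and the race condition described in the earlier answer still applies unless inserts are serialized:
DELIMITER //
CREATE TRIGGER property_amendment_bi
BEFORE INSERT ON property
FOR EACH ROW
BEGIN
  -- only fill amendment_id when the caller has not supplied one
  IF NEW.amendment_id IS NULL OR NEW.amendment_id = 0 THEN
    SET NEW.amendment_id = (
      SELECT COALESCE(MAX(amendment_id), 0) + 1
      FROM property
      WHERE property_id = NEW.property_id
    );
  END IF;
END//
DELIMITER ;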
2). A Stored Procedure
If you use a stored procedure to control writes to the table, you gain a lot more control.
Implicitly, you know you're only dealing with one record.
You simply don't provide a parameter for DEFAULT fields.
You know what updates / deletes can and can't happen.
You can implement all the business logic you like without hidden triggers
I personally recommend the Stored Procedure route, but triggers do work.
It is important to get your data types right.
What you are describing is a multi-part key. So use a multi-part key. Don't try to encode everything into a magic integer, you will poison the rest of your code.
If a record is identified by (entity_id,version_number) then embrace that description and use it directly instead of mangling the meaning of your keys. You will have to write queries which constrain the version number but that's OK. Databases are good at this sort of thing.
version_number could be a timestamp, as a_horse_with_no_name suggests. This is quite a good idea. There is no meaningful performance disadvantage to using timestamps instead of plain integers. What you gain is meaning, which is more important.
You could maintain a "latest version" table which contains, for each entity_id, only the record with the most-recent version_number. This will be more work for you, so only do it if you really need the performance.
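A minimal sketch of the multi-part key, with illustrative names (entities, entity_id, version_number); the second query shows how to read the latest version per entity without maintaining a separate "latest version" table:
CREATE TABLE entities (
  entity_id      INT NOT NULL,
  version_number INT NOT NULL,
  data           VARCHAR(255),
  PRIMARY KEY (entity_id, version_number)
);

SELECT e.*
FROM entities e
JOIN (
  SELECT entity_id, MAX(version_number) AS version_number
  FROM entities
  GROUP BY entity_id
) latest USING (entity_id, version_number);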

How to prevent two user get same unique key?

I am creating a system to generate unique keys. It works for now, but I haven't tested it with many users. Users may click a button and then get their unique number, as simple as that.
But how do I prevent multiple users from getting the same unique key if they press the button at exactly the same time (even on the millisecond scale)? The button is on the client side, so I must do something in the back end.
This is what the unique key looks like:
19/XXXXXX-ABC/XYZ
The XXXXXX is an auto-increment number from 000001 to 999999. I have this code but don't know if it's reliable enough to handle my issue.
$autoinc = $this->MPenomoran->get_surat($f_nomor)->jumlah_no+1; // count data in table and add 1
$no_1 = date('y')+2;                                            // two-digit year plus 2 (e.g. "17" + 2 = 19)
$no_2 = str_pad($autoinc, 6, '0', STR_PAD_LEFT);                // zero-pad the counter to 6 digits
$no_3 = "-ABC/XYZ";                                             // fixed suffix
$nomor = $no_1."/".$no_2.$no_3;
$returned_nomor = $nomor;
$success = array('nomor' => $returned_nomor);                   // send unique key to user's view
It seems like you don't want to come out and tell us what the platform for this is, or what the limitations of that platform are.
The first thing that jumps out is that your format is limited, per year, to 999999 total unique keys. Very odd, but presumably you understand that limit and would need to put in some code to deal with hitting the maximum number.
Approaches
REDIS based
This would be very simple with a Redis server using INCR. Since INCR is atomic, you essentially have a solution just by creating a key named after your year + 2, should it not exist, and using INCR on it from there on out.
You would need to utilize some php redis client, and there are a variety of them with strengths and weaknesses to each that I'm not going to go into.
Redis is also great for caching, so if at all possible that is the first thing I would look into.
MySQL Based
There are a few different solutions using mysql. They are involved, so I'll just outline them because I don't want to spend time writing a novel.
Note: You will need to translate these into the appropriate PHP code (mysqli or PDO) where, as noted, parameters are passed, transactions are started, etc.
MySQL - create your own sequence generator
Create a table named "Sequence" with this basic structure:
CREATE TABLE Sequence (
  name    VARCHAR(2) NOT NULL PRIMARY KEY,
  nextval INT UNSIGNED NOT NULL DEFAULT 1
) ENGINE=InnoDB;
The underlying query would be something like this:
START TRANSACTION;
SELECT nextval FROM Sequence WHERE name = '$no_1' FOR UPDATE;
UPDATE Sequence SET nextval = nextval + 1 WHERE name = '$no_1';
COMMIT;
This code emulates a serialized Oracle style sequence. It is safe from a concurrency standpoint, because MySQL will lock the row briefly, then increment it upon completion.
MySQL - Autoincrement on multi-value PK
This comes with some caveats.
It is generally incompatible with replication.
The underlying table must be MyISAM:
CREATE TABLE Sequence (
  name    VARCHAR(2) NOT NULL,
  lastval INT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (name, lastval)
) ENGINE=MyISAM;
Your underlying query would be:
INSERT INTO Sequence (name) VALUES ('$no_1')
This depends on MySQL supporting a multi-column key where the 2nd column is an AUTO_INCREMENT. Its behavior is such that it acts like a sequence for each unique name.
You would then use the relevant API's built-in way of getting the MySQL LAST_INSERT_ID(), for example PDO::lastInsertId() with PDO.
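In raw SQL the pattern is simply the sketch below (with PDO the last value would come from $pdo->lastInsertId(); mysqli exposes it as $mysqli->insert_id):
INSERT INTO Sequence (name) VALUES ('19');
SELECT LAST_INSERT_ID();  -- 1, 2, 3, ... counted separately for each name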
Other alternatives
You could also use semaphores, files with locking, and all sorts of other ideas to create a sequence generator that would work well in a monolithic (one server for everything) environment. MySQL and Redis would serve a cluster, so those are more robust options from that standpoint.
The important thing is that whatever you do, you test it out using a load tester like siege or Boom to generate multiple requests at your web level.

Resetting MySQL auto-increment field value after delete from PHP?

I have a form from which I am inserting data into MySQL, and it works fine. But when I delete some data from MySQL and insert values into the database again, the auto-increment value continues from the previous row value.
For example:
If I have 1, 2, 3, 4, 5 as ids in my database and I delete ids 4 and 5 from the database
and start inserting the next data from PHP, then the ids continue from 6... But I need the id to be 4. Can anyone give suggestions? Thanks in advance.
I'm afraid MySQL does not allow you to "reset" AUTO_INCREMENT fields like that. If you need that behavior, you have to stop using AUTO_INCREMENT and generate your IDs manually.
Auto increment does not (and cannot) guarantee an unbroken sequence.
You can implement this yourself as "SELECT MAX(ID) + 1 FROM MYTABLE;"
But be warned: You will take a slight but noticeable performance hit.
If you are running updates concurrently you risk deadlocks
(again if you are running updates concurrently) you will risk having two inserts with the same key.
You can also implement this by running your own counter in a separate table. You must have program logic to decrement this correctly on a deletion, and, again, you will take a performance hit and risk deadlock as the "counter" becomes an object of contention.
You should not play with the AUTO_INCREMENT value in a production environment; let MySQL take care of it for you.
If you need to know how many rows you have, you can use
SELECT COUNT(id) FROM tbl;
Anyway if you really want to change its value the syntax is :
ALTER TABLE tbl AUTO_INCREMENT=101;

Setting manual increment value on synchronized mysql servers

I have a MySQL/PHP application hosted on an intranet and on the internet. Both MySQL servers are replicated, i.e., synchronized in real time.
I have some tables which have an auto-increment id as primary key. When sync goes down, the same auto-increment values are used for new transactions on the online server as well as the intranet server.
So even when the servers reconnect and sync starts, records with the same auto-increment id do not get synced. Ids with non-overlapping values get synced as soon as the servers reconnect.
To resolve this issue, I am thinking of using manual increment values with different ranges on the intranet and online servers.
Please suggest what could be the best solution for this problem.
Also, if I have to go with manual increment ids, what would be the best technique or algorithm to assign ids separately online and on the intranet?
I figured out the solution to this problem.
While configuring replication of the MySQL servers, the auto-increment settings should be adjusted so that the ids on the servers never overlap. For example, if you have 2 replicated servers, one server should only generate even auto-increment ids and the other only odd ids.
Here's the link for detail information on this.
http://jonathonhill.net/2011-09-30/mysql-replication-that-hurts-less/
Updating the settings on both the servers resolved this issue.
There are two things you can do. The first would be to change the starting value on the live server to a very high number (higher than the expected number of rows).
EG:
ALTER TABLE tbl AUTO_INCREMENT = 10000;
Now the numbers won't overlap. If that is not an option, you can change the interval with
SET @@auto_increment_increment = 10;
But this alone would still mean there is an overlap at some point, because the server with increment steps of 1 will catch up with the one stepping by 10 after... you guessed it... 10 rows!
But you could bypass this by setting one server to start at 1 and the other at 2, and then making both increment in steps of 2 (a snippet showing these settings follows the example below).
That would make something like
intranet 1, 3, 5, 7, 9
live 2, 4, 6, 8, 10
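A minimal sketch of those settings, using MySQL's auto_increment_offset and auto_increment_increment variables (in practice you would also put them in each server's my.cnf so they survive a restart):
-- intranet server
SET GLOBAL auto_increment_offset    = 1;
SET GLOBAL auto_increment_increment = 2;

-- live server
SET GLOBAL auto_increment_offset    = 2;
SET GLOBAL auto_increment_increment = 2;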
You could also use a two-column primary key to prevent duplication. Then you have an auto-increment field in combination with a varchar field (values 'live' and 'intr'), and together they form your unique key.
CREATE TABLE `casetest`.`manualid` (
`id` INT( 10 ) NOT NULL AUTO_INCREMENT ,
`server` VARCHAR( 4 ) NOT NULL DEFAULT 'live',
`name` INT NOT NULL ,
PRIMARY KEY ( `id` , `server` )
) ENGINE = MYISAM ;

Avoiding auto-increment ID collisions when moving data between MySQL servers

So the situation is that I am going to have two or more "insert" machines where my web application just inserts data that we want to log (they are all behind a load balancer). Every couple of hours, one by one, the machines will be disconnected from the load balancer and will upload their information into the "master" database machine, which should therefore have a relatively up-to-date version of all the data we are collecting.
Originally I was going to use mysqldump, but found that you cannot tell the command not to grab the auto_increment id column I have (which would lead to collisions on the primary key). I saw another post recommending putting the data in a temporary table and then dropping the column, but the "insert" machines have very low specs, and the amount of data could be pretty significant, on the order of 50,000 rows. Other than programmatically taking x rows at a time and inserting them into the remote "master" database, is there an easier way to do this? Currently I have PHP installed on the "insert" machines.
Thank you for your input.
Wouldn't you want the master database record to have the same primary key for each record as the slave database? If not, that could lead to problems where a query will produce different results based on which machine it's on.
If you want an arbitrary primary key that will avoid collisions, consider removing the auto-increment ID and constructing an ID that's guaranteed to be unique for every record on each server. For example, you could concatenate the unix time (with microseconds) with an identifier that's different for each server. A slightly lazier solution would be to concatenate time + a random 10-digit number or something. PHP's uniqid() function does something like this automatically.
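A hedged sketch of that "time + per-server identifier" idea at insert time; the table and column names (log_data, id, payload) and the server tag 'insert01' are made up for illustration:
INSERT INTO log_data (id, payload)
VALUES (CONCAT(DATE_FORMAT(NOW(6), '%Y%m%d%H%i%s%f'), '-', 'insert01'),
        'example payload');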
If you don't intend to ever use the ID, then just remove it from your tables. There's no rule saying that every table has to have a primary key. If you don't use it, but you want to encode information about when each record was inserted, add a timestamp column instead (and don't make it a key).
