Until now I've always stored records in a MySQL database by generating an ID (a VARCHAR(32) primary key) in PHP, with a function like this:
$id = substr(str_shuffle('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'), 0, 8);
Until now I've also always used the utf8_bin collation (case-sensitive) in MySQL, but now I'm using utf8_general_ci (case-insensitive).
I have a table in my database that stores statistics; it contains millions of records.
In this case, is it better to use 'id INT UNSIGNED AUTO_INCREMENT' as the primary key?
If so, is it possible that, when many users call the script at the same time, it crashes with a 'duplicate id' error? And how can I avoid that?
Even though several people can access the site at once, MySQL serializes the allocation of auto-increment values: concurrent requests for the next value are queued, and each value is handed out only once. So if an INSERT query does not provide an ID, an auto-incremented ID is generated and the row is saved, and the next request gets the next value. There is no way for an auto-incremented ID to be duplicated like this.
Additionally, your code generates a random string, not a unique string. There is a big difference between the two: it is quite possible to generate a random string that has already been generated earlier.
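To put a rough number on that risk, here is a back-of-the-envelope birthday-problem estimate, written in Python as a sketch (the arithmetic is the same in any language). It assumes the IDs are uniformly random. Because str_shuffle() never repeats a character, there are 62!/54! possible 8-character IDs; under a case-insensitive collation such as utf8_general_ci, 'a' and 'A' compare equal, so at most 36^8 IDs can be told apart.

```python
import math

def collision_probability(n: int, space: int) -> float:
    """Birthday-problem approximation: chance that n uniformly random IDs
    drawn from `space` possible values contain at least one duplicate."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * space))

# 8 distinct characters taken from 62, as substr(str_shuffle(...), 0, 8) produces
case_sensitive_space = math.perm(62, 8)   # 136,325,893,334,400
# Upper bound when case is ignored: 26 letters + 10 digits per position
case_insensitive_space = 36 ** 8          # 2,821,109,907,456

for rows in (100_000, 1_000_000, 10_000_000):
    p_bin = collision_probability(rows, case_sensitive_space)
    p_ci = collision_probability(rows, case_insensitive_space)
    print(f"{rows:>10,} rows: utf8_bin ~{p_bin:.4f}, utf8_general_ci at least ~{p_ci:.4f}")
```

At a million rows this gives roughly a 0.4% chance of at least one duplicate with utf8_bin, but around 16% once case is ignored, so switching collations makes a duplicate far more likely.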
On the other hand, auto-increment is a steadily increasing sequential number, so there is no chance of a duplicate key. That is why it is generally advised to let auto-increment generate the primary key rather than generating your own.
To get the last generated MySQL ID, call mysqli_insert_id() right after your INSERT query in PHP, and use that value in subsequent queries that refer to the inserted row.
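A minimal sketch of the same pattern, using Python's built-in sqlite3 module as a stand-in for MySQL: cursor.lastrowid plays the role of mysqli_insert_id(). The stats and stats_detail tables and the record_visit helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stats (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,   -- database-generated key
        page TEXT NOT NULL
    );
    CREATE TABLE stats_detail (
        stats_id INTEGER NOT NULL REFERENCES stats(id),
        note     TEXT
    );
""")

def record_visit(conn: sqlite3.Connection, page: str, note: str) -> int:
    """Insert a parent row, read back its generated id, reuse it for a child row."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute("INSERT INTO stats (page) VALUES (?)", (page,))
        new_id = cur.lastrowid  # analogous to mysqli_insert_id() in PHP
        conn.execute(
            "INSERT INTO stats_detail (stats_id, note) VALUES (?, ?)",
            (new_id, note),
        )
    return new_id

first = record_visit(conn, "/home", "first visit")
second = record_visit(conn, "/about", "second visit")
print(first, second)  # 1 2
```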
In my opinion, MySQL's auto-increment is better, because your PHP script can now be visited by more than one person at the same time.
So an ID generated in PHP may no longer be unique.
And I'm pretty sure MySQL is well enough programmed that it never hands out the same ID twice ;)
In fact, your current code has the bug that the same ID might be generated again. MySQL-generated IDs don't have this problem. Even if you have a reason to generate your own IDs, I would still use a MySQL auto-increment integer to link between tables because of better indexing (speed).
If, for example, you want to hide the sequence from the user, keep your generated ID in a separate column with a unique index. Then do the ID generation and the INSERT in a do-while loop, so that if you happen to generate the same ID a second time, you can retry.
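Here is a minimal sketch of that retry loop, with Python's sqlite3 standing in for MySQL and a hypothetical stats table: an auto-increment id is used for joins, and a random public_id sits in a separate UNIQUE column. If the INSERT violates the unique index, a new ID is generated and the insert is retried.

```python
import random
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits  # the 62 characters from the PHP snippet

def random_public_id(length: int = 8) -> str:
    # random.sample picks distinct characters, much like substr(str_shuffle(...), 0, 8)
    return "".join(random.sample(ALPHABET, length))

def insert_with_retry(conn, payload, gen=random_public_id, max_attempts=10):
    """Insert a row with a random public_id; on a constraint violation, retry with a new one."""
    for _ in range(max_attempts):
        public_id = gen()
        try:
            with conn:  # commits on success, rolls back if the INSERT fails
                cur = conn.execute(
                    "INSERT INTO stats (public_id, payload) VALUES (?, ?)",
                    (public_id, payload),
                )
            return cur.lastrowid, public_id
        except sqlite3.IntegrityError:
            continue  # e.g. public_id already taken: generate another and try again
    raise RuntimeError("no unique public_id after %d attempts" % max_attempts)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"   # internal key used for joins
    " public_id TEXT NOT NULL UNIQUE,"         # random ID shown to users
    " payload TEXT)"
)
print(insert_with_retry(conn, "hello"))
```

Capping the attempts keeps a misbehaving generator from looping forever; with a large ID space the loop almost always succeeds on the first try.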
I have a table with an id (auto-increment primary key), a uid (a key referring to, for example, a user's id), and some other columns that don't matter for my question.
I want to have, let's call it, a separate auto-increment sequence on id for each uid value.
So if I add an entry with uid 10, the id field for this entry will be 1, because there were no previous entries with uid 10. If I add a new one with uid 4, its id will be 3, because there were already two entries with uid 4.
...A very obvious explanation, but I am trying to be as clear as I can to demonstrate the idea.
Which SQL engine can provide such functionality natively? (Not Microsoft/Oracle based.)
If there is none, how could I best replicate it? Triggers perhaps?
Does this functionality have a more suitable name?
If you know of a non-SQL database engine providing such functionality, name it anyway; I am curious.
Thanks.
MySQL's MyISAM engine can do this. See their manual, in section Using AUTO_INCREMENT:
For MyISAM tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index. In this case, the generated value for the AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is useful when you want to put data into ordered groups.
The docs go on after that paragraph, showing an example.
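To make the quoted rule concrete, here is a tiny in-memory Python model of it (a sketch, not MyISAM itself; the GroupedAutoIncrement class is hypothetical): the next id for a group is MAX(id) + 1 within that group, starting at 1. The same manual section also notes that in this setup a deleted value is reused when it was the largest one in its group, which follows directly from the rule.

```python
from collections import defaultdict

class GroupedAutoIncrement:
    """In-memory model of MyISAM's per-prefix AUTO_INCREMENT rule:
    next id = MAX(id WHERE prefix = given prefix) + 1, starting at 1."""

    def __init__(self):
        self.rows = defaultdict(set)  # prefix (e.g. uid) -> ids used in that group

    def insert(self, prefix):
        new_id = max(self.rows[prefix], default=0) + 1
        self.rows[prefix].add(new_id)
        return new_id

    def delete(self, prefix, row_id):
        self.rows[prefix].discard(row_id)

t = GroupedAutoIncrement()
print(t.insert(4), t.insert(4), t.insert(10), t.insert(4))  # 1 2 1 3
t.delete(4, 3)       # remove the largest id in group 4 ...
print(t.insert(4))   # ... and 3 is handed out again
```

This model is single-threaded; the next answer explains why reproducing it safely under concurrency is the hard part.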
The InnoDB engine in MySQL does not support this feature, which is unfortunate because it's better to use InnoDB in almost all cases.
You can't emulate this behavior using triggers (or any SQL statements limited to transaction scope) without locking tables on INSERT. Consider this sequence of actions:
Mario starts transaction and inserts a new row for user 4.
Bill starts transaction and inserts a new row for user 4.
Mario's session fires a trigger to compute MAX(id)+1 for user 4. It gets 3.
Bill's session fires a trigger to compute MAX(id)+1 for user 4. It also gets 3.
Bill's session finishes his INSERT and commits.
Mario's session tries to finish his INSERT, but the row with (userid=4, id=3) now exists, so Mario gets a primary key conflict.
In general, you can't control the order of execution of these steps without some kind of synchronization.
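The interleaving above can be reproduced deterministically with a small Python simulation: threads stand in for Mario's and Bill's sessions, and a plain list stands in for the table (an illustration, not MySQL code). A threading.Barrier forces both "sessions" to read MAX(id) before either one writes, which is exactly the window the steps describe.

```python
import threading

table = [(4, 1), (4, 2)]          # existing (userid, id) rows: user 4 already has ids 1 and 2
barrier = threading.Barrier(2)    # makes both sessions finish their read before either writes
results = {}

def session(name, userid):
    # "Trigger": compute MAX(id) + 1 for this user
    next_id = max((i for u, i in table if u == userid), default=0) + 1
    barrier.wait()                # both sessions have now computed their value
    results[name] = next_id
    table.append((userid, next_id))

threads = [threading.Thread(target=session, args=(n, 4)) for n in ("Mario", "Bill")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # both sessions computed 3
duplicates = len(table) - len(set(table))
print("duplicate (userid, id) rows:", duplicates)  # 1: a primary key would reject one insert
```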
The solutions to this are either:
Get an exclusive table lock. Before trying an INSERT, lock the table. This is necessary to prevent concurrent INSERTs from creating a race condition like in the example above. You have to lock the whole table: since you're trying to restrict INSERT, there's no specific row to lock (if you were trying to govern access to a given row with UPDATE, you could lock just that row). But locking the table makes access to the table serial, which limits your throughput.
Do it outside transaction scope. Generate the id number in a way that won't be hidden from two concurrent transactions. By the way, this is what AUTO_INCREMENT does. Two concurrent sessions will each get a unique id value, regardless of their order of execution or order of commit. But tracking the last generated id per userid requires access to the database, or a duplicate data store. For example, a memcached key per userid, which can be incremented atomically.
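A minimal sketch of the second option: a per-userid counter whose increment is atomic and lives outside any transaction. A Python dict guarded by a threading.Lock stands in for something like a memcached increment; the PerUserCounter class is hypothetical. It guarantees unique values, but not gap-free ones, for the reasons listed next.

```python
import threading
from collections import defaultdict

class PerUserCounter:
    """Hands out per-user ids; each increment is atomic, so concurrent
    callers can never receive the same (userid, id) pair."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = defaultdict(int)  # userid -> last id handed out

    def next_id(self, userid):
        with self._lock:               # read-increment-write happens as one step
            self._last[userid] += 1
            return self._last[userid]

counter = PerUserCounter()
issued = []

def worker():
    for _ in range(1000):
        issued.append(counter.next_id(4))

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(issued), len(set(issued)))  # 8000 8000: no duplicates
```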
It's relatively easy to ensure that inserts get unique values. But it's hard to ensure they will get consecutive ordinal values. Also consider:
What happens if you INSERT in a transaction but then roll back? You've allocated id value 3 in that transaction, and then I allocated value 4, so if you roll back and I commit, now there's a gap.
What happens if an INSERT fails because of other constraints on the table (e.g. another column is NOT NULL)? You could get gaps this way too.
If you ever DELETE a row, do you need to renumber all the following rows for the same userid? What does that do to your memcached entries if you use that solution?
SQL Server should allow you to do this. If you can't implement this using a computed column (probably not - there are some restrictions), surely you can implement it in a trigger.
MySQL also would allow you to implement this via triggers.
In a comment you ask the question about efficiency. Unless you are dealing with extreme volumes, storing an 8 byte DATETIME isn't much of an overhead compared to using, for example, a 4 byte INT.
It also massively simplifies your inserts, and it copes with records being deleted without creating 'holes' in your sequence.
If you DO need this, be careful with the field names. If you have uid and id in a table, I'd expect id to be unique in that table, and uid to refer to something else. Perhaps, instead, use the field names property_id and amendment_id.
In terms of implementation, there are generally two options.
1). A trigger
Implementations vary, but the logic remains the same. As you don't specify an RDBMS (other than "not MS/Oracle"), the general logic is simple...
Start a transaction (often one is implicitly already started inside triggers)
Find the MAX(amendment_id) for the property_id being inserted
Update the newly inserted value with MAX(amendment_id) + 1
Commit the transaction
Things to be aware of are...
- multiple records being inserted at the same time
- records being inserted with amendment_id being already populated
- updates altering existing records
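As one concrete instance of that logic, here is a sketch using SQLite (through Python's sqlite3 module) as the RDBMS, with the property_id / amendment_id names suggested above and a hypothetical amendments table. An AFTER INSERT trigger fills in MAX(amendment_id) + 1, and its WHEN clause skips rows inserted with amendment_id already populated. SQLite allows only one writer at a time, so the concurrent-insert race does not arise here; in a server with concurrent writers it still applies. Note also that MySQL does not let a trigger UPDATE the table it is defined on, so there the value would typically be set in a BEFORE INSERT trigger via SET NEW.amendment_id = ....

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE amendments (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,
        property_id  INTEGER NOT NULL,
        amendment_id INTEGER,             -- per-property counter, filled by the trigger
        note         TEXT,
        UNIQUE (property_id, amendment_id)
    );

    -- Runs after each inserted row; only numbers rows the caller left NULL
    CREATE TRIGGER number_amendment
    AFTER INSERT ON amendments
    FOR EACH ROW WHEN NEW.amendment_id IS NULL
    BEGIN
        UPDATE amendments
        SET amendment_id = (
            SELECT COALESCE(MAX(amendment_id), 0) + 1   -- MAX ignores the new NULL row
            FROM amendments
            WHERE property_id = NEW.property_id
        )
        WHERE id = NEW.id;
    END;
""")

with conn:
    for prop in (10, 4, 4, 4):
        conn.execute("INSERT INTO amendments (property_id, note) VALUES (?, ?)", (prop, "x"))

rows = conn.execute(
    "SELECT property_id, amendment_id FROM amendments ORDER BY id"
).fetchall()
print(rows)  # [(10, 1), (4, 1), (4, 2), (4, 3)]
```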
2). A Stored Procedure
If you use a stored procedure to control writes to the table, you gain a lot more control.
Implicitly, you know you're only dealing with one record.
You simply don't provide a parameter for DEFAULT fields.
You know what updates / deletes can and can't happen.
You can implement all the business logic you like without hidden triggers
I personally recommend the Stored Procedure route, but triggers do work.
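SQLite has no stored procedures, so as a rough analogue here is a Python function that acts as the only write path to the table (the add_amendment name and the schema are hypothetical). It takes SQLite's write lock up front with BEGIN IMMEDIATE, which plays the role of the exclusive lock described in the earlier answer, then computes MAX + 1 and inserts within one transaction.

```python
import sqlite3

def add_amendment(conn: sqlite3.Connection, property_id: int, note: str) -> int:
    """Single write path: number and insert an amendment atomically."""
    conn.execute("BEGIN IMMEDIATE")      # take the write lock before reading MAX
    try:
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(amendment_id), 0) + 1 FROM amendments WHERE property_id = ?",
            (property_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO amendments (property_id, amendment_id, note) VALUES (?, ?, ?)",
            (property_id, next_id, note),
        )
        conn.execute("COMMIT")
        return next_id
    except Exception:
        conn.execute("ROLLBACK")
        raise

# isolation_level=None: the sqlite3 module leaves transaction control to us
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE amendments ("
    " property_id INTEGER NOT NULL,"
    " amendment_id INTEGER NOT NULL,"
    " note TEXT,"
    " PRIMARY KEY (property_id, amendment_id))"
)
print([add_amendment(conn, p, "x") for p in (10, 4, 4, 4)])  # [1, 1, 2, 3]
```

Because callers never supply amendment_id, the "already populated" and "multiple rows at once" cases from the trigger approach cannot occur.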
It is important to get your data types right.
What you are describing is a multi-part key. So use a multi-part key. Don't try to encode everything into a magic integer, you will poison the rest of your code.
If a record is identified by (entity_id,version_number) then embrace that description and use it directly instead of mangling the meaning of your keys. You will have to write queries which constrain the version number but that's OK. Databases are good at this sort of thing.
version_number could be a timestamp, as a_horse_with_no_name suggests. This is quite a good idea. There is no meaningful performance disadvantage to using timestamps instead of plain integers. What you gain is meaning, which is more important.
You could maintain a "latest version" table which contains, for each entity_id, only the record with the most-recent version_number. This will be more work for you, so only do it if you really need the performance.
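A small sketch of that design, again with SQLite through Python's sqlite3 as a stand-in and a hypothetical entity_versions table: the primary key is the pair (entity_id, version_number), version_number holds an ISO-8601 timestamp string (so text order matches time order), and the current version of each entity comes from an ordinary query rather than a separate table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entity_versions (
        entity_id      INTEGER NOT NULL,
        version_number TEXT    NOT NULL,   -- timestamp such as '2024-01-01T10:00:00'
        payload        TEXT,
        PRIMARY KEY (entity_id, version_number)   -- multi-part key, no magic integer
    )
""")
conn.executemany(
    "INSERT INTO entity_versions VALUES (?, ?, ?)",
    [
        (4, "2024-01-01T10:00:00", "first draft"),
        (4, "2024-01-02T09:30:00", "second draft"),
        (10, "2024-01-01T12:00:00", "only version"),
    ],
)

# Latest version per entity: correlated subquery on the composite key
latest = conn.execute("""
    SELECT v.entity_id, v.version_number, v.payload
    FROM entity_versions AS v
    WHERE v.version_number = (
        SELECT MAX(version_number)
        FROM entity_versions
        WHERE entity_id = v.entity_id
    )
    ORDER BY v.entity_id
""").fetchall()
print(latest)
```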
In my database (MySQL) I have a table (MyISAM) containing a field called number. Each value of this field is either 0 or a positive number, and the non-zero values must be unique. Finally, the value of the field is generated in my PHP code according to the value of another field (called isNew) in this table. The code follows.
$maxNumber = $db->selectField('select max(number)+1 m from confirmed where isNew = ?', array($isNew), 'm');
$db->query('update confirmed set number = ? where dataid = ?', array($maxNumber, $id));
The first line of code selects the maximum value of the number field and increments it. The second line updates the record, setting its number to the freshly generated value.
This code is used concurrently by hundreds of clients, and I noticed that duplicates of the number field sometimes occur. As I understand it, this happens when two clients read the value of the number field almost simultaneously, which leads to the duplicate.
I have read about the SELECT ... FOR UPDATE statement but I'm not quite sure it is applicable in my case.
So the question is should I just append FOR UPDATE to my SELECT statement? Or create a stored procedure to do the job? Or maybe completely change the way the numbers are being generated?
This is definitely possible to do. MyISAM doesn't offer transaction locking so forget about stuff like FOR UPDATE. There's definitely room for a race condition between the two statements in your example code. The way you've implemented it, this one is like the talking burro. It's amazing it works at all, not that it works badly! :-)
I don't understand what you're doing with this SQL:
select max(number)+1 m from confirmed where isNew = ?
Are the values of number unique throughout the table, or only within sets where isNew has a certain value? Would it work if the values of number were unique throughout the table? That would be easier to create, debug, and maintain.
You need a multi-connection-safe way of getting a number.
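You could try this SQL. It sets the new max number in one statement. (MySQL rejects a subquery that reads directly from the table being updated, with error 1093, so the MAX() subquery is wrapped in a derived table, which MySQL materializes first.)
UPDATE confirmed
SET number = (SELECT m FROM (SELECT 1 + MAX(number) AS m FROM confirmed WHERE isNew = ?) AS t)
WHERE dataid = ?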
This will perform badly. Without a compound index on (isNew, number), and without both those columns declared NOT NULL it will perform very very badly.
If you can use numbers that are unique throughout the table I suggest you create for yourself a sequence setup, which will return a unique number each time you use it. You need to use a series of consecutive SQL statements to do that. Here's how it goes.
First, when you create your tables, also create a table called sequence (or whatever name you like). This is a one-column table.
CREATE TABLE sequence (
sequence_id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`sequence_id`)
) AUTO_INCREMENT = 990000
This will make the sequence table start issuing numbers at 990,000.
Second, when you need a unique number in your application, do the following things.
INSERT INTO sequence () VALUES ();
DELETE FROM sequence WHERE sequence_id < LAST_INSERT_ID();
UPDATE confirmed
SET number = LAST_INSERT_ID()
WHERE dataid = ?
What's going on here? The MySQL function LAST_INSERT_ID() returns the value of the most recent autoincrement-generated ID number. Because you inserted a row into that sequence table, it gives you back that generated ID number. The DELETE FROM command keeps that table from snarfing up disk space; we don't care about old ID numbers.
LAST_INSERT_ID() is connection-safe. If software on different connections to your database uses it, they all get their own values.
If you need to know the last inserted ID number, you can issue this SQL:
SELECT LAST_INSERT_ID() AS sequence_id
and you'll get it returned.
If you were using Oracle or PostgreSQL, instead of MySQL, you'd find they provide SEQUENCE objects that basically do this.
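The same sequence-table trick can be tried out without a MySQL server. This sketch uses SQLite through Python's sqlite3 module as a stand-in, so the syntax differs slightly: INSERT ... DEFAULT VALUES replaces INSERT ... () VALUES (), the starting value is seeded by inserting and deleting row 989999 instead of AUTO_INCREMENT = 990000, and cursor.lastrowid replaces LAST_INSERT_ID(). The next_number helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence (sequence_id INTEGER PRIMARY KEY AUTOINCREMENT)")
# Seed the counter: AUTOINCREMENT remembers the largest id ever used, even after
# that row is deleted, so the next number issued will be 990,000
conn.execute("INSERT INTO sequence (sequence_id) VALUES (989999)")
conn.execute("DELETE FROM sequence")
conn.execute("CREATE TABLE confirmed (dataid INTEGER PRIMARY KEY, number INTEGER)")
conn.executemany("INSERT INTO confirmed (dataid, number) VALUES (?, 0)", [(1,), (2,)])
conn.commit()

def next_number(conn: sqlite3.Connection) -> int:
    """Allocate a unique number: insert a row, read its id, prune older rows."""
    with conn:
        new_id = conn.execute("INSERT INTO sequence DEFAULT VALUES").lastrowid
        # Old numbers are never needed again; AUTOINCREMENT will not reuse them
        conn.execute("DELETE FROM sequence WHERE sequence_id < ?", (new_id,))
    return new_id

for dataid in (1, 2):
    number = next_number(conn)
    with conn:
        conn.execute("UPDATE confirmed SET number = ? WHERE dataid = ?", (number, dataid))

print(conn.execute("SELECT dataid, number FROM confirmed ORDER BY dataid").fetchall())
# [(1, 990000), (2, 990001)]
```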
Here's the answer to another similar question.
Fastest way to generate 11,000,000 unique ids
I have a form from which I insert data into MySQL, and that works fine. But when I delete some rows from MySQL and then insert values into the database again, the auto-increment value continues from the old highest value.
For example:
If I have 1, 2, 3, 4, 5 as IDs in my database, and I delete IDs 4 and 5 from the database
and then insert the next data from PHP, the IDs start from 6. But I need the next ID to be 4. Can anyone give suggestions? Thanks in advance.
I'm afraid MySQL does not automatically reuse deleted AUTO_INCREMENT values like that. If you need that behavior, you have to stop using AUTO_INCREMENT and generate your IDs manually.
Auto increment does not (and cannot) guarantee an unbroken sequence.
You can implement this yourself as "SELECT MAX(ID) + 1 FROM MYTABLE;"
But be warned:
- You will take a slight but noticeable performance hit.
- If you are running updates concurrently, you risk deadlocks.
- (Again, if you are running updates concurrently) you risk having two inserts with the same key.
You can also implement this by running your own counter in a separate table. You must have program logic to decrement it correctly on a deletion, and, again, you will get a performance hit and a risk of deadlock, as the "counter" will become an object of contention.
You should not play with the AUTO_INCREMENT value in a production environment; let MySQL take care of its value for you.
If you need to know how many rows you have, you can use
SELECT COUNT(id) FROM tbl;
Anyway, if you really want to change its value, the syntax is:
ALTER TABLE tbl AUTO_INCREMENT=101;
I have a CSV in the format:
Bill,Smith,123 Main Street,Smalltown,NY,5551234567
Jane,Smith,123 Main Street,Smalltown,NY,5551234567
John,Doe,85 Main Street,Smalltown,NY,5558901234
John,Doe,100 Foo Street,Bigtown,CA,5556789012
In other words, no one field is unique. Two people can have the same name, two people can have the same phone, etc., but each line is itself unique when you consider all of the fields.
I need to generate a unique ID for each row but it cannot be random. And I need to be able to take a line of the CSV at some time in the future and figure out what the unique ID was for that person without having to query a database.
What would be the fastest way of doing this in PHP? I need to do this for millions of rows, so md5()'ing the whole string for each row isn't really practical. Is there a better function I should use?
If you need to be able to later reconstruct the ID from only the text of the line, you will need a hash algorithm. It doesn't have to be MD5, though.
"Millions of IDs" isn't really a problem for modern CPUs (or, especially, GPUs; see Jeff's recent blog post about Speed Hashing), so you might want to do the hashing in a different language than PHP. The only problem I can see is collisions. You need to be certain that your generated hashes really are unique; the chance of a collision depends on the number of entries, the algorithm used, and the length of the hash.
According to Jeff's article, MD5 is already one of the fastest hash algorithms out there (with 10-20,000 million hashes per second), but NTLM appears to be twice as fast.
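A sketch of the hash approach in Python: hashlib.blake2b with an 8-byte digest gives a deterministic 64-bit ID for each line, so the same line always maps to the same ID without any database lookup. BLAKE2 is a swapped-in choice here, not something the answer prescribes; hashlib.md5 would work the same way. Lines are stripped of surrounding whitespace first so that a trailing newline does not change the ID, and the row_id and collision_chance helpers are hypothetical.

```python
import hashlib

def row_id(line: str) -> int:
    """Deterministic 64-bit ID derived only from the text of a CSV line."""
    canonical = line.strip()  # ignore trailing newline / surrounding spaces
    digest = hashlib.blake2b(canonical.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def collision_chance(n_rows: int, bits: int = 64) -> float:
    """Birthday bound: approximate chance of any collision among n_rows IDs."""
    return n_rows * (n_rows - 1) / 2 / 2 ** bits

lines = [
    "Bill,Smith,123 Main Street,Smalltown,NY,5551234567",
    "Jane,Smith,123 Main Street,Smalltown,NY,5551234567",
    "John,Doe,85 Main Street,Smalltown,NY,5558901234",
    "John,Doe,100 Foo Street,Bigtown,CA,5556789012",
]
ids = [row_id(l) for l in lines]
print(ids)
print(row_id(lines[0] + "\n") == ids[0])      # True: same line later -> same ID
print(f"{collision_chance(10_000_000):.1e}")  # about 2.7e-06 for ten million rows
```

A 64-bit digest keeps the ID small enough for a BIGINT column while leaving the collision risk for millions of rows very low; a longer digest lowers it further.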
Why not just
CREATE TABLE data (
first VARCHAR(50),
last VARCHAR(50),
addr VARCHAR(50),
city VARCHAR(50),
state VARCHAR(50),
phone VARCHAR(50),
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY (id)
);
LOAD DATA [LOCAL] INFILE 'file.csv'
INTO TABLE data
(first,last,addr,city,state,phone);
How about just add the unique ID as a field?
$csv = file($file);
$i = 0;
$csv_new = array();
foreach ($csv as $val) {
    $csv_new[] = $i . "," . $val;  // prepend the running row number as the ID
    $i++;
}
Then write $csv_new out as the new CSV file.
Dirty but it may work for you.
I understand what you're saying, but I don't see the point. Creating a unique ID that auto-increments in the database would be the best route. The second route would be creating, in the CSV, something like a cell formula =A1+1 and dragging it down the entire column. In PHP you can read the file and prepend something such as date('ymd').$id, then write it back to the file. Again, though, this seems silly to do, and the database route would be best. Just keep in mind PCI compliance and always encrypt the data.
I'll post code later. I'm not at the PC at this time.
It's been a long time, but I found a situation sort of like this, where I needed to prevent a duplicate row from being created in a database. I created another column called de_dup, which was set to be unique. Then, for each row on creation, I used date('ymd').md5(implode($selected_csv_values)); this prevented a customer from creating two orders on any given day unless specific information was different, i.e. firstname, lastname, creditcardnum, billingaddress.
I want to build a database-wide unique id. That unique id should be one field of every row in every table of that database.
There are a few approaches I have considered:
Create one master-table with an auto-increment-field and a trigger in every other table, like:
"before insert here, insert in master-table -> get the auto-increment value -> and use this value as primary-key here"
I have seen this before, but instead of making one INSERT, it does 2 INSERTS, which I expect would not be that performant.
Add a field uniqueId to every table, and fill this field with a PHP-generated integer... something like unix-timestamp plus a random number.
But I would have to use BIGINT as the datatype, which means a bigger index_length and data_length.
Similar to the "uniqueId" idea, but instead of BIGINT I use VARCHAR and populate the value with uniqid().
Since you are looking for opinions... Of the three ideas you give, I would "vote" for the uniqid() solution. It seems pretty low cost in terms of execution (but possibly not implementation).
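A simpler solution (I think) would be to just add a field to each table to store a GUID and set the field's default value to MySQL's GUID-generating function, UUID(). This lets the database do the work for you. (An expression default such as DEFAULT (UUID()) requires MySQL 8.0.13 or later; on older versions a BEFORE INSERT trigger can fill the field instead.)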
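To put numbers on the storage trade-off raised in the question, here is a quick check using Python's standard uuid module (not MySQL itself). MySQL's UUID() returns a version-1, time-based UUID, which uuid.uuid1() mirrors: as text it takes 36 characters, while the same value packed as raw bytes takes 16 (in MySQL that would be a BINARY(16) column, e.g. via UUID_TO_BIN() on 8.0+), compared with 8 bytes for a BIGINT.

```python
import uuid

u = uuid.uuid1()              # time-based UUID, like MySQL's UUID()
text_form = str(u)            # what a CHAR(36) column would hold
binary_form = u.bytes         # what a BINARY(16) column would hold

print(u.version, len(text_form), len(binary_form))  # 1 36 16
# Round trip: the 16 raw bytes fully recover the UUID
assert uuid.UUID(bytes=binary_form) == u
```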
And in the spirit of coming up with random ideas... It would be possible to have some kind of offline process fill in the IDs asynchronously. Make sure every table has the appropriate field and make the default value be 0/empty. Then the offline process could simply run a query on each table to find the rows that do not yet have a unique id and it could fill them in. That would let you control the ID and even use some kind of incrementing integer. This, of course, requires that you do not need the unique ID instantly each time a record is inserted.
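A rough sketch of that offline fill-in process, with SQLite via Python's sqlite3 standing in for MySQL; the orders and customers tables, the uid column, the id_counter table, and the backfill_uids helper are all hypothetical. Rows are inserted with uid = 0, and a periodic job hands out integers from one shared counter to every table that still has zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE id_counter (last_uid INTEGER NOT NULL);
    INSERT INTO id_counter VALUES (0);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, uid INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, uid INTEGER NOT NULL DEFAULT 0);
    INSERT INTO orders (id) VALUES (1), (2);
    INSERT INTO customers (id) VALUES (1);
""")

TABLES = ("orders", "customers")  # fixed list, so the f-strings below never see user input

def backfill_uids(conn: sqlite3.Connection) -> int:
    """Give every row still holding uid = 0 a database-wide unique integer."""
    assigned = 0
    with conn:  # one transaction for the whole pass
        (last,) = conn.execute("SELECT last_uid FROM id_counter").fetchone()
        for table in TABLES:
            pending = conn.execute(f"SELECT id FROM {table} WHERE uid = 0 ORDER BY id").fetchall()
            for (row_id,) in pending:
                last += 1
                conn.execute(f"UPDATE {table} SET uid = ? WHERE id = ?", (last, row_id))
                assigned += 1
        conn.execute("UPDATE id_counter SET last_uid = ?", (last,))
    return assigned

print(backfill_uids(conn))  # 3
```

Running the job again later only touches rows added since the previous pass, and the shared counter keeps every uid unique across tables.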