Creating a custom md5 user id from form information - php

I am creating a form whose data will be captured and stored in a database. I would like to use some of the form data to create a custom MD5 user ID to prevent multiple entries. I know this is probably not the ideal way of doing it, so if there is a better way of creating a unique MD5 UID for each user, please enlighten me.
I have considered just using a random number and checking the database against the email and first name, but I am curious to see if there is a better way of doing it.
Thanks in advance!

Wait ... you want to use a unique MD5 as a user id? Why not use an auto_increment integer field? Each time an INSERT is run, it is increased by 1, so it is always unique. And since it is an integer, any searches against it will be a lot faster.

You can let MySQL do the work for you using UNIQUE. Assuming you have a table user_data(user_data_id, name, text, content, date) and want name and text to be unique as a tuple:
CREATE TABLE user_data (
user_data_id INTEGER,
name VARCHAR(50),
text TEXT,
content TEXT,
date DATE,
PRIMARY KEY (user_data_id),
-- MySQL needs a prefix length to index a TEXT column
UNIQUE (name, text(255))
);

I assume you mean you want to avoid having duplicate entries by the same person.
You should probably check the database for the input's email + firstname + lastname, after normalizing them with strtolower() and removing spaces etc.
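A minimal sketch of that normalization step, assuming the form supplies email, first name and last name as plain strings. The function names normalizeField and duplicateKey are made up for illustration; the resulting key could be stored in a column with a UNIQUE index or used in a lookup before inserting. Note that strtolower() only handles ASCII reliably, so names with accented characters may need mb_strtolower() instead.

```php
<?php
// Hypothetical helpers: normalize a form field, then derive a lookup key.

// Lowercase the value and strip all whitespace so "John Doe " and "johndoe" compare equal.
function normalizeField(string $value): string
{
    return preg_replace('/\s+/', '', strtolower(trim($value)));
}

// Build a fixed-length key from the normalized fields.
// The "|" separator keeps ("ab", "c") distinct from ("a", "bc").
function duplicateKey(string $email, string $firstName, string $lastName): string
{
    $parts = array_map('normalizeField', [$email, $firstName, $lastName]);
    return md5(implode('|', $parts));
}

$a = duplicateKey('John.Doe@Example.com ', 'John', 'Doe');
$b = duplicateKey('john.doe@example.com', ' john', 'DOE');
var_dump($a === $b); // bool(true): both submissions map to the same key
```

This only catches entries that match after normalization; a person who uses a different email address still gets a different key.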
Apart from that, you can't know for sure whether the person entering data in the form has done it before. You can't safely rely on the IP being the same due to proxies and gateways, or even on the computer being the same (shared computers). If you are aggressive on those two fronts, you'll probably get a lot of frustrated legitimate users who can't use your system.
Your best bet is to assume your database has duplicate entries and then decide what to do with them. Either flag the ones not being used or, if they're accounts on the system, add some intelligence to check whether they're duplicates based on their behavior.

Related

best type in mysql for language field?

I am a beginner in PHP and MySQL, and I want to develop a multilanguage site. In my database I have a users table that stores user info like id, password, access_type, language, ...
I wanted to use the char(2) type, but now I think it can't store all languages. What type should I use for the language field?
I'll list my other field types here, and I would be happy to hear your opinions about them!
For id, which is the user's email, I use varchar(255).
For password I use varchar(255) and save the hashed password that the password_hash function returns.
For access_type I use char(2). I use su for super users and au for amateur users.
For first name I use varchar(32).
For last name I use varchar(32).
Sorry for my English; it isn't my native language!
You can use anything between 2 and 4 characters, as per your requirement. Better, create a foreign key to another table that holds detailed information about the languages. For the foreign key concept, see: http://dev.mysql.com/doc/refman/5.6/en/create-table-foreign-keys.html
An enum can also be handy.
I would use an enum if I wanted to make sure that only acceptable values can be chosen. The other change I would make is to the id column. I would change it to an auto-increment id and keep the email in its own email field. You can still authenticate a user by email, but when you need to run operations like edit, delete, etc., you can use the user's "id", which is faster than looking up against the email all the time. Performance is going to depend on how many records are in the database as well. If we're talking about a few thousand records, I wouldn't worry about it too much, but if it's a large number of records, you may need to optimize more.
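As a rough illustration of the "only acceptable values" idea on the PHP side, here is a sketch that checks a submitted language code against an allowlist before it is stored. The list itself, the constant name and the function resolveLanguage are assumptions for this example; in practice the list could be loaded from the languages lookup table mentioned above. Codes with a region or script suffix such as pt-BR or zh-Hant are included to show why a two-character column can be too narrow if such variants are ever needed.

```php
<?php
// Hypothetical allowlist; in practice this could come from a `languages` lookup table.
const SUPPORTED_LANGUAGES = ['en', 'fa', 'de', 'pt-BR', 'zh-Hant'];

// Return the canonical spelling of a supported code, or the fallback otherwise.
function resolveLanguage(string $requested, string $fallback = 'en'): string
{
    foreach (SUPPORTED_LANGUAGES as $code) {
        // Compare case-insensitively so "PT-br" still matches "pt-BR".
        if (strcasecmp($code, trim($requested)) === 0) {
            return $code;
        }
    }
    return $fallback;
}

echo resolveLanguage('PT-br'), "\n"; // pt-BR
echo resolveLanguage('xx'), "\n";    // en
```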

CSV and ID problems

I have a database with employees in it.
Since my employer finds it easy to input the data in a CSV file, I wrote a program that truncates my database and inserts the CSV data in my DB.
Employee: [ID, LAST_NAME, NAME, EMAIL, REMARKS, ...]
I use the ID field (which is an auto_increment value) to make all my employees unique. This works fine; however, recently my employer has asked me to include functionality to mark favorites.
The only thing that makes my employees unique is the ID key, so when I load a new CSV file the IDs break: they shift because I had to truncate my database, and the favorites don't match up any more.
An example of what I mean (CSV file):
0, Carlton, John, john@gmail.com, "Great worker",
1, Awsome, Dude, awsomeDud@aol.com, "Not so great",
2, Random, Randy, rr@hotmail.com, "idk"
Suppose somebody deletes the record with ID 1.
And my favorite was 1, the csv file however will now look like this:
0, Carlton, John, john@gmail.com, "Great worker",
1, Random, Randy, rr@hotmail.com, "idk"
It points to the wrong person.
Keep in mind that the IDs I wrote are not part of the CSV file itself; they are the auto_increment values.
I have given this problem a lot of thought and I cannot seem to find a simple way to accomplish this.
Any help would be appreciated.
Notes:
Emails are not unique, nor required.
The only real unique field is the ID field.
Solution 1 (easiest)
Have an int is_favorite column in your database containing 1 or 0, with a default value of 0 (meaning is not a favorite). Then ask your client to slightly change the format of the csv file as follows:
Employee: [ID, LAST_NAME, NAME, EMAIL, REMARKS, FAVORITE, ...]
Example CSV:
0, Carlton, John, john@gmail.com, "Great worker", 1
1, Awsome, Dude, awsomeDud@aol.com, "Not so great", 0
2, Random, Randy, rr@hotmail.com, "idk", 0
When you process the CSV file, just set the same value in the database as in the FAVORITE column. This will eliminate the problem with the mismatched favorites. Unfortunately, if in the near future the client requires new features which depend on the favorites, you might have the same issue again.
Solution 2 (best)
Discuss a more mature solution with your client, pointing out that the current CSV solution is no longer a valid option because the CSV users cannot be matched reliably with the appropriate sub-features (i.e. favorites).
A possible solution would be to never truncate your table. Ever.
Find out what makes the employees unique, e.g. EMAIL.
Then when you parse the next CSVs, you don't simply INSERT the employees. You update the existing ones and insert the new ones.
This way, your IDs always stay the same (which they should).
I would have used something like this:
IF EXISTS (SELECT 1 FROM [User] WHERE [Email] = @UsersEmail)
BEGIN
    UPDATE [User]
    SET [Name] = @NewName
    WHERE [Email] = @UsersEmail
END
ELSE
BEGIN
    INSERT INTO [User] ([Email], [Name]) VALUES
    (@UsersEmail, @NewName)
END
But since you've tagged it PHP, I'm guessing you're using MySQL, which can do it differently:
INSERT INTO subs
(subs_name, subs_email, subs_birthday)
VALUES
(?, ?, ?)
ON DUPLICATE KEY UPDATE
subs_name = VALUES(subs_name),
subs_birthday = VALUES(subs_birthday)
I would not truncate the table. I would then upload the csv into a temporary table. If the same ID is in both tables, do an update. If it is only in the old version, delete it (deleting out favorites as well for that ID), or, perhaps better, have a flag on the employees table that deactivates the row. If it is only in the new version, insert everything except the ID (which will probably be an empty string anyway). Then you can delete the temporary table.
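The three cases described above (ID in both, ID only in the old data, ID only in the new data) can be sketched in plain PHP before any SQL is written. This is only an illustration under assumptions: planSync is a made-up name, the current table is represented as an array keyed by ID, and each incoming CSV row is an array with an 'id' entry that is an empty string for new employees.

```php
<?php
// Sketch: decide what to do with each row instead of truncating the table.
// $current is keyed by employee ID; $incoming is the list of parsed CSV rows.
function planSync(array $current, array $incoming): array
{
    $plan = ['update' => [], 'deactivate' => [], 'insert' => []];

    foreach ($incoming as $row) {
        $id = $row['id'] ?? '';
        if ($id !== '' && isset($current[$id])) {
            $plan['update'][$id] = $row;   // ID in both: update in place
        } else {
            $plan['insert'][] = $row;      // no known ID: new employee, ID assigned by the database
        }
    }
    foreach ($current as $id => $row) {
        if (!isset($plan['update'][$id])) {
            $plan['deactivate'][] = $id;   // only in the old data: flag it rather than delete
        }
    }
    return $plan;
}

$current = [
    0 => ['last_name' => 'Carlton'],
    1 => ['last_name' => 'Awsome'],
    2 => ['last_name' => 'Random'],
];
$incoming = [
    ['id' => '0', 'last_name' => 'Carlton'],
    ['id' => '2', 'last_name' => 'Random'],
    ['id' => '',  'last_name' => 'Newman'],
];
$plan = planSync($current, $incoming);
print_r(array_keys($plan['update'])); // IDs 0 and 2 keep their rows
print_r($plan['deactivate']);         // ID 1 is flagged, so its favorites can be handled explicitly
```

Because existing IDs are never reassigned, a favorite that points at ID 2 still points at the same person after the import.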
If you want to be paranoid, you can double check names or emails and if you find a mismatch, flag them without updating. That would cause a manual operation if someone changed their name, but it would also save you the trouble if someone messed up your id numbers.
The simple and clean way to solve this would be to find a way to recognise unique employees on the flat data.
Is there no other unique identifier that could be added to the CSV file? For example, a Windows login name? A company employee number? Something that would be static.
That way it's simple:
1. Don't truncate.
2. If the Windows login ID / employee number exists, update.
3. If not, add.
Also I'm concerned that your "favourites" table is clearly not using referential integrity. It should have a FK pointing to your Employee.ID; preventing you accidentally deleting an employee that was marked as a favourite, amongst other things.
A messier, much less bulletproof way would be to mark your favourites based on your employee names rather than IDs. There are obvious drawbacks to this approach, so use it as a last resort.
You should never use the ID to identify a given user for the reasons you described in the question.
You could create a new reference ID field based on what you already have and create a unique identifier by chaining the required fields as a single string and then calculating the MD5 hash for example.
I have a question (sorry, but I can't comment yet because of my rep): does your employer only add new employees via the CSV file, or also edit existing ones?
If only new employees are added you don't need to rebuild the table from scratch and you can make sure that your program generates a unique reference ID (that will remain unchanged) before inserting data into the db. Also your program can handle the editing of the employee, instead of changing data from CSV, leaving reference ID untouched.
This way all the fields like name, email, etc. can be edited and the link to favourites will stay correct. In that case the reference ID can also be calculated using not only the data in the CSV but also other data, such as a creation timestamp.
You could create an MD5 hash from the name, email and comment, then save it and use it as the unique identifier.
Make sure you store the MD5 hash as binary.
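A small sketch of what storing the hash as binary can look like in PHP. Passing true as the second argument of md5() returns the 16 raw bytes instead of the 32-character hex string, which fits a BINARY(16) column. The function name referenceId and the choice of separator are assumptions for this example.

```php
<?php
// Sketch: derive a reference ID from selected fields and keep it as 16 raw bytes.
function referenceId(string $name, string $email, string $remarks): string
{
    // "\x1F" (unit separator) is an arbitrary delimiter assumed not to occur in the data.
    $joined = implode("\x1F", [$name, $email, $remarks]);
    return md5($joined, true); // true => raw binary output
}

$raw = referenceId('John Carlton', 'john@gmail.com', 'Great worker');
echo strlen($raw), "\n";          // 16 bytes, fits a BINARY(16) column
echo strlen(bin2hex($raw)), "\n"; // 32 characters when shown as hex
```

The hash changes whenever any of the hashed fields is edited, so it only works as a stable identifier if it is computed once, when the record is first inserted, and stored from then on.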
Can you modify the database? If you can, add another field that you can call favourite. Set it to a simple enum (1, 0) and store 1 for favourites and 0 for the others. That only survives a truncate if the same value also travels in the CSV. Of course, if you have multi-level favourites, don't make the field an enum; use a type that suits you better.
One solution is that the database becomes the de facto 'source' for IDs.
After the initial import, the next time your boss wants to update the file, create a CSV FROM the database (with the IDs intact) and ask your boss to update that and return it.
You could ask him to add new rows to the bottom of the file and leave out the ID.
Any row in the new spreadsheet without an ID is a new record. An extra field at the end of the row could be used by the boss to indicate rows to be deleted.
Repeat this process the next time the boss wants to update the file.
Add an extra field to your database table as well as to the CSV file named something like "EmployeeID" which should be unique for all employees.

Should I obscure database primary keys (IDs) in the application front end?

I'm working on an application which allows a moderator to edit user information.
So, at the moment, I have URLs like
http://xxx.xxx/user/1/edit
http://xxx.xxx/user/2/edit
I'm a bit worried here, as I'm directly exposing the users table's primary key (id) from the database.
I simply take the id from the URL (e.g. 1 and 2 from the URLs above), query the database with that id and get the user information (of course, I sanitize the input, i.e. the id from the URL).
Please note that:
I'm validating every request to check whether the moderator has access to edit that user.
This is what I'm doing. Is this safe? If not, how should I be doing it?
I can think of one alternative: have a separate column in the users table with a 25-character key, use those keys in the URLs and query the database with them.
But:
What difference does it make? (The key is exposed now instead.)
Querying by primary key yields results faster than querying by other columns.
This is safe (and seems to be the best way to do it) as long as the validation of the admin rights is correct and you have prevention for SQL injection. Both of which you mention so I'd say you're good.
The basic question is whether exposing the primary key is safe or not. I would say that in most situations it is safe, and I believe that Stack Overflow does it the same way:
http://stackoverflow.com/users/1/
http://stackoverflow.com/users/2/
http://stackoverflow.com/users/3/
If you check the "member for" value on those profiles, you can see that the time decreases as the number grows, so the number is probably the PK as well.
Anyway, obscuring the PK can be useful in situations where you want to keep a common user from going through all entries just by typing 1, 2, 3 etc. into the URL. In that case replacing the PK with something like 535672571d2b4 is useful.
If you are really unsure, you could also XOR the id with a nice (big) fixed value. This way you would not expose your ids. Applying the same "secret number" again to the XOR'ed value gives back the original value. Note that in PHP the bitwise operator is ^; the keyword xor is a logical operator and returns a boolean.
$YOUR_ID ^ $THE_SECRET_NUMBER = $OUTPUTTED_VALUE
$OUTPUTTED_VALUE ^ $THE_SECRET_NUMBER = $YOUR_ID
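A minimal runnable version of that round trip, assuming a 64-bit PHP build. The mask value and the names ID_MASK, obscureId and revealId are arbitrary choices for this sketch.

```php
<?php
// Hypothetical secret; it must stay constant, otherwise old URLs stop decoding.
const ID_MASK = 0x5DEECE66D;

function obscureId(int $id): int
{
    return $id ^ ID_MASK; // bitwise XOR
}

function revealId(int $publicId): int
{
    return $publicId ^ ID_MASK; // XOR with the same mask restores the original
}

$public = obscureId(42);
var_dump($public !== 42);           // bool(true): the URL no longer shows 42
var_dump(revealId($public) === 42); // bool(true): the id is recovered
```

This only hides the numbering from casual inspection. Anyone who sees two or three consecutive values can guess the pattern, so the permission check on every request is still what protects the data.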
Short answer: no.
Long answer:
You have a primary key to identify someone with, and it is unique. If you add another unique key to keep people from knowing the first one, all you achieve is that they now know a different key.
That key still needs to be unique and to have an index (for fast lookups), which sounds a lot like a primary key.
If it is a matter of nice URLs, then you could use a username or something like that.
But it would be security through obscurity. So it is better to prevent SQL injection and validate that people have access to the right actions.
If you have plain autoincrement ids, you expose your data to the world. It is not secure (e.g. against brute-forcing all available data in your tables). But you can generate the ids of your DB entities not sequentially, but in a pseudo-random manner. E.g. in PostgreSQL:
CREATE TABLE t1 (
id bigint NOT NULL DEFAULT (((nextval('id_seq'::regclass) * 678223072849::bigint)
% (1000000000)::bigint) + 460999999999::bigint),
...
<other fields here>
)
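The same multiplicative scrambling can be sketched in PHP, assuming 64-bit integers. The constants are taken from the SQL above; the multiplier is reduced modulo 10^9 first, which gives the same result for non-negative sequence values while keeping the intermediate product small. The function name scrambleId is made up for this example.

```php
<?php
// Sketch of the scrambling formula from the SQL above (64-bit PHP assumed).
const MODULUS    = 1000000000;
const MULTIPLIER = 678223072849 % MODULUS; // 223072849, which shares no factor with 10^9
const OFFSET     = 460999999999;

function scrambleId(int $sequence): int
{
    // Multiplying by a number coprime to the modulus permutes the residues,
    // so distinct sequence values below 10^9 give distinct ids.
    return (($sequence % MODULUS) * MULTIPLIER) % MODULUS + OFFSET;
}

foreach ([1, 2, 3] as $n) {
    echo $n, ' => ', scrambleId($n), "\n";
}
```

The ids look unordered, but the mapping is easy to reverse for anyone who knows or guesses the constants, and it repeats after 10^9 values, so it hides the sequence rather than securing it.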

Generating unique non-random id based on CSV row?

I have a CSV in the format:
Bill,Smith,123 Main Street,Smalltown,NY,5551234567
Jane,Smith,123 Main Street,Smalltown,NY,5551234567
John,Doe,85 Main Street,Smalltown,NY,5558901234
John,Doe,100 Foo Street,Bigtown,CA,5556789012
In other words, no one field is unique. Two people can have the same name, two people can have the same phone, etc., but each line is itself unique when you consider all of the fields.
I need to generate a unique ID for each row but it cannot be random. And I need to be able to take a line of the CSV at some time in the future and figure out what the unique ID was for that person without having to query a database.
What would be the fastest way of doing this in PHP? I need to do this for millions of rows, so md5()'ing the whole string for each row isn't really practical. Is there a better function I should use?
If you need to be able to later reconstruct the ID from only the text of the line, you will need a hash algorithm. It doesn't have to be MD5, though.
"Millions of IDs" isn't really a problem for modern CPUs (or, especially, GPUs. See Jeff's recent blog about Speed Hashing), so you might want to do the hashing in a different language than PHP. The only problem I can see is collisions. You need to be certain that your generated hashes really are unique, the chance of which depends on the number of entries, the used algorithm and the length of the hash.
According to Jeff's article, MD5 is already one of the fastest hash algorithms out there (with 10-20,000 million hashes per second), but NTLM appears to be twice as fast.
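In PHP, the hash() function makes it easy to try different algorithms on the same input, since the algorithm is just a string argument (hash_algos() lists what a given build supports). A minimal sketch, with rowId as a made-up name and MD5 as the default purely for illustration:

```php
<?php
// Sketch: derive a repeatable ID from the text of one CSV line.
function rowId(string $line, string $algo = 'md5'): string
{
    // Strip the line ending so the same record hashes identically
    // whether the file used "\n" or "\r\n" line endings.
    return hash($algo, rtrim($line, "\r\n"));
}

$a = rowId("Bill,Smith,123 Main Street,Smalltown,NY,5551234567\n");
$b = rowId("Bill,Smith,123 Main Street,Smalltown,NY,5551234567\r\n");
$c = rowId("Jane,Smith,123 Main Street,Smalltown,NY,5551234567\n");
var_dump($a === $b); // bool(true): same record, same ID
var_dump($a !== $c); // bool(true): different record, different ID
```

Because the ID depends only on the line text, it can be recomputed later from the CSV line alone, without a database lookup.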
Why not just
CREATE TABLE data (
first VARCHAR(50),
last VARCHAR(50),
addr VARCHAR(50),
city VARCHAR(50),
state VARCHAR(50),
phone VARCHAR(50),
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY (id)
);
LOAD DATA [LOCAL] INFILE 'file.csv'
INTO TABLE data
(first,last,addr,city,state,phone);
How about just adding the unique ID as a field?
$csv = file($file);        // read the CSV into an array of lines
$i = 0;
$csv_new = array();
foreach ($csv as $val) {   // loop over the lines, not over the file name
    $csv_new[] = $i . "," . $val;
    $i++;
}
Then output $csv_new as the new CSV file.
Dirty, but it may work for you.
I understand what you're saying, but I do not see the point. Creating a unique id that auto-increments in the database would be the best route. The second route would be putting something like =A1+1 in a cell of the CSV's spreadsheet and dragging it down the entire column. In PHP you can read the file, prepend something such as date('ymd').$id, then write it back to the file. Again, though, this seems silly to do, and the database route would be best. Just keep PCI compliance in mind and always encrypt the data.
I'll post code later. I'm not at the PC at this time.
It's been a long time, but I ran into a similar situation where I needed to prevent a duplicate row from being created in a database. I created another column called de_dup, which was set to be unique. Then, for each row on creation, I used date('ymd').md5(implode($selected_csv_values)); this prevented a customer from creating two orders on any given day unless specific information was different, i.e. firstname, lastname, creditcardnum, billingaddress.

Apply UUID function to database field from PHP

I have this CodeIgniter and MySQL application.
I have a database table where one field must be unique. When I "delete" a record from the table I don't want to really remove it from the table, I just want to set free the value of the unique field so it can be used for a future new record without conflicts and leave it there.
At first I thought applying some sort of UUID function to the field would be a good solution.
Can somebody please point out how I can apply the UUID function to the field from the PHP code?
I googled it and couldn't come up with anything, and CodeIgniter's docs didn't help either.
Other thoughts are also welcome and appreciated.
Thanks in advance!
If I understand your aim correctly, you can do this with a single SQL statement.
update users set username = CONCAT(UUID(), username) where username = "username_to_be_deleted"
This is quite a good way to keep the unique constraint, unless some wicked user of yours picks a username in the format of a unique id plus some string and it accidentally matches. Not likely, though.
Added benefit: as UUID has a fixed format, you can always extract the original username from the encoded value.
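A short sketch of that extraction in PHP. MySQL's UUID() returns a 36-character string, so the original username starts at offset 36 of the stored value. The function name originalUsername and the sample value are assumptions for illustration.

```php
<?php
// MySQL's UUID() yields 36 characters (32 hex digits plus 4 hyphens).
const UUID_LENGTH = 36;

// Strip the UUID prefix that was prepended when the record was "deleted".
function originalUsername(string $stored): string
{
    return substr($stored, UUID_LENGTH);
}

// Hypothetical stored value: a UUID-shaped prefix followed by the old username.
$stored = '6ccd780c-baba-1026-9564-5b8c656024db' . 'alice';
echo originalUsername($stored), "\n"; // alice
```

Keep in mind that the column has to be wide enough to hold the extra 36 characters, otherwise the UPDATE may truncate the value or fail, depending on the SQL mode.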
And of course, a much better approach is not to add a unique constraint on a field like this at all, but rather to enforce uniqueness programmatically.
