Dealing with large amount of data in php/mysql - php

So I have something like an auction, and for each deal of that auction I have to generate a random identifier code and assign it to a user.
So I came up with something like this for DB storage: {1:XKF3325A|ADSTD2351;7:ZARASR23;12:3290OASJX}. What I have there is user id : random code, and a user can have several random codes separated by |.
My question is: what type of storage should I use for my DB? The codes I generate for users might be over 2k-3k.

Perhaps you're asking the wrong question. You simply shouldn't be dealing with identifiers of this size. The answer is to deal with smaller identifiers by using a better identifier generation algorithm. Your table indexes will thank you.
So the question you should be asking is: How do I create short but unique identifiers? The answer is as follows:
You should use a cryptographic hash algorithm (e.g. SHA-1) to generate your identifier, or just use a UUID implementation to do this. PHP has a built-in function called uniqid that covers much of the same ground (strictly speaking it is not a UUID generator, since it derives its value from the current time in microseconds), so there's really no need to roll your own. Both methods give you an ID that is way shorter than what you're using, and both make collisions extremely unlikely across a huge sample size (and more effectively than your algorithm). And when I say shorter, I'm talking anywhere between 16-64 characters at most (SHA-1 produces a 20-byte digest, which is 40 characters when written as hex).
If you go the SHA-1 hash route, the methodology would be to hash some random (but unique-to-the-user) input, like sha1($timestamp . $username . $itemname . $randomseed). You can also make use of uniqid here (see the comments in the PHP documentation for the function) and do: sha1(uniqid()). Make sure to read the notes about uniqid() to see the caveats about generating many ids per second.
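A minimal sketch of the two routes described above, assuming PHP 7 or later. The function names are made up for illustration, and random_bytes() is swapped in as the entropy source because uniqid() alone is time-based:

```php
<?php
// Hypothetical helper: returns a random identifier of $bytes * 2 hex characters.
function generateIdentifier(int $bytes = 16): string
{
    // random_bytes() draws from the operating system's secure random source.
    return bin2hex(random_bytes($bytes));
}

// The sha1(uniqid()) route, with extra random bytes mixed in so that two calls
// in the same microsecond still produce different input.
function generateHashedIdentifier(): string
{
    return sha1(uniqid('', true) . random_bytes(8));
}
```

Either value fits comfortably in a CHAR(32) or CHAR(40) column, and a unique index on that column catches the rare collision.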

So what if they are over 3k? MySQL can deal with huge stuff, don't worry. Just use two tables, like:
Users
- id
- other info..
Identifiers
- random_str
- foreign key to user
Then you can fetch identifiers for an id, fetch the user that an identifier belongs to, etc
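If data has already been stored in the packed format from the question, something like the following could split it into one (user id, code) pair per row of the Identifiers table. This is only a sketch; the function name is hypothetical and the input format is assumed to be exactly the one shown in the question:

```php
<?php
// Hypothetical helper: turns "{1:A|B;7:C}" into [[1, 'A'], [1, 'B'], [7, 'C']].
function unpackIdentifiers(string $packed): array
{
    $rows = [];
    $body = trim($packed, '{}');
    if ($body === '') {
        return $rows;
    }
    foreach (explode(';', $body) as $entry) {
        if (strpos($entry, ':') === false) {
            continue; // skip malformed entries that have no "user:codes" separator
        }
        [$userId, $codes] = explode(':', $entry, 2);
        foreach (explode('|', $codes) as $code) {
            $rows[] = [(int) $userId, $code];
        }
    }
    return $rows;
}
```

Each returned pair then maps to a single INSERT into Identifiers (random_str plus the foreign key to the user).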

First - see Loren Segal's answer for some good ideas.
Second - why would you use anything random to ensure uniqueness? Random values don't guarantee uniqueness -- even if they can make it very unlikely that you'll have a collision.
In almost every case, relational databases can solve your problem without any tricks. Indexes and composite keys are designed to solve this kind of problem efficiently.
If you need a single value to uniquely identify something, look into the uniqid stuff that Loren mentioned.

Related

Efficiently comparing hashes in a MySQL data set

I have an encrypted database (MySQL) which has certain columns that need to be searched by a (non-public, authorised) user.
I have been reading this article on Database SE about searching encrypted fields in a database.
I have come to a conclusion from reading this, that the best way to do this would be with a hash of the column values to compare with the given data, so:
Example:
Searching for a phone number.
HTML: User enters phone number to search for
PHP: phone number is hashed
MySQL: searches hashed_phone column(s) to compare (=) and returns
matching results.
PHP: matching result rows are decrypted and then output to user.
Given the small caveat that I can't search part of a phone number (which would be ideal in concept but well outside the scope of this specific question), I find an issue that:
Secure hashing functions (password_hash, etc.) all use a salt as a best practice, and this entirely makes sense, BUT it gives me an issue: if I hash the search term with a fresh salt, the result differs from the stored database value, so I can't use a comparison operator to find the correct row(s) in the dataset.
In Summary:
How can I solve this issue and have encrypted data that can be subject to some degree of searching without needing to decrypt each row of the dataset individually?
Can I use a secure hashing algorithm that does not use a salt (as opposite as that sounds), without leaving a [potential] gaping hole in the security risk of the data at rest (at least, for those columns)?
Is there an alternative methodology I've not thought of?
I would like the MySQL side of things to be as efficient as possible. There are thousands of rows in the dataset, so going through each one to decrypt and check it is deeply inefficient.
How can I do this?
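One commonly described way to get a hash that is deterministic (so = comparisons work) without being a plain unsalted hash is a keyed hash such as HMAC. The sketch below assumes a secret key stored outside the database; the function name and the digit-only normalisation are illustrative choices, not something taken from the question:

```php
<?php
// Hypothetical helper: deterministic keyed hash for an exact-match lookup column.
function blindIndex(string $value, string $secretKey): string
{
    // Normalise first so "555-0100" and "5550100" map to the same index value.
    $normalised = preg_replace('/\D+/', '', $value);
    return hash_hmac('sha256', $normalised, $secretKey);
}
```

The result would be stored in the hashed_phone column at write time and recomputed from the search term at query time. Phone numbers have little entropy, so the protection rests entirely on the key staying secret.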

PHP MySQL table search by string - use hashing?

Using PHP, I have a MySQL database with an Actions table, in which a user optionally assigns actions to some pages in their website. Each such assignment results in an action row, containing (among other things) a unique ActionId and the URL of the appropriate page.
Later on, when in a context of a specific page, I want to find out if there is an action assigned to that page, and fetch (SELECT) the appropriate action row. At that time I know the URL of my page, so I can search the Actions table, by this relatively long string. I suspect this is not an optimal way to search in a database.
I assume a better way would be to use some kind of hashing which converts my long URL strings into integers, making sure no two different URLs are converted into the same integer (encryption is not the issue here). Is there such a PHP function? Alternatively, is there a better strategy for this?
Note I have seen this: SQL performance searching for long strings - but it doesn't really seem to come up with a firm solution, apart from mentioning md5 (which hashes into a string, not to integer).
The hashing strategy is a good strategy.
Dealing with the URL strings might indeed be a problem, because they can be very long, and contain a lot of special chars, which are always problematic for MySQL search (REGEXP or LIKE).
That is why hashing solves the problem. Even md5 which is not a good hashing function to hash passwords (because it's not secure anymore), is good to hash URL.
This way you will have http://www.stackoverflow.com changed into 4c9cbeb4f23fe03e0c2222f8c4d8c065, and that will be pretty much unique (unless you are very very unlucky).
Once you have your md5_url field set up, you can search with :
SELECT * FROM Actions where md5_url=?
Where the ? is an md5($url) of current URL.
Of course be sure to set an index on your md5_url field :
ALTER TABLE Actions
ADD md5_url varchar(32),
ADD KEY(md5_url);
If you add an index to the column, the database should take care of efficiency for you, and the length of the URL should make no difference.
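On the PHP side this could look roughly like the following. The helper names and the second comparison against a Url column are assumptions added for illustration; comparing the full URL as well guards against the unlikely case of two URLs sharing an md5:

```php
<?php
// Hypothetical helper: fixed-length lookup key for the md5_url column.
function urlKey(string $url): string
{
    return md5($url);
}

// Parameters for a prepared statement such as:
//   SELECT * FROM Actions WHERE md5_url = ? AND Url = ?
function actionLookupParams(string $url): array
{
    return [urlKey($url), $url];
}
```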

Looking for suggestions on how to handle public facing object ids

I need help on coming up with a strategy to handle object ids in a PHP/MySQL application I'm working on. Basically, instead of having a URL look like this:
/post/get/1
I'm looking for something like:
/post/get/92Dga93jh
I know that security-through-obscurity is useless (I have an ACL system in place to handle security) but I still need to obscure the ids. This is where I'm stuck.
I thought about generating a separate public id for each DB row but have been unable to find a way to create truly unique ids.
I suppose I could encrypt and decrypt a MySQL auto increment row id as it leaves and enters my app, but I'm not sure how 'expensive' PHP's encryption and decryption methods are. Additionally, I need to make sure that the obscured id remains unique so that it doesn't decrypt into the wrong value.
Also, since my domain objects are related to each other, I want to avoid any unnecessary strain on MySQL if I decide to go with generating and storing an obscure id in the tables.
I'm beating my head against the wall because I feel like this is a common scenario, yet can't figure out what to do. Any help is greatly appreciated!
I'd just use a salted md5. It's secure for 99% of the cases. The other 1% will be when you are whacking your head on the wall because you got your data stolen by a pro hacker and it becomes critical to minimize the impact of it.
So:
// Bind $my_checked_url_value as the parameter rather than concatenating it into the SQL.
$sql = 'SELECT * FROM my_table WHERE MD5(CONCAT(ID, "mysupersalt")) = ?';
And generating the same thing from PHP can be done using similar strategy:
link text
Hope this is what you're looking for..
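For illustration, the PHP counterpart of the SQL expression above might be as small as this. The function name is hypothetical, and the salt literal simply mirrors the one in the query; in practice it would come from configuration:

```php
<?php
// Produces the same value as MySQL's MD5(CONCAT(ID, "mysupersalt")) for integer ids.
function publicId(int $id, string $salt = 'mysupersalt'): string
{
    // Both MySQL's CONCAT and PHP's "." operator use the decimal form of the id.
    return md5($id . $salt);
}
```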
As long as you are using a 9-char base62 string, you could follow this strategy:
1. Generate a number from 1 to 13537086546263552 (62 ^ 9)
2. Convert it to the base62 string
3. Try to insert it into the database (you're supposed to have a unique index over the id field)
4. If ok - do nothing
5. If not ok - repeat 1-3
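The steps above can be sketched as follows, assuming 64-bit PHP 7 or later. The $tryInsert callback is a hypothetical stand-in for the INSERT against the unique index, and the range 0 to 62^9 - 1 is used so that every 9-character string is reachable:

```php
<?php
// Encode a non-negative integer as a base62 string, left-padded to $length.
function base62Encode(int $n, int $length = 9): string
{
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $out = '';
    do {
        $out = $alphabet[$n % 62] . $out;
        $n = intdiv($n, 62);
    } while ($n > 0);
    return str_pad($out, $length, '0', STR_PAD_LEFT);
}

// $tryInsert is a hypothetical callback: it attempts the INSERT and returns
// false when the unique index rejects the candidate.
function createPublicId(callable $tryInsert, int $maxAttempts = 5): string
{
    for ($i = 0; $i < $maxAttempts; $i++) {
        $candidate = base62Encode(random_int(0, 62 ** 9 - 1));
        if ($tryInsert($candidate)) {
            return $candidate;
        }
    }
    throw new RuntimeException('Could not generate a unique id');
}
```

Capping the number of attempts keeps a misconfigured table from turning the retry into an endless loop.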
Use a one-way hash like md5, etc.
Depends on the application really. If it's super essential that you have IDs from which the user can never 'guess' the original IDs, then use a recursive call to the db to generate a unique public ID.
If on the other hand, you just need the IDs to look different without any security worries if someone can 'guess' the original ID, and are concerned with the performance, you can come up with a quick and basic math equation to generate a unique id on the fly and decode it as well when the URL is accessed.
(I know it's a HACK, but it gets the job done for a lot of cases)
E.g. If I access /blog/id/x!1#23409235 (which means /blog/id/1; note that a literal # starts the URL fragment, which browsers do not send to the server, so in practice it would need to be percent-encoded as %23 or swapped for another character)
In the code, I can decode above by:
$blogId = intval(substr($_GET['id'], 4)) - 23409234;
and of course, while generating the URL, you add 23409234 to the original URL's id and prefix it with some random char bits..
Oh and you can use Apache's mod_rewrite to do all these calculations.
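A small sketch of the offset trick, using the same offset as the example. The function names are hypothetical, and the prefix here uses only lowercase letters so that nothing in it has a special meaning inside a URL:

```php
<?php
const ID_OFFSET = 23409234; // offset taken from the example above
const PREFIX_LENGTH = 4;    // number of throwaway characters before the number

// Hypothetical helper: prefixes four random letters to ($id + ID_OFFSET).
function obscureId(int $id): string
{
    $prefix = '';
    for ($i = 0; $i < PREFIX_LENGTH; $i++) {
        $prefix .= chr(random_int(97, 122)); // 'a' to 'z'
    }
    return $prefix . ($id + ID_OFFSET);
}

// Reverses obscureId(): drop the prefix, then subtract the offset.
function revealId(string $obscured): int
{
    return intval(substr($obscured, PREFIX_LENGTH)) - ID_OFFSET;
}
```

As the answer says, this only makes ids look different; anyone who sees two consecutive values can work out the scheme.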
Probably the easiest way is to check in a loop whether such a record already exists:
do {
    $id = generateID();
} while (idExists($id));
There shouldn't be too many duplicate IDs, so in most cases there are only two queries: checking and inserting.

Easy Encryption and Decryption with PHP

My PHP Application uses URLs like these:
http://domain.com/userid/120
http://domain.com/userid/121
The keys at the end of the URL are basically the primary key of the MySQL database table.
I don't want this increasing number to be public, and I also don't want anyone to be able to crawl the user profiles just by iterating over the Id.
So I want to encrypt this Id for display in a way I can easily decrypt it again. The string shouldn't get much longer.
What's the best encryption method for this?
Simple Obscuring: Base64 encode them using base64_encode.
Now, your http://domain.com/userid/121 becomes: http://domain.com/userid/MTIx
Want more, do it again, add some letters around it.
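Tough Obscuring: Use any encryption method using the MCrypt library (deprecated in PHP 7.1 and removed from core in 7.2; on current versions the OpenSSL or Sodium extensions fill that role).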
A better approach (from a usability and SEO perspective) would be to use a unique phrase rather than an obscured ID. In this instance the user's user name would seem an ideal solution, and would also be un-guessable.
That said, if you don't want to use this approach you could just use a hash (perhaps md5) of the user's user name which you'd store in the database along with their other details. As such, you can just do a direct lookup on that field. (i.e.: Having encrypt and decrypt part of the URL is probably overkill.)
You have a variety of choices here:
Generate and store an identifier in the database. It's good because you can then have readable keys that are guaranteed to be unique. It's bad because it causes a database schema change, and you have to actually query that table every time you want to generate a link.
Run an actual key-based encryption, for instance based on PHP's MCrypt (removed from core in PHP 7.2; OpenSSL or Sodium are the options on current versions). You have access to powerful cryptographic algorithms, but most secure algorithms tend to output strings that are much longer than what you expect. XOR does what you want, but it does not prevent accessing sequential values (and the key is pretty simple to determine, given the a priori knowledge about the numbers).
Run a hash-based verification: instead of using 121 as your identifier, use 121-a34df6 where a34df6 are the first six characters of a keyed hash (an HMAC, or simply the md5) of 121 and a secret key. Instead of decoding, you extract the 121 and recompute the six characters, to see if they match what the user sent. This does not hide the 121 (it's still right there before the hyphen) but without knowing the secret key, the visitor will not be able to generate the six characters to actually view the document numbered 121.
Use XOR with shuffling: shuffle the bits in the 30-bit identifier, then apply the XOR. This makes the XOR harder to identify because the shuffle pattern is also hidden.
Use XOR with on-demand keys: use fb37cde4-37b3 as your key, where the first part is the XOR of 121 and md5('37b3'.SECRET) (or another way of generating an XOR key based on 37b3 and a secret).
Don't use base64, it's easy to reverse engineer: if MTIx is 121, then MTIy is 122 ...
Ultimately, you will have to accept that your solution will not be secure: not only is it possible for users to leak valid urls (through their browser history, HTTP referer, or posting them on Twitter), but your requirement that the identifier fits in a small number of characters means a brute-force attack is possible (and becomes easier as you start having more documents).
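The hash-based verification option from the list above might be sketched like this. HMAC-SHA256 is used here in place of a plain md5, and both function names and the $secret parameter are hypothetical:

```php
<?php
// Builds a token such as "121-a34df6": the id plus the first six hex chars of its HMAC.
function signId(int $id, string $secret): string
{
    $tag = substr(hash_hmac('sha256', (string) $id, $secret), 0, 6);
    return $id . '-' . $tag;
}

// Returns the id if the tag matches, or null if the token is malformed or forged.
function verifySignedId(string $token, string $secret): ?int
{
    if (!preg_match('/^(\d+)-([0-9a-f]{6})$/', $token, $m)) {
        return null;
    }
    $id = (int) $m[1];
    // hash_equals() compares in constant time to avoid leaking the tag byte by byte.
    return hash_equals(signId($id, $secret), $token) ? $id : null;
}
```

Six hex characters give about 16.7 million possible tags per id, which fits the answer's warning that a short identifier can still be brute-forced.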
Simplest but powerful encryption method: XOR with a secret Key. http://en.wikipedia.org/wiki/XOR_cipher
No practical performance degradation.
Base64 representation is not an encryption! It's another way to say the same.
Hope this helps.
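A bare-bones version of the XOR idea could look like this. The key value and function names are arbitrary placeholders; as pointed out in another answer, consecutive ids still produce related outputs, so this hides the number without protecting it:

```php
<?php
const XOR_KEY = 0x5A3C9F17; // hypothetical secret; any fixed integer works

// XOR the id with the key and print the result as hex to keep the string short.
function xorEncode(int $id): string
{
    return dechex($id ^ XOR_KEY);
}

// XOR with the same key again to get the original id back.
function xorDecode(string $hex): int
{
    return ((int) hexdec($hex)) ^ XOR_KEY;
}
```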
Obscuring the URL will never secure it. It makes it harder to read, but not much harder to manipulate. You could use a hexadecimal number representation or something like that to obscure it. Those who can read hex can change your URL in a few seconds, anyway:
$hexId = dechex($id); // to hex
$id = hexdec($hexId); // from hex
I'd probably say it's better indeed to just create a random string for each user and store that in your database than to get one using hash. If you use a common hash, it's still very easy to iterate over all pages ;-)
I would write this in comments, but don't have the rep for it (yet?).
When a user clicks on a link you should not use the primary key. You can keep the pkey in a session and read it from that session. Please do not use the query string....
Generate a unique string for each user and use it in your URLs:
http://domain.com/user/ofisdoifsdlfkjsdlfkj instead of http://domain.com/userid/121
You can use the base64_encode and base64_decode functions to encode and decode your URLs.

Unique key generation

I am looking for a way, specifically in PHP, in which I will be guaranteed to always get a unique key.
I have done the following:
strtolower(substr(crypt(time()), 0, 7));
But I have found that once in a while I end up with a duplicate key (rarely, but often enough).
I have also thought of doing:
strtolower(substr(crypt(uniqid(rand(), true)), 0, 7));
But according to the PHP website, if uniqid() is called twice in the same microsecond it could generate the same key. I'm thinking that with the addition of rand() it rarely would, but it's still possible.
After the lines mentioned above I also remove characters such as L and O so it's less confusing for the user. This may be part of the cause for the duplicates, but it's still necessary.
One option I have a thought of is creating a website that will generate the key, storing it in a database, ensuring it's completely unique.
Any other thoughts? Are there any websites out there that already do this that have some kind of API or just return the key. I found http://userident.com but I'm not sure if the keys will be completely unique.
This needs to run in the background without any user input.
There are only 3 ways to generate unique values, whether they be passwords, user IDs, etc.:
1. Use an effective GUID generator - these are long and cannot be shrunk. If you only use part you FAIL.
2. At least part of the number is sequentially generated off of a single sequence. You can add fluff or encoding to make it look less sequential. Advantage is they start short - disadvantage is they require a single source. The workaround for the single source limitation is to have numbered sources, so you include the [source #] + [seq #] and then each source can generate its own sequence.
3. Generate them via some other means and then check them against the single history of previously generated values.
Any other method is not guaranteed. Keep in mind, fundamentally you are generating a binary number (it is a computer), but then you can encode it in Hexadecimal, Decimal, Base64, or a word list. Pick an encoding that fits your usage. Usually for user entered data you want some variation of Base32 (which you hinted at).
Note about GUIDS: They gain their strength of uniqueness from their length and the method used to generate them. Anything less than 128-bits is not secure. Beyond random number generation there are characteristics that go into a GUID to make it more unique. Keep in mind they are only practically unique, not completely unique. It is possible, although practically impossible to have a duplicate.
Updated Note about GUIDS: Since writing this I learned that many GUID generators use a cryptographically secure random number generator (difficult or impossible to predict the next number generated, and not likely to repeat). There are actually 5 different UUID algorithms. Algorithm 4 is what Microsoft currently uses for the Windows GUID generation API. A GUID is Microsoft's implementation of the UUID standard.
Update: If you want 7 to 16 characters then you need to use either method 2 or 3.
Bottom line: Frankly there is no such thing as completely unique. Even if you went with a sequential generator you would eventually run out of storage using all the atoms in the universe, thus looping back on yourself and repeating. Your only hope would be the heat death of the universe before reaching that point.
Even the best random number generator has a possibility of repeating equal to the total size of the random number you are generating. Take a quarter for example. It is a completely random bit generator, and its odds of repeating are 1 in 2.
So it all comes down to your threshold of uniqueness. You can have 100% uniqueness in 8 digits for 1,099,511,627,776 numbers by using a sequence and then base32 encoding it. Any other method that does not involve checking against a list of past numbers only has odds equal to n/1,099,511,627,776 (where n=number of previous numbers generated) of not being unique.
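To make the sequence-plus-base32 idea concrete, here is a sketch that maps a counter to an 8-character key. The function name is made up, the counter itself is assumed to come from a single source such as an auto-increment column, and the alphabet is a Crockford-style one chosen because it leaves out I, L, O and U:

```php
<?php
// Hypothetical helper: encode a sequence number (0 to 32^8 - 1) as 8 base32 characters.
function sequenceToKey(int $sequence): string
{
    $alphabet = '0123456789ABCDEFGHJKMNPQRSTVWXYZ'; // 32 symbols, no I, L, O or U
    if ($sequence < 0 || $sequence >= 32 ** 8) {
        throw new OutOfRangeException('Sequence exhausted');
    }
    $key = '';
    for ($i = 0; $i < 8; $i++) {
        $key = $alphabet[$sequence & 31] . $key; // take the lowest 5 bits each round
        $sequence >>= 5;
    }
    return $key;
}
```

Because every counter value maps to exactly one key, no duplicate check is needed until all 1,099,511,627,776 values have been used.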
Any algorithm will result in duplicates.
Therefore, might I suggest that you use your existing algorithm* and simply check for duplicates?
*Slight addition: If uniqid() can be non-unique based on time, also include a global counter that you increment after every invocation. That way something is different even in the same microsecond.
Without writing the code, my logic would be:
Generate a random string from whatever acceptable characters you like.
Then add half the date stamp (partial seconds and all) to the front and the other half to the end (or somewhere in the middle if you prefer).
Stay JOLLY!
H
If you use your original method, but add the username or emailaddress in front of the password, it will always be unique if each user only can have 1 password.
You may be interested in this article which deals with the same issue: GUIDs are globally unique, but substrings of GUIDs aren't.
The goal of this algorithm is to use the combination of time and location ("space-time coordinates" for the relativity geeks out there) as the uniqueness key. However, timekeeping is not perfect, so there's a possibility that, for example, two GUIDs are generated in rapid succession from the same machine, so close to each other in time that the timestamp would be the same. That's where the uniquifier comes in.
I usually do it like this:
$this->password = '';
for ($i = 0; $i < 10; $i++) {
    if ($i % 2 == 0) {
        $this->password .= chr(rand(65, 90));   // uppercase letter
    }
    if ($i % 3 == 0) {
        $this->password .= chr(rand(97, 122));  // lowercase letter
    }
    if ($i % 4 == 0) {
        $this->password .= chr(rand(48, 57));   // digit
    }
}
I suppose there are some theoretical holes but I've never had an issue with duplication. I usually use it for temporary passwords (like after a password reset) and it works well enough for that.
As Frank Kreuger commented, go with a GUID generator.
Like this one
I'm still not seeing why the passwords have to be unique? What's the downside if 2 of your users have the same password?
This is assuming we're talking about passwords that are tied to userids, and not just unique identifiers. If that's what you're looking for, why not use GUIDs?
You might be interested in Steve Gibson's over-the-top-secure implementation of a password generator (no source, but he has a detailed description of how it works) at https://www.grc.com/passwords.htm.
The site creates huge 64-character passwords but, since they're completely random, you could easily take the first 8 (or however many) characters for a less secure but "as random as possible" password.
EDIT: from your later answers I see you need something more like a GUID than a password, so this probably isn't what you want...
I do believe that part of your issue is that you are trying to use a single function for two separate uses: passwords and transaction_id.
These really are two different problem areas and it is best not to try to address them together.
I recently wanted a quick and simple random unique key so I did the following:
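// Concatenate with "." (microtime() returns a string); on PHP 8+ crypt() also needs an explicit salt argument.
$ukey = dechex(time()) . crypt( time() . md5(microtime() . mt_rand(0, 100000)) );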
So, basically, I get the unix time in seconds and add a random md5 string generated from time + random number. It's not the best, but for low frequency requests it is pretty good. It's fast and works.
I did a test where I'd generate thousands of keys and then look for repeats, and having about 800 keys per second there were no repetitions, so not bad. I guess it totally depends on mt_rand()
I use it for a survey tracker where we get a submission rate of about 1000 surveys per minute... so for now (crosses fingers) there are no duplicates. Of course, the rate is not constant (we get the submissions at certain times of the day) so this is not fail proof nor the best solution... the tip is using an incremental value as part of the key (in my case, I used time(), but could be better).
Ignoring the crypting part, which does not have much to do with creating a unique value, I usually use this one:
function GetUniqueValue()
{
    static $counter = 0; // initialized only the first time the function is called
    return strtr(microtime(), array('.' => '', ' ' => '')) . $counter++;
}
When called in the same process, $counter is increased, so the value is always unique within that process.
When called in different processes you must be really unlucky to get 2 microtime() calls with the same value; bear in mind that microtime() calls usually return different values even when called in the same script.
I usually do a random substring (randomize how many chars between 8 and 32, or fewer for user convenience) of the MD5 of some value I have gotten in, or the time, or some combination. For more randomness I do MD5 of some value (say last name), concatenate that with the time, MD5 it again, then take the random substring. Yes, you could get equal passwords, but it's not very likely at all.
