Using PHP, I have a MySQL database with an Actions table, in which a user optionally assigns actions to some pages in their website. Each such assignment results in an action row, containing (among other things) a unique ActionId and the URL of the appropriate page.
Later on, when in a context of a specific page, I want to find out if there is an action assigned to that page, and fetch (SELECT) the appropriate action row. At that time I know the URL of my page, so I can search the Actions table, by this relatively long string. I suspect this is not an optimal way to search in a database.
I assume a better way would be to use some kind of hashing which converts my long URL strings into integers, making sure no two different URLs are converted into the same integer (encryption is not the issue here). Is there such a PHP function? Alternatively, is there a better strategy for this?
Note I have seen this: SQL performance searching for long strings - but it doesn't really seem to come up with a firm solution, apart from mentioning md5 (which hashes into a string, not to integer).
The hashing strategy is a good strategy.
Dealing with the URL strings might indeed be a problem, because they can be very long, and contain a lot of special chars, which are always problematic for MySQL search (REGEXP or LIKE).
That is why hashing solves the problem. Even md5, which is no longer a good function for hashing passwords (because it's not secure anymore), is good enough for hashing URLs.
This way you will have http://www.stackoverflow.com changed into 4c9cbeb4f23fe03e0c2222f8c4d8c065, and that will be pretty much unique (unless you are very very unlucky).
Once you have your md5_url field set up, you can search with :
SELECT * FROM Actions where md5_url=?
Where the ? is an md5($url) of current URL.
Of course be sure to set an index on your md5_url field :
ALTER TABLE Actions
ADD md5_url varchar(32),
ADD KEY(md5_url);
If you add an index to the column, the database should take care of efficiency for you, and the length of the URL should make no difference.
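For what it's worth, here is a minimal PHP sketch of the lookup described above. It assumes a PDO connection and that the Actions table also has a url column holding the full URL (both names are my assumptions, not something from the question). Comparing the full URL in addition to the hash guards against the unlikely case of two URLs sharing the same md5.

```php
<?php
// Hypothetical helper: fixed-length lookup key for a URL (32 hex characters).
function urlKey(string $url): string
{
    return md5($url);
}

// Assumes a PDO connection and an Actions table with `md5_url` and `url` columns.
function findActionByUrl(PDO $pdo, string $url): ?array
{
    // The indexed md5_url column narrows the search; the url comparison
    // rules out a hash collision.
    $stmt = $pdo->prepare(
        'SELECT * FROM Actions WHERE md5_url = ? AND url = ? LIMIT 1'
    );
    $stmt->execute([urlKey($url), $url]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    return $row === false ? null : $row;
}
```

The md5_url value would be written with the same urlKey() call whenever an action row is inserted or its URL changes.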
Related
What would be useful solutions for hiding true database object ID in URL for security purposes? I found that one of the solutions would be:
1) Using hashids open source project
2) Using something like same old md5 on creation of the object to generate hash and store it in database, then use it in url's and querying by them, but the drawback is that querying by auto-incremented primary keys (IDs) is faster than hashes. So I believe the possibility to hash/unhash would be better?
Also as I'm on Symfony, are there maybe bundles that I could not find or built in functionalities that would help?
Please tell me what you found useful based on your experiences.
This question has been asked a lot, with different word choice (which makes it difficult to say, "Just search for it!"). This fact prompted a blog post titled, The Comprehensive Guide to URL Parameter Encryption in PHP .
What People Want To Do Here
What People Should Do Instead
Explanation
Typically, people want short random-looking URLs. This doesn't allow you much room to encrypt then authenticate the database record ID you wish to obfuscate. Doing so would require a minimum URL length of 32 bytes (for HMAC-SHA256), which is 44 characters when encoded in base64.
A simpler strategy is to generate a random string (see random_compat for a PHP5 implementation of random_bytes() and random_int() for generating these strings) and reference that column instead.
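A rough sketch of what generating such a random string could look like in PHP 7+ (the function name and the 8-byte default are my own choices for illustration). The result would go into its own column with a unique index:

```php
<?php
// Hypothetical helper: a random, URL-safe token that has no relation to the row id.
function randomToken(int $bytes = 8): string
{
    // random_bytes() draws from a cryptographically secure source;
    // bin2hex() turns each byte into two hex characters.
    return bin2hex(random_bytes($bytes));
}

echo randomToken(), "\n"; // 16 hex characters, different on every call
```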
Also, hashids are broken by simple cryptanalysis. Their conclusion states:
The attack I have described is significantly better than a brute force attack, so from a cryptographic stand point the algorithm is considered to be broken, it is quite easy to recover the salt; making it possible for an attacker to run the encoding in either direction and invalidates property 2 for an ideal hash function.
Don't rely on it.
Quote from the site:
Do you have a question or comment that involves "security" and "hashids" in the same sentence? Don't use Hashids.
I'd use a true encryption algorithm, for example the openssl_encrypt function, or something like it: encrypt ids when passing them outside, and decrypt them when using them in your code (e.g. for db queries).
And I wouldn't recommend storing ids in the database as any kind of encrypted "garbage"; in my opinion it's very inconvenient to hash your real ids. Keep it clean and pretty inside and encrypt for external display only.
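As an illustration only, encrypting on the way out and decrypting on the way in could look roughly like this. It assumes the openssl extension is loaded and that $key is a 32-byte secret kept on the server; the function names and the choice of AES-256-GCM are mine, not something the answer prescribes.

```php
<?php
// Encrypt an id for use in a URL: random IV + auth tag + ciphertext, base64url-encoded.
function encryptId(int $id, string $key): string
{
    $iv = random_bytes(12); // 96-bit nonce, the usual size for GCM
    $tag = '';
    $ct = openssl_encrypt((string) $id, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    if ($ct === false) {
        throw new RuntimeException('Encryption failed');
    }

    return rtrim(strtr(base64_encode($iv . $tag . $ct), '+/', '-_'), '=');
}

// Returns the id, or null if the token was tampered with or is malformed.
function decryptId(string $token, string $key): ?int
{
    $raw = base64_decode(strtr($token, '-_', '+/'));
    if ($raw === false || strlen($raw) < 29) { // 12 (IV) + 16 (tag) + at least 1 byte
        return null;
    }
    $iv = substr($raw, 0, 12);
    $tag = substr($raw, 12, 16);
    $ct = substr($raw, 28);
    $plain = openssl_decrypt($ct, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    if ($plain === false || preg_match('/^\d+\z/', $plain) !== 1) {
        return null;
    }

    return (int) $plain;
}
```

Because the IV is random, the same id produces a different token on every call, so these tokens cannot be precalculated and stored; they are only decoded when a request comes in.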
Following your idea, you just need to cipher your IDs before writing the URL to HTML page and decipher them when processing those URLs.
If you just want security by obscurity, which is sufficient for maybe 99% of the curious people out there who like to iterate over IDs in URLs, you can use something simple like base64 or rot13. Of course, you can also precalculate those "public IDs" and store them in the database, instead of encoding each time the URL is shown to the end user.
If you want true security, you have to encrypt them with some serious asymmetric cipher, storing both keys on your side, as you are essentially talking with yourself and don't want a man-in-the-middle attack. You will not be able to precalculate this, as each encryption yields a different ciphertext, which is good for this cause.
In any case, you need something two-way, so if I were you I'd forget about word "hash", hashes are for purposes different from yours.
EDIT:
But the solution which every blog out there uses for this task for several years already is just to utilize URL rewriting, converting, in your case, URLs like http://example.com/book/5 to URLs like http://example.com/rework-by-37signals. This will completely eradicate any sign of database ID from your URL.
Ideologically, you will need something which will uniquely map the request URL to your database content anyway. If you hide MySQL database IDs behind any layer of URL rewriting, you'll just make this rewritten URL a new ID for the same content. All you gain is protection from enumeration attacks and maybe SEF URLs.
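To make the rewriting idea concrete, here is a small sketch of how a slug like rework-by-37signals could be derived from a title. The function name is made up and it only handles ASCII titles; the slug would be stored in a column with a unique index and looked up instead of the numeric id.

```php
<?php
// Hypothetical helper: turn a title into a URL slug (ASCII-only sketch).
function slugify(string $title): string
{
    $slug = strtolower($title);
    // Collapse every run of characters other than a-z and 0-9 into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);

    return trim($slug, '-');
}

echo slugify('Rework by 37signals'), "\n"; // rework-by-37signals
```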
I need help on coming up with a strategy to handle object ids in a PHP/MySQL application I'm working on. Basically, instead of having a URL look like this:
/post/get/1
I'm looking for something like:
/post/get/92Dga93jh
I know that security-through-obscurity is useless (I have an ACL system in place to handle security) but I still need to obscure the ids. This is where I'm stuck.
I thought about generating a separate public id for each DB row but have been unable to find a way to create truly unique ids.
I suppose I could encrypt and decrypt a MySQL auto increment row id as it leaves and enters my app, but I'm not sure how 'expensive' PHP's encryption and decryption methods are. Additionally, I need to make sure that the obscured id remains unique so that it doesn't decrypt into the wrong value.
Also, since my domain objects are related to each other, I want to avoid any unnecessary strain on MySQL if I decide to go with generating and storing an obscure id in the tables.
I'm beating my head against the wall because I feel like this is a common scenario, yet can't figure out what to do. Any help is greatly appreciated!
I'd just use a salted md5. It's secure for 99% of the cases. The other 1% will be when you are whacking your head on the wall because your data got stolen by a professional hacker and it becomes critical to minimize the impact.
So:
$sql = 'SELECT * FROM my_table WHERE MD5(CONCAT(ID, "mysupersalt")) = "'.$my_checked_url_value.'"';
The same value can be generated from PHP with a similar strategy, e.g. md5($id . "mysupersalt").
Hope this is what you're looking for..
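One caveat worth adding: a WHERE clause that applies MD5() to the ID column generally cannot use an ordinary index on ID, so MySQL has to compute the hash for every row. A possible workaround (just a sketch; the function name and the idea of a separate stored column are my assumptions) is to compute the value once in PHP when the row is created and store it in its own indexed column:

```php
<?php
// Hypothetical helper: same value as MySQL's MD5(CONCAT(ID, "mysupersalt")).
function publicIdFor(int $id, string $salt): string
{
    return md5($id . $salt);
}

// Compute once at insert time, store in an indexed column, and look rows up by it.
$publicId = publicIdFor(1, 'mysupersalt');
echo $publicId, "\n";
```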
Since you want a 9-character base62 string, you could follow this strategy:
1. Generate a random number from 0 to 13537086546263551 (62^9 - 1)
2. Convert it to a base62 string
3. Try to insert it into the database (you're supposed to have a unique index on the id field)
4. If the insert succeeds, you're done
5. If it fails because of a duplicate, repeat steps 1-3
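The steps above can be sketched in PHP roughly as follows. This assumes 64-bit PHP (62^9 does not fit in a 32-bit integer); the function names are made up, and $tryInsert stands for whatever code performs the INSERT and returns false when the unique index rejects the value.

```php
<?php
const BASE62_ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Convert a non-negative integer to base62, left-padded to a fixed width.
function toBase62(int $n, int $width = 9): string
{
    $out = '';
    do {
        $out = BASE62_ALPHABET[$n % 62] . $out;
        $n = intdiv($n, 62);
    } while ($n > 0);

    return str_pad($out, $width, '0', STR_PAD_LEFT);
}

// Generate candidates until $tryInsert accepts one (it returns true on success).
function generatePublicId(callable $tryInsert, int $maxAttempts = 10): string
{
    for ($i = 0; $i < $maxAttempts; $i++) {
        $candidate = toBase62(random_int(0, 62 ** 9 - 1));
        if ($tryInsert($candidate)) {
            return $candidate;
        }
    }
    throw new RuntimeException('Could not generate a unique id');
}
```

Letting the unique index decide (rather than doing a SELECT first) avoids the race where two requests pick the same value at the same time.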
Use a one-way hash like md5, etc.
It depends on the application, really. If it's essential that the user can never 'guess' the original IDs, then generate a public ID and check it against the db, repeating until you get one that is unique.
If, on the other hand, you just need the IDs to look different, without any security worries if someone can 'guess' the original ID, and you are concerned about performance, you can come up with a quick and basic math equation to generate a unique id on the fly and decode it again when the URL is accessed.
(I know it's a HACK, but it gets the job done in a lot of cases.)
E.g. If I access /blog/id/x!1#23409235 (which means /blog/id/1)
In the code, I can decode above by:
$blogId = intval(substr($_GET['id'], 4)) - 23409234;
and of course, while generating the URL, you add 23409234 to the original URL's id and prefix it with some random char bits..
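Both directions of this hack could look something like the sketch below. The function names, the 4-character lowercase prefix, and the constants are only for illustration. I've kept the prefix to plain letters because a literal # in a URL starts the fragment and never reaches the server.

```php
<?php
const ID_OFFSET = 23409234;
const PREFIX_LENGTH = 4;

// Build the public form: 4 random letters followed by id + offset.
function obscureId(int $id): string
{
    $alphabet = 'abcdefghijklmnopqrstuvwxyz';
    $prefix = '';
    for ($i = 0; $i < PREFIX_LENGTH; $i++) {
        $prefix .= $alphabet[random_int(0, 25)];
    }

    return $prefix . ($id + ID_OFFSET);
}

// Reverse it: drop the prefix and subtract the offset.
function revealId(string $obscured): int
{
    return intval(substr($obscured, PREFIX_LENGTH)) - ID_OFFSET;
}
```

Anyone who sees two or three of these URLs can work out the offset, so this only hides the ids from casual glances.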
Oh and you can use Apache's mod_rewrite to do all these calculations.
Probably the easiest way is to check whether such a record already exists, in a loop:
do {
    $id = generateID();
} while (idExists($id));
There shouldn't be too many duplicate IDs, so in most cases there are only two queries: checking and inserting.
Noob question here. I'm overhauling some "Search" pages in a real estate website. I would like to be able to generate an unique ID (hash?) which contains in itself all the parameters of the search, e.g., the user would be given an URL in the form of http://search.example.com/a95kl53df-02, and loading this URL would repeat the exact same search.
Some of the search parameters are simply one of several options, some are integers, and there are also keywords (which I'll just append after the ID, I guess). What's the general approach to cramming this data into a string? I'm fairly comfortable with PHP/MySQL, but my practical experience is next to none, so I don't know "how it's done".
EDIT: I do not need the string to be random, and, indeed, I need the process to be two-way. Perhaps hash isn't the correct term, then. As for why - I'm doing this for the sake of brevity, since current URLs contain at least 22 GET parameters.
I have the nasty habit of always asking my questions on the Interwebs a bit too early, reconsiderations popping right into my head as soon as I have posted. I'm currently drafting a possible solution. I'm still open to any suggestions, though.
Hashes are not unique
A hash is NOT unique, you can't use it. Any hash can result from an infinite number of given strings.
You don't need randomness, just a unique token
You should just generate a unique token with the help of the database (even just an autoindexed id). You can create a cronjob that deletes old searches after a while.
That table would minimally contain the unique token plus the original search string.
Possible implementation
User does a search
Search params are stored in database, token is returned
Token is given to user in some way (e.g. do you want to save this search for later)
When user wants to repeat search with token, search string is retrieved from db and search run
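A minimal sketch of that flow is below. To keep it self-contained, an in-memory array stands in for the database table (token => search string); the class and method names are made up, and in a real application save() and load() would be an INSERT and a SELECT.

```php
<?php
final class SavedSearches
{
    /** @var array<string, string> token => query string (stand-in for a DB table) */
    private array $rows = [];

    // Store the search parameters and hand back a short token for the URL.
    public function save(array $params): string
    {
        do {
            $token = bin2hex(random_bytes(5)); // 10 hex characters
        } while (isset($this->rows[$token]));
        $this->rows[$token] = http_build_query($params);

        return $token;
    }

    // Look the token up and rebuild the parameters, or null if unknown.
    public function load(string $token): ?array
    {
        if (!isset($this->rows[$token])) {
            return null;
        }
        parse_str($this->rows[$token], $params);

        return $params;
    }
}

$searches = new SavedSearches();
$token = $searches->save(['type' => 'house', 'beds' => 3, 'q' => 'sea view']);
print_r($searches->load($token));
```

Note that parse_str() returns every value as a string, so numeric parameters need casting before they are used in the search.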
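You could use something like the mcrypt extension on $_SERVER['QUERY_STRING'] (mcrypt was deprecated in PHP 7.1 and removed in 7.2; openssl_encrypt() is the usual replacement), and then decrypt it if an encrypted URL is passed in. However, there are all sorts of problems here and I recommend not doing that.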
Based on your edit that you are doing this because of a complicated URL, I would suggest that hashing is going to make the problem worse. If you have an error with the URL, you now have multiple places it could be going wrong.
Just make a random key that you then lookup in a simple flat-file database. You could check whether the URL is already in the database and then return the key if it is.
Another advantage of this system is that if your URL structure changes, then you can change all the URLs in the database and the users' short URLs still work.
Well, to be random (which, by the way, you never truly can be), you can hash, say, the microtime (which is random-ish, since there is a low probability that two users will search at the same instant) along with some salt, for which you can use the query id:
so something like:
$store_unique = md5(microtime().$queryID);
//the $store_unique you can save to the db with the query params
//then when anyone goes to the random url, you can check it against the db
UPDATE
Due to the comments below, I offer another solution (which can be more unique):
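// microtime() returns a string like "0.12345600 1700000000"; strip the space so it is URL-friendly
$store_unique = str_replace(' ', '', microtime()) . "-" . $queryID;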
//the $store_unique you can save to the db with the query params
//then when anyone goes to the random url, you can check it against the db
I wish to be able to generate URL variables like this:
http://example.com/195yq
http://example.com/195yp
http://example.com/195yg
http://example.com/195yf
The variables will match to a MySQL record set so that I can pull it out. At the time of creation of that record set, I wish to create this key for it.
How can I do this? What is the best way to do this? Are there any existing classes that I can make use of?
Thanks all
Basically, you need a hash function of some sort.
This can be a SHA-1 function, an MD5 function (as altCogito suggested), or some other data you have in the record, encoded.
How many records do you think you will have? Be careful on selecting the solution as a hash function has to be big enough to cover you well, but too big and you create a larger database than you need. I've used SHA-1 and truncate to 64 or 32 bits, depending on need.
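As a rough illustration of the truncation (the function name is hypothetical): sha1() returns 40 hex characters, 4 bits each, so keeping the first 16 gives 64 bits and the first 8 gives 32 bits.

```php
<?php
// Hypothetical helper: SHA-1 of some record data, truncated to $bits bits.
function shortKey(string $data, int $bits = 64): string
{
    // Each hex character carries 4 bits, so 64 bits = 16 characters.
    return substr(sha1($data), 0, intdiv($bits, 4));
}

echo shortKey('record-1'), "\n";     // 16 hex characters
echo shortKey('record-1', 32), "\n"; // 8 hex characters
```

With only 32 bits, collisions become likely once you have tens of thousands of records, so the column should still have a unique index and the insert should be retried with different input when it fails.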
Also, look at REST. Consider that the URL may not have to be so mysterious...
This can be done pretty straightforward in a couple of ways:
Use a hash, such as MD5. (would be long)
Base-64 encode a particular ID or piece of data
If you're asking whether or not there is anything that does this already with MySQL records, I'm not sure that you'll find something as design-wise, data is really, really conceptually far away from the URLs in your website, even in frameworks like Grails. Most don't even attempt to wrap up front-end and data-access functionality into a single piece.
In what context will you be using this? Why not just pass post variables? It is far more secure and neat. You will still accomplish the number in the url such as id=195yq, and there is a way to hide them by configuring your php.ini file.
I hope this can be of help to you.
Please keep this in mind. When you pass variables in the address bar it is easy for someone to change the variable, and access information you may not want them to access.
In the examples you gave, it looks like you're base-64 encoding the numeric primary key. That's how I would do it and have done it in the past, but I'm not sure it's any better than passing the ID in the clear because base-64 decoding is trivial.
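For reference, this is roughly what base-64 encoding a numeric key for a URL looks like in PHP (function names are mine; the strtr/rtrim calls swap in the URL-safe characters and drop the padding). It also shows how trivial the decoding is:

```php
<?php
// Encode a numeric id as URL-safe base64 (no padding).
function encodeId(int $id): string
{
    return rtrim(strtr(base64_encode((string) $id), '+/', '-_'), '=');
}

// Decode it again; returns null if the result is not a plain number.
function decodeId(string $token): ?int
{
    $plain = base64_decode(strtr($token, '-_', '+/'));

    return preg_match('/^\d+\z/', (string) $plain) === 1 ? (int) $plain : null;
}

echo encodeId(195), "\n";    // MTk1
echo decodeId('MTk1'), "\n"; // 195
```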