Should I use "id" or "unique username"? - php

I am using PHP, AS3 and MySQL.
I have a Flash (AS3) website that stores members' information in a MySQL database through PHP. In the "members" table, I have "id" as the primary key and "username" as a unique field.
Now my situation is:
When Flash wants to display a member's profile, my questions are:
Should Flash pass the member's "id" or "username" to PHP to run the MySQL query?
Is there any difference between passing the "id" and the "username"?
Which one is more secure?
Which one do you recommend?
I would like to optimize my website in terms of security and performance.

1) Neither is unarguably the right choice.
2) The ID is probably shorter and marginally faster to look up. The ID gives away slightly more information about your system: if you know that a site uses serial IDs at all, and you know what one of them is, that's pretty much as good as knowing all of them, whereas knowing one username does not tell you the usernames of any other users. On the other hand, the username is more revelatory of the user's psychology and may constitute a password hint.
3) Both have extremely marginal drawbacks, as described in item 2.
4) I'd use the ID.

The primary key is always the safest method for identifying database rows. For instance, you may later change your mind and allow duplicate usernames.
Depending on how your ActionScript is communicating with PHP, it will likely also require sending fewer bytes if you send an integer ID in your request rather than a username.
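As a tiny sketch of that (the parameter name is just an example), an integer id is also trivial to sanity-check on the PHP side before it goes anywhere near the query:
// The Flash client sends e.g. profile.php?id=42
$id = isset($_GET['id']) ? (int) $_GET['id'] : 0;
if ($id <= 0) {
    exit('Invalid member id');
}
// ...then look the member up by $id, ideally with a prepared statement.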

Arguments for passing id number:
People never change their id. People do change their names. For a casual games site with disposable accounts, that might not be a problem, but for long-term registered users it can be. I've had to handle a demand by an upset woman that her ex-husband's surname be purged from her user name. A process for doing this had to be rapidly established!
Shorter.
Easier to index and partition.
Arguments for passing user name:
Slightly harder (but not impossible) to guess a legal, existing account - e.g. to peruse random people's records, if that's your thing.

You should probably get intimately familiar with PHP sessions, ideally by using a framework that already has this in place, because session handling is non-trivial and you don't want to mess it up. The session management software will then handle all of this for you, including login screens, "I forgot my password", and so on.
Then you can focus your attention on what your site is really primarily there for.
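For instance, a minimal sketch of the session idea (not a full login system; variable names are just examples):
// Login request: after verifying the username/password against the members table,
// keep only the member's id in the session.
session_start();
$_SESSION['member_id'] = $memberId; // $memberId comes from your own login check

// Any later request from the Flash client (profile page, AJAX endpoint, ...):
session_start();
if (!isset($_SESSION['member_id'])) {
    exit('Not logged in');
}
$currentMemberId = (int) $_SESSION['member_id'];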
Sounds like fun (actionscript + php + mysql) - good luck!

Related

Stop user from getting data by changing id via URL

I am showing user data on a page with this query:
$query = "SELECT * FROM COLLECTIONS WHERE uid = {$_GET['user_id']}";
But the problem is that a user can see other users' data by changing that uid.
How do I solve this problem?
Take your website offline. NOW. Somebody is going to either wipe the data or steal the data or inject malware that's served to all of your customers.
Breathe. You've bought yourself some time, assuming it hasn't already been breached.
A small subset of the security measures you NEED to take
These mitigate, in order of "has biggest immediate benefits" to "is probably most important", one problem each. (Apart from number 3, which mitigates anywhere from 4 to 32241 problems of equal or greater magnitude to number 1.)
1) Look through every instance of every database request, and make sure that you are never using double quotes or the . operator when defining your query string. Rebuild all of your database handling code to use some sort of parametrised SQL query system (a sketch follows this list).
2) Use an authentication library, or at the very least a crypto library.
3) Ask about your setup on Security Stack Exchange using an account that is in no way traceable to your website. Not even to your company, if your website is associated with your company.
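A minimal sketch of point 1, assuming a PDO connection and the table/column names from the question; the user-supplied value never becomes part of the SQL string itself:
$pdo = new PDO('mysql:host=localhost;dbname=yourdb;charset=utf8mb4', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// The value is bound separately from the query text, so input like "1 OR 1"
// is treated as data, not as executable SQL.
$stmt = $pdo->prepare('SELECT * FROM COLLECTIONS WHERE uid = :uid');
$stmt->execute(array(':uid' => $_GET['user_id']));
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Parametrisation fixes injection, not authorisation: the page should still
// check that uid matches the logged-in user (e.g. from the session) before
// returning the data.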
Why?
Yes, I know, that website is probably important and needs to stay up so people can use it. But try this:
www.badwebsite.com/your/page/here?uid=1 OR 1
All of the data is visible! You are accepting code from the user and running it in your database. Now what if I decided to delete all of your database tables?
That's just covering the first point I made. Please trust that there are bigger problems for your users if you haven't done step 2, the least of which is hundreds of their accounts on other websites (e.g. Gmail, Example Bank) becoming known to cyber criminals.
Take a look at this comic strip:
There's also a unicode-handling bug in the URL request library, and we're storing the passwords unsalted ... so if we salt them with emoji, we can close three issues at once!
This is played for laughs, but the problem described in this comic strip is probably less bad than the problem you are facing. Please, for the sake of whoever has entrusted you with their data, turn the site off for a few days whilst you try to make it something resembling secure.
You might want to bring in a technical consultant; if your developers are not experienced in creating intrusion-proof software then they're probably not up to the task of making insecure software secure (which is orders of magnitude harder, especially if you're new to that sort of thing).

Randomising database for insert

Evening all. I've recently been reading the following blog post about sharding at Pinterest and I think there's some great stuff in there: https://engineering.pinterest.com/blog/sharding-pinterest-how-we-scaled-our-mysql-fleet
What I'm unsure of, though, is how best to decide where a brand new user should be inserted.
So for those who don't know, or haven't bothered to read the above article: Pinterest have a number of shards, each holding a number of databases. They generate 64-bit IDs that encode the shard, the type of object (user, pin, etc., which determines the table) and the local auto-increment id for the object in question. They try to put pins etc. on the same database as the 'board' they belong to. But for a brand new object, what would be the best way of determining the shard it lives on?
For users that sign in via Facebook they use a modulus, e.g.
shard = md5("1.2.3.4") % 4096 // 4096 is the number of shards
But if I had a simple email/password registration form, do you think a similar approach on the email address would work for choosing an initial shard? I assume it would have to be the email in this case, otherwise there would be no way of knowing which database to validate the login credentials against. I also know that post is from 2015, so not too old, and computing power moves quickly, but would there be a better option than using md5 here? I know the chance of a collision is minor, especially as we're just hashing the email address, but would it be worth using a different algorithm? Basically I'm interested in the best way to determine a shard here, and how to get back to it later (hence why I think it has to be the email address).
Hope this all makes sense!
(P.S. I didn't tag this with the Pinterest tag as it looks like that's just for API dev, but if someone thinks it might get better 'eyes' on the question then feel free to add it.)
When using MD5 to determine the shard, there is no risk from collisions: if two values collide, they simply end up in the same shard. The MD5 value is not the key within that shard (so that is where the collision risk is removed).
The main issue with this shard method is that the number of shards is fixed, so performance might eventually become a problem (re-distributing a running environment is not easy, so with this design you are still dependent on faster machines if there is more growth than expected).
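A rough sketch of the same idea applied to an email address (the 4096 comes from the example above; the normalisation and function name are my own assumptions):
define('SHARD_COUNT', 4096);

function shardForEmail($email)
{
    // Normalise so "Foo@Example.com" and "foo@example.com" land on the same shard.
    $email = strtolower(trim($email));

    // md5() returns hex; take the first 8 hex chars (32 bits) and reduce them
    // modulo the shard count. Cryptographic strength is not needed here, only
    // a stable, reasonably even distribution.
    return hexdec(substr(md5($email), 0, 8)) % SHARD_COUNT;
}

// At registration this picks the shard to insert into; at login the same
// function tells you which shard holds the credentials to check against.
$shard = shardForEmail('alice@example.com');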

Why use IDs for looking up data as opposed to using the title -- please help me argue my point

Please help me argue my point.
I am working on a website project with a team of developers; we are developing the system in three parts. One part is the API, the other two are the back end and the front end. Both the front end and the back end get and store data by sending requests to the API.
I am specifically responsible for the front end. I am using CodeIgniter as my framework.
A little background: The app is a sports betting site.
This is the problem: the developers of the API use the name of, for example, a tournament, fixture or sport to do the lookup, so I pass the name of a tournament like this:
www.example.com/sport/add_bet/{tournament_name}
The problem I have with this is that the tournament name, as entered into the system by humans, might contain characters such as spaces, forward slashes, etc.
As you can imagine, a forward slash in the URL will completely break the system, since we use slashes to separate controllers, actions and passed variables.
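To illustrate the difference (the tournament name and routes here are made up):
$tournamentName = 'FA Cup 2016/2017 (qualifiers)';

// Passing the raw name: the "/" inside the name becomes an extra URI segment,
// so the router sees the wrong controller/method/arguments.
echo 'www.example.com/sport/add_bet/' . $tournamentName;

// Even encoded, the %2F for the slash is rejected or re-decoded by many web
// servers and frameworks, and the URL is ugly and fragile:
echo 'www.example.com/sport/add_bet/' . rawurlencode($tournamentName);
// www.example.com/sport/add_bet/FA%20Cup%202016%2F2017%20%28qualifiers%29

// An integer id is always one clean, predictable segment:
echo 'www.example.com/sport/add_bet/' . 123;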
I am trying to get them to change to using a simple primary key id field, to perform the lookup of the data. For some reason these developers don't want to do this.
The project manager that manages this project (not a programmer and no experience of programming) had a chat to them about this issue, but still they don't want to change, and they told her that it is a matter of personal preference on which way to go.
As far as I know, IDs have always been the way to do it.
Could you guys/girls please help me argue my point by giving some reasons as to why I am correct or incorrect in your view. I would like to provide your answers as motivation to get them to change over to doing it the right way.
Your help/answers/suggestions would be much appreciated.
The most important thing is that the id will be unique, since it should be the primary key, so searching by id will return a unique result.
Multiple records may have the same title if you didn't validate that at the time of saving.
Also, if you ever need joins or anything like that, the id will help.
And you should never trust the user or expect them to behave the way you want.
There are two sides:
1) You let the user select a single title from a dropdown and send only the ID to the server. Look-up by ID is much faster (assuming the ID is the primary key). But if you have lots of titles, you have to list all of them and the user is forced to scroll until they find the right one.
2) You have a simple input field that allows searching by part of the title. That way you don't have to list all titles. As the programmer, you have to escape all user input that goes to the server (via GET or POST), so that a user can type even DELETE FROM user WHERE 1 into your input field and your system will still work fine. Also, searching by part of a title can return multiple results, which is impossible with IDs.
I prefer the second approach.
To make the look-up fast, you need an index on the column you are searching by. The primary key column always has an index. To use some other column you need to add a unique index, both to avoid duplicates and to make the search faster, which in turn makes the table larger. If you expect the table to grow (not unlikely if you follow many sports and many leagues/tournaments over a number of years), this might become a problem at some point, depending on the resources in your production environment. It's not the strongest argument you can present, but it is not a bad one either.

Redis CRUD patterns

I've recently started learning Redis and I'm currently building an app that uses it as its sole datastore. I'd like to check with other Redis users whether some of my conclusions are correct, and ask a few questions. I'm using phpredis, if that's relevant, but I guess the questions apply to any language as it's more of a pattern thing.
As an example, consider a CRUD interface to save websites (name and domain) with the following requirements:
Check for existing names/domains when saving/validating a new site (duplicate check)
Listing all websites with sorting and pagination
I have initially chosen the following "schema" to save this information:
A key "prefix:website_ids" in which I use INCR to generate new website id's
A set "prefix:wslist" in which I add the website id generated above
A hash for each website "prefix:ws:ID" with the fields name and website
The saving/validation issue
With the above information alone I was unable (as far as I know) to check for duplicate names or domains when adding a new website. To solve this issue I've done the following:
Two sets with keys "prefix:wsnames" and "prefix:wsdomains", to which I also SADD the website name and domain.
This way, when adding a new website, I can check whether the submitted name or domain already exists in either of these sets with SISMEMBER and fail the validation if needed.
Now, if I'm saving data with 50 fields instead of just 2 and want to prevent duplicates, I'd have to create a similar set for each of the fields I want to validate.
QUESTION 1: Is the above a common pattern to solve this problem or is there any other/better way people use to solve this type of issue?
The listing/sorting issue
To list websites and sort by name or domain (ascending or descending) as well as limiting results for pagination I use something like:
SORT prefix:wslist BY prefix:ws:*->name ALPHA ASC LIMIT 0 10
This gives me 10 website ids ordered by name. Now to get these results I came to the following options (examples in php):
Option 1:
$wslist = $redis->sort('prefix:wslist', array(
    'by' => 'prefix:ws:*->name', 'alpha' => true, 'sort' => 'asc', 'limit' => array(0, 10)
)); // the SORT command shown above, via phpredis
$websites = array();
foreach ($wslist as $ws) {
    $websites[$ws] = $redis->hGetAll('prefix:ws:' . $ws);
}
The above gives me a usable array with website ids as keys and an array of fields as values. Unfortunately it has the problem that I'm making multiple requests to Redis inside a loop, and common sense (at least coming from RDBMSs) tells me that's not optimal.
The better way, it seems, would be to use Redis pipelining/multi and send all requests in a single go:
Option 2:
$wslist = $redis->sort('prefix:wslist', array(
    'by' => 'prefix:ws:*->name', 'alpha' => true, 'sort' => 'asc', 'limit' => array(0, 10)
)); // the SORT command shown above, via phpredis
$redis->multi();
foreach ($wslist as $ws) {
    $redis->hGetAll('prefix:ws:' . $ws);
}
$websites = $redis->exec();
The problem with this approach is that now I don't get each website's respective ID unless I then loop the $websites array again to associate each one. Another option is to maybe also save a field "id" with the respective website id inside the hash itself along with name and domain.
QUESTIONS 2/3: What's the best way to get these results in a usable array without having to loop multiple times? Is it correct or good practice to also save the id number as a field inside the hash just so I can also get it with the results?
Disclaimer: I understand that the coding and schema-building paradigms for a key-value datastore like Redis are different from those of RDBMSs and document stores, so notions of "the best way to do X" will differ depending on the data and application at hand.
I also understand that Redis might not even be the most suitable datastore for mostly CRUD-type apps, but I'd still like to get insights from more experienced developers, since CRUD interfaces are very common in most apps.
Answer 1
Your proposal looks pretty common. I'm not sure why you need an auto-incrementing ID though. I imagine the domain name has to be unique, or the website name has to be unique, or at the very least the combination of the two has to be unique. If this is the case it sounds like you already have a perfectly good key, so why invent an integer key when you don't need it?
Having a SET for domains and a SET for website names is a perfect solution for quickly checking whether a specific domain or website name already exists. Though, if one of those (domain or website name) is your key, you might not even need these SETs, since you could just check whether the key prefix:ws:domain-or-ws-name-here exists.
Also, using a HASH for each website so you can store your 50 fields of details for the website inside is perfect. That is what hashes are for.
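For what it's worth, a minimal sketch of that create-with-duplicate-check flow in phpredis, using the key names from the question (the function name is made up):
function createWebsite(Redis $redis, $name, $domain)
{
    // Fail validation if the name or the domain is already taken.
    if ($redis->sIsMember('prefix:wsnames', $name) ||
        $redis->sIsMember('prefix:wsdomains', $domain)) {
        return false;
    }

    $id = $redis->incr('prefix:website_ids');

    // Group the writes so a reader never sees a half-created website.
    // Note: the check above and these writes are not atomic together; for a
    // strict guarantee you would wrap the whole thing in a Lua script or WATCH.
    $redis->multi();
    $redis->hMSet('prefix:ws:' . $id, array('name' => $name, 'domain' => $domain));
    $redis->sAdd('prefix:wslist', $id);
    $redis->sAdd('prefix:wsnames', $name);
    $redis->sAdd('prefix:wsdomains', $domain);
    $redis->exec();

    return $id;
}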
Answer 2
First, let me point out that if your websites and domain names are stored in SORTED SETs instead of SETs, they will already be alphabetized (assuming they are given the same score). If you are trying to support other sort options this might not help much, but wanted to point it out.
Your Option 1 and Option 2 are actually both relatively reasonable. Redis is lightning fast, so Option 1 isn't as unreasonable as it seems at first. Option 2 is clearly more optimal from Redis's perspective, since all the commands will be buffered and executed at once. Though, as you noted, it will require additional processing in PHP afterwards if you want the array to be indexed by the id.
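For example, a small sketch of Option 2 with the id mapping restored in one step (assuming $wslist is the array of ids from your SORT call): MULTI/EXEC returns replies in the same order the commands were queued, so the ids and the hashes can simply be zipped back together, and there is then no real need to duplicate the id inside the hash.
$redis->multi();
foreach ($wslist as $ws) {
    $redis->hGetAll('prefix:ws:' . $ws);
}
// Replies come back in queue order, so pair them with the ids directly:
// keys are the website ids, values are the field arrays - no extra loop needed.
$websites = array_combine($wslist, $redis->exec());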
There is a 3rd option: Lua scripting. You can have Redis execute a Lua script that returns both the ids and the hash values in one shot. But, not being super familiar with PHP anymore and how Redis's multi-bulk replies map to PHP arrays, I'm not 100% sure what the Lua script would look like. You'll need to look for examples or do some trial and error. It should be a pretty simple script, though.
Conclusion
I think redis sounds like a decent solution for your problem. Just keep in mind the dataset needs to always be small enough to keep in memory. If that's not really a concern (unless your fields are huge, you should be able to fit thousands of websites into only a few MB) or if you don't mind having to upgrade your RAM to grow your DB, then Redis is perfectly suitable.
Be familiar with the various persistence options and configurations for redis and what they mean for availability and reliability. Also, make sure you have a backup solution in place. I would recommend having both a secondary redis instance that slaves off of your main instance, and a recurring process that backs up your redis database file at least daily.

PHP - Is this a good method to prevent re-submission?

This is related to preventing webform resubmission; however, this time the context is a web-based RPG. After the player defeats a monster, it drops an item, so I want to prevent the user from hitting the back button or refreshing to 'dupe' the item drop.
As item drops are frequent, using the DB to store a unique 'drop transaction id' for every drop seems infeasible to me. I am entertaining the idea below:
For each combat, create a unique value based on the current date-time and the user's id, and store it in the DB and in the session, so that given a user id you can fetch the value back.
If the value from the session exists in the DB, the combat is valid and the user is allowed to access all pages relevant to the combat. If it does not exist in the DB, a new combat state is started.
When combat is over, the unique value is cleared from the DB.
Values in the DB that are more than 30 minutes old are purged.
Any opinions, improvements, or pitfalls to this method are welcome.
This question is quite subjective; there are things you can or cannot do depending on the data and framework you already have around it.
The solution you've provided should work, but it depends on the unique combat/loot/user data you have available.
I take it this is what you think is best? It's what I think is best :)
Get the userID, along with a unique piece of data from that fight, something like the combat start time, combat end time, etc.
Store it in a database, or whatever storage system you have.
Once the player collects the loot, delete that record.
That way, if that userID and that unique fight data still exist, they haven't collected their loot yet.
And you are right: tracking each piece of loot is too much; you're better off temporarily storing the data.
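A rough sketch of that flow (table and function names are made up; this assumes PDO and PHP sessions):
session_start();

function startCombat(PDO $pdo, $userId)
{
    // One token per combat, stored server-side and in the session.
    $token = bin2hex(openssl_random_pseudo_bytes(16));
    $stmt = $pdo->prepare(
        'INSERT INTO active_combat (user_id, token, started_at) VALUES (?, ?, NOW())'
    );
    $stmt->execute(array($userId, $token));
    $_SESSION['combat_token'] = $token;
}

function claimLoot(PDO $pdo, $userId)
{
    $token = isset($_SESSION['combat_token']) ? $_SESSION['combat_token'] : '';
    // The DELETE doubles as the "does it still exist?" check: the first claim
    // removes the row, so a back-button or refresh replay deletes nothing.
    $stmt = $pdo->prepare('DELETE FROM active_combat WHERE user_id = ? AND token = ?');
    $stmt->execute(array($userId, $token));
    return $stmt->rowCount() === 1; // true only for the first, valid claim
}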
Seems like a reasonable approach. I assume you're storing the fact that the player is in combat somewhere anyway. Otherwise, they can just close their browser if they want to avoid a fight?
The combat ending and the loot dropping should be treated as an atomic operation. If there is no fight, there can't be any loot drop.
That depends on your game design: do you go more in the direction of roguelikes, where only turns count and therefore long pauses between moves are perfectly possible (like consulting other people via a chatroom; note that in NetHack that is not considered cheating)? Can users only save their games at certain points, or anywhere? That makes a huge difference to the design, e.g. opening the way for exploits similar to the one Thorarin mentions.
If your game goes the traditional roguelike route of a single save, turn-based play and permadeath, then it would be possible to save the number of the current turn for any given character along with all game-related information (inventory, maps, enemies and their state), and then check against that on every player action, to prevent playing the same turn twice.
Alternatively, you could bundle everything up in client-side JavaScript, so that even if they did resubmit the form it would generate an entirely new combat/treasure encounter.
