How do large sites accomplish row-level permissions?

So I am making a small site using cakephp, and my ACL is set up so that every time a piece of content is created, an ACL rule is created to link the owner of the piece of content to the actual content. This allows each owner to edit/delete their own content. This method just seems so inefficient, because there is an equivalent amount of ACL rules as content in the database. I was curious, how do big sites, with millions of pieces of content, solve this problem?

On the large sites I have worked on, access permissions were determined at the application level. The database associated the content with a user's record, and the data access/business logic layer then determined whether or not the user had sufficient rights to access the content.
For a large site with dynamic content I think this would probably be the best way to handle it.
EDIT: To add a more concrete example.
Example:
OK, let's say we have a simple file storage site where a user can only access their own data or data that has been explicitly shared with them by another user.
Since this application is fairly simple (it just serves files), it only has three database tables:
Users Table which has columns:
UserId <int> PK
UserName <varchar>
HashedPassword <varchar>
Files Table which has columns:
FileId <int> PK
FileOwnerId <int> FK (this has a foreign key relationship with UserId in the users table)
FileName <varchar>
MimeType <varchar>
FileData <blob>
SharedFiles reference table which has columns:
SharedFileIndex <int> PK
FileId <int> FK
UserId <int> FK
Now, the basic rule that we want to define in our data access layer is that a logged-in user can access files that they own and files that other users have shared with them. So, either through stored procedures or by building the query to send to the database server, I would make sure that my queries only return the records the user has access to.
Here is the basic GetUsersFileList SQL query for when a user logs in:
SELECT FileId, FileName, MimeType
FROM Files
WHERE FileOwnerId = @UserId
As you can see, we are using a parameterized query to get the files a user owns. We would also query for the files shared with the user, so that they can be displayed as well.
Now if we assume that each file will have its own unique URL such as:
http://mydomain.com/filehandler.php?fileId=123546
Then when we try to get the file, we use a query similar to the one above to fetch the file data:
SELECT FileName, MimeType, FileData
FROM Files
-- Join only the requesting user's share row, so an owner does not get one duplicate row per share
LEFT OUTER JOIN SharedFiles ON Files.FileId = SharedFiles.FileId AND SharedFiles.UserId = @UserId
WHERE Files.FileId = @FileId AND (Files.FileOwnerId = @UserId OR SharedFiles.UserId IS NOT NULL)
So when we attempt to get the file, we still use the UserId in the query. If the user is neither the owner of the file nor someone it has been shared with, the query returns 0 rows.
So permissions are determined by what a user is mapped to in the database but the actual enforcement is done by carefully writing your data access code and/or additional checks in your business logic layer before serving the content.
EDIT2: I am most familiar with MS SQL Server, so the queries above are written in T-SQL; the syntax may need adjusting for MySQL.
EDIT3: Replaced "business logic layer" with "data access layer", as in this example the only checks made are within the data access queries themselves.
EDIT4: OK, put back the reference to the business logic layer, as more complex apps would need more complex permission schemes, which could necessitate additional checks in the business logic layer.
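
To make the enforcement step concrete, here is a minimal sketch of the same check using Python's built-in sqlite3 module. It is not the answer's original code: the sample rows and the helper names `open_demo_db` and `get_file` are assumptions for illustration, and the shared-file test is written with EXISTS rather than an outer join, which is just one way to express the same rule.

```python
import sqlite3


def open_demo_db():
    """Build an in-memory database with the three tables described above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Users (UserId INTEGER PRIMARY KEY, UserName TEXT, HashedPassword TEXT);
        CREATE TABLE Files (FileId INTEGER PRIMARY KEY,
                            FileOwnerId INTEGER REFERENCES Users(UserId),
                            FileName TEXT, MimeType TEXT, FileData BLOB);
        CREATE TABLE SharedFiles (SharedFileIndex INTEGER PRIMARY KEY,
                                  FileId INTEGER REFERENCES Files(FileId),
                                  UserId INTEGER REFERENCES Users(UserId));
        -- Illustrative sample data (not from the original answer)
        INSERT INTO Users VALUES (1, 'alice', 'x'), (2, 'bob', 'x'), (3, 'carol', 'x');
        INSERT INTO Files VALUES (10, 1, 'report.txt', 'text/plain', x'6869');
        INSERT INTO SharedFiles (FileId, UserId) VALUES (10, 2);
    """)
    return conn


def get_file(conn, file_id, user_id):
    """Return (FileName, MimeType, FileData), or None if the user has no access."""
    return conn.execute(
        """
        SELECT f.FileName, f.MimeType, f.FileData
        FROM Files AS f
        WHERE f.FileId = :file_id
          AND (f.FileOwnerId = :user_id
               OR EXISTS (SELECT 1 FROM SharedFiles AS s
                          WHERE s.FileId = f.FileId AND s.UserId = :user_id))
        """,
        {"file_id": file_id, "user_id": user_id},
    ).fetchone()
```

With this sample data, the owner (user 1) and the user the file was shared with (user 2) get the row back, while user 3 gets None, which the file handler can turn into a 404 or 403 response.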

Instead of having a separate ACL for each content element, you can have a separate ACL for each different set of permissions. Most content items for a given user will have the same permissions, so they can all point to the same ACL. This could also allow you to cache permission checks (e.g. "user 123 has permission to read ACL 456"). In the end you will have very few ACLs -- just all the standard ones and the few exceptions.
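
As a rough illustration of the shared-ACL idea, the sketch below lets many content items point at one ACL id and caches the per-(user, ACL) decision. The dictionaries stand in for database tables, and every name in it (`ACL_READERS`, `CONTENT_ACL`, `acl_allows_read`, `can_read`) is hypothetical.

```python
from functools import lru_cache

# Hypothetical in-memory stand-ins for two database tables.
ACL_READERS = {456: {123, 124}, 457: {123}}                   # acl id -> users allowed to read
CONTENT_ACL = {"post-1": 456, "post-2": 456, "post-3": 457}   # content id -> acl id


@lru_cache(maxsize=10_000)
def acl_allows_read(user_id, acl_id):
    """Cached check: 'user 123 has permission to read ACL 456'."""
    return user_id in ACL_READERS.get(acl_id, set())


def can_read(user_id, content_id):
    """Resolve the content item to its shared ACL, then reuse the cached decision."""
    acl_id = CONTENT_ACL.get(content_id)
    return acl_id is not None and acl_allows_read(user_id, acl_id)
```

Because post-1 and post-2 share ACL 456, the second check for the same user is served from the cache. If an ACL is edited, the cache has to be invalidated, for example with `acl_allows_read.cache_clear()`.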

The same rule applies to both large and small sites: if you want more specific control, you have to store more data in the database. The problem you are trying to solve [allow users to manage only their content] can be solved using a simple user id link between tables [for example users.id <-> articles.userId]; there's no need to link every row with a user through a separate rule. I would suggest using more general rules and storing only the exceptions [for example allowing specified users to edit other users' content] as external data.

Related

DatabasePHP: Created user A accesses Database of created user B and vice versa

I developed a website in php script and connected it to a database. There I set up various user accounts for customers. Now my problem is that every user I create always accesses the same database. For example, I create user A who has his own functions there, such as creating a customer account or creating a product for his shop. If I create user B he will access the same database or user B would have all the data of user A and vice versa. How can I set it up so that each user I have created has their own database and cannot access another database? The website is online and working as it should, except for this point.

What you're describing isn't really how databases work. I won't go into exactly how to do this, because there are many good resources online, but it's best to think of a database as a central store of information that everyone accesses. Instead of having each person have their own database, we generally associate each person with an identifier, either an ID or username. We then associate stored information with this identifier. For example, we might have a timesheets table with the columns:
id,punch_in,punch_out,employee_id
Then, when getting the time records for a given employee, you would do something like:
SELECT punch_in,punch_out FROM timesheets WHERE employee_id = MY_EMPLOYEE_ID
The above statement says: "Get all punches in and punches out from the timesheets table, as long as they match a certain employee ID." You can go much deeper than this, but this is an effective way to segment data records and keep everyone's data to themselves. If all you have is a user name, you can use something called a join, which merges two tables on a shared index.
There are many ways to achieve these goals, but the key takeaway of setting up a database is that each distinct type of thing should have its own table, for example a table for employees and one for timesheets, and if they relate to each other, they should share an index so you can associate them with each other.
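
The join mentioned above can be sketched as follows, using Python's sqlite3 module so the example runs anywhere. The `employees` table layout, its `username` column, the sample rows, and the function name `punches_for_username` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, username TEXT UNIQUE);
    CREATE TABLE timesheets (id INTEGER PRIMARY KEY, punch_in TEXT, punch_out TEXT,
                             employee_id INTEGER REFERENCES employees(id));
    -- Illustrative sample data
    INSERT INTO employees VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO timesheets (punch_in, punch_out, employee_id) VALUES
        ('2024-01-02 09:00', '2024-01-02 17:00', 1),
        ('2024-01-02 10:00', '2024-01-02 18:00', 2);
""")


def punches_for_username(conn, username):
    """Look up one employee's punches by user name via a join on the shared id."""
    return conn.execute(
        """
        SELECT t.punch_in, t.punch_out
        FROM timesheets AS t
        JOIN employees AS e ON e.id = t.employee_id
        WHERE e.username = ?
        ORDER BY t.punch_in
        """,
        (username,),
    ).fetchall()
```

Every user shares the same two tables; the WHERE clause is what keeps one user's rows separate from another's.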
You'll find lots of good resources here.

Although it is possible to set permissions at a more granular level, in MySQL and most DBMSs it is simplest to do so at the database level. Note that a single instance of MySQL can (and likely already does) have multiple databases. So your first task is to separate the functionality out into different databases, then apply appropriate permissions for each user, making sure you don't already have permissions set against a wildcard username.

REST API Two separate resources to create a user?

Currently building a REST API and one of the functions of it will be to create users. There are two ways my application will create users:
Register, users add themselves with the usual data: email, password, username, date of birth.
Manual creation, admin adds a user with usual data AND any extra data as required.
My setup is a users table, users_metadata table and users_permissions table, as well as a few others. The email and password are stored in the users table, the username and date of birth in the users_metadata table. When manually creating a user other metadata and the user's permissions, as well as data in the other tables, can be changed.
Would it be better to have two different resources to handle creating a user?

Would it be better to have two different resources to handle creating a user?
I wouldn't create two different resources that both represent the user and both model its creation process. Since a user is a user, in my opinion they should be created through the same resource.
Manual creation, admin adds a user with usual data AND any extra data as required.
When manually creating a user other metadata and the user's permissions, as well as data in the other tables, can be changed.
If it makes sense, you could model this extra data as a separate (sub)resource. The same goes for permissions. This sub resource can then have its own URL (for instance /users/{id}/meta and /users/{id}/permissions) to which the client issues separate POST requests, or it can be nested in the data structure that is sent to the API, like so:
{
  "name": "John",
  "email-address": "john@doe.com",
  "permissions": {
    "read": true,
    "write": false
  },
  "meta-data": {
    "date-of-birth": "2000-01-01"
  }
}
The approach with separate sub resources at their own URLs makes access control and validation a bit easier. On the other hand, it puts a bigger burden on the client. It can also put you in the position where an admin creates a user, the basic information is saved, but there is an error saving permissions; depending on your use case you may or may not need to somehow handle that automatically.
The approach where the sub resources are nested in the data structure makes the logic to handle the POST request a bit more complex, but it does make the client side of things easier and gives you the option to make the whole action atomic by wrapping it in a transaction and rolling back if anything goes wrong.
Note: These two approaches are not mutually exclusive; you can do both if you want.
Which of these approaches is best will depend on how many sub resources there are, how complex they are and how complex the access control to the sub resources is; the more sub resources there are and/or the more complex access control is, the more likely I would be to set up different URLs for the sub resources.
In this specific case, I would nest the sub resources in the data structure and have the clients POST all the data at once.
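
A minimal sketch of the nested, atomic variant is shown below, using Python's sqlite3 module in place of a real API backend. The table layouts, the `ALLOWED_PERMISSIONS` whitelist, and the function names are assumptions; the point is only that one transaction covers the user row, the metadata, and the permissions.

```python
import sqlite3

ALLOWED_PERMISSIONS = {"read", "write"}  # hypothetical whitelist used for validation


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL);
        CREATE TABLE users_metadata (user_id INTEGER, name TEXT, date_of_birth TEXT);
        CREATE TABLE users_permissions (user_id INTEGER, permission TEXT, granted INTEGER);
    """)
    return conn


def create_user(conn, payload):
    """Create the user and its nested sub resources in a single transaction."""
    with conn:  # commits on success, rolls back if any statement or check raises
        cur = conn.execute("INSERT INTO users (email) VALUES (?)",
                           (payload["email-address"],))
        user_id = cur.lastrowid
        meta = payload.get("meta-data", {})
        conn.execute("INSERT INTO users_metadata VALUES (?, ?, ?)",
                     (user_id, payload.get("name"), meta.get("date-of-birth")))
        for name, granted in payload.get("permissions", {}).items():
            if name not in ALLOWED_PERMISSIONS:
                raise ValueError(f"unknown permission: {name}")
            conn.execute("INSERT INTO users_permissions VALUES (?, ?, ?)",
                         (user_id, name, int(granted)))
    return user_id
```

If the permissions part of the payload is invalid, the exception rolls back the already-inserted user and metadata rows, so no half-created user is left behind.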

Issues with storing images in filesystem vs DB

There are several questions with excellent answers on SO regarding the quintessential BLOB vs filesystem question. However, none of them seem to represent my scenario so I'm asking this.
Say there's a social network (a hypothetical scenario, of course) where users are all free to change anyone's profile picture. And each user's profile is stored in a MySQL table with the following schema:
ID [unsigned int, primary]
USERNAME [varchar(20)]
PROFILENAME [varchar(60)]
PROFILEPIC [blob]
Now, here's the thing: What if I want to store profile images as files on the server instead of BLOBs in the db? I can understand there will have to be some kind of naming convention to ensure all files have a unique name that also maps it to the primary key on the table for easy access. So, say, the primary key is the filename for the corresponding image stored on the disk. But in my context there could be simultaneous read/writes, and quite a lot of them. MySQL would typically handle that with no problems since it locks out the row while it's being updated. But how does one handle such situations in a filesystem model?

In your application layer, you could lock the block that does DB transaction and file IO to alleviate concurrency issues (lock example in C#).
Within this block, run your inserts/updates/deletes in a transaction. Follow that with adding/replacing/deleting the photo on disk. Let's write some pseudo-code:
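lock (obj)
{
    connection.StartTransaction();
    if (!connection.PerformAction())
    {
        connection.RollbackTransaction();
        return false;
    }
    if (!photoMgmt.PerformAction())
    {
        // the file operation failed, so undo the pending DB change
        connection.RollbackTransaction();
        return false;
    }
    connection.CommitTransaction();
    return true;
}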
A similar technique applies in PHP; additionally, use flock to perform file locking.
In other words, commit to the DB after committing to the filesystem. If either the DB or the filesystem operation fails, clean up so that no change is saved.
I'd use bigint ID as the primary key and GUID filenames on disk. If users preferred the application to hold the name they provided, I'd create a field called user_filename to store the filename provided by the user, and for all other purposes I'd use the GUID.
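
The ordering described above (file first, DB commit last, clean up on failure) might look like the following Python sketch. It is an illustration under several assumptions: a single-process lock stands in for the C# `lock`, the `profiles` table and the function names are made up, and each upload gets a fresh GUID filename instead of overwriting the previous file, which is a variation on the answer's suggestion.

```python
import os
import sqlite3
import tempfile
import threading
import uuid

_write_lock = threading.Lock()  # serializes writers inside this one process only


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, username TEXT, pic_file TEXT)")
    conn.execute("INSERT INTO profiles (id, username) VALUES (1, 'alice')")
    conn.commit()
    return conn


def set_profile_pic(conn, storage_dir, profile_id, image_bytes):
    """Write the image to disk, then commit its name to the DB; undo on failure."""
    new_name = uuid.uuid4().hex + ".img"
    final_path = os.path.join(storage_dir, new_name)
    with _write_lock:
        # 1. Write to a temp file and rename it, so readers never see a partial file.
        fd, tmp_path = tempfile.mkstemp(dir=storage_dir)
        try:
            with os.fdopen(fd, "wb") as fh:
                fh.write(image_bytes)
            os.replace(tmp_path, final_path)
        except OSError:
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise
        # 2. Point the row at the new file; remove the new file if the DB step fails.
        try:
            with conn:  # commit on success, rollback on exception
                old = conn.execute("SELECT pic_file FROM profiles WHERE id = ?",
                                   (profile_id,)).fetchone()
                if old is None:
                    raise LookupError(f"no profile with id {profile_id}")
                conn.execute("UPDATE profiles SET pic_file = ? WHERE id = ?",
                             (new_name, profile_id))
        except Exception:
            os.remove(final_path)
            raise
        # 3. Only after the commit, delete the picture that was replaced.
        if old[0]:
            old_path = os.path.join(storage_dir, old[0])
            if os.path.exists(old_path):
                os.remove(old_path)
    return new_name
```

A process-local lock does not protect against several web server processes writing at once; in that setting a file lock (such as PHP's flock, mentioned above) or a row lock in the database would have to take its place.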
Hopefully this will provide some direction.

User permissions; JSON-string or DB Table?

I am currently working with a medium-sized team developing a custom content management system for a large client. The CMS is written using PHP and follows the MVC pattern (custom). It is a modular system, for which plugins can be added to the system by us or other developers at a later stage.
The system will contain user-based permissions, and a series of generic roles that have predefined permissions. It is required that a super-admin user can also modify permissions on a user basis (for example John Doe might be defined as a regular user, but has the possibility of modifying content).
Opinion is currently divided about the best way for us to store and handle these permissions. Half of the dev team are suggesting to add a new DB table that will store key/value pairs and user IDs for each user, with boolean values stored in each record. The table structure would be something like this:
user_ID: the ID of the user
perm_name: the name of the permission
perm_value: a boolean value dictating whether the user can carry out this action
The proposal is that if the value associated with a particular permission is set to 0, or does not exist in the table, the user does not have the required permission.
The other half of the dev team favours storing the permissions in a single field as a JSON-encoded string within the users table. So for example, we would store the following JSON for John Doe:
{
"modifyProducts": 1,
"addProducts": 1,
"addPages": 0
}
We would then be able to use json_decode() within the User class to extract the permissions, for example:
$this->permissions = json_decode($dbval);
I am personally leaning towards the latter option for two main reasons:
It is scalable
It does not require us to modify the database if we need a new permission.
In short, what is the best approach for such an application?

I think the best solution in this case would be to use a NoSQL database, such as MongoDB. That way you can still keep the scalability and take advantage of the JSON structure.
On the other hand, depending on your user table you could take possible advantage of column type indexing and optimize your requests for querying and reading, if of course you're working with normalized database.
I personally would store JSON within a relational DB only when I want to directly display the info and not use it for any querying. Just like you've said yourself - there's always the possibility of ending up with huge and growing JSON string and this would most probably cause troubles at some point.
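
To compare the two proposals from the question side by side, here is a small sketch using Python's sqlite3 and json modules. The table name `user_permissions`, the sample rows, and the function names are assumptions. Note that in the table design a new permission is a new row rather than a new column, so it does not require a schema change either.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_permissions (
        user_ID    INTEGER NOT NULL,
        perm_name  TEXT    NOT NULL,
        perm_value INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (user_ID, perm_name)
    );
    -- Illustrative sample data for one user
    INSERT INTO user_permissions VALUES
        (1, 'modifyProducts', 1), (1, 'addProducts', 1), (1, 'addPages', 0);
""")


def has_permission(conn, user_id, perm_name):
    """Table variant: a missing row or a 0 value both mean 'not allowed'."""
    row = conn.execute(
        "SELECT perm_value FROM user_permissions WHERE user_ID = ? AND perm_name = ?",
        (user_id, perm_name),
    ).fetchone()
    return row is not None and row[0] == 1


def users_with_permission(conn, perm_name):
    """The table variant can also answer 'who may do X?' with a plain query."""
    rows = conn.execute(
        "SELECT user_ID FROM user_permissions WHERE perm_name = ? AND perm_value = 1 "
        "ORDER BY user_ID",
        (perm_name,),
    )
    return [r[0] for r in rows]


def has_permission_json(raw_json, perm_name):
    """JSON variant: decode the stored string, then treat a missing key as 0."""
    return json.loads(raw_json).get(perm_name, 0) == 1
```

The JSON variant is fine for loading one user's permissions, but a question like `users_with_permission` would mean decoding every user's string, which is the querying limitation described above.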

Database and Table Management

I have been creating a web app and am looking to expand. In my web app I have a table for users which includes privileges in order to track whether a user is an administrator, a very small table for a dynamic content section of a page, and a table for tracking "events" on the website.
Not being very experienced with web application creation, I'm not really sure how professionals would design the databases and tables for a web application. In my web app, I plan to add further user settings for each member of the website and even a messaging system. I currently use PHP with a MySQL database that I query for all of my commands, but I would be willing to change any of this if necessary. What would be the best way to track content such as interpersonal messages and also specific settings for each user? Would I want to have multiple databases at any point? Would I want to have multiple tables for each user, perhaps? Any information on how this is done or should be done would be quite helpful.
I'm sorry about the broadness of the question, but I've been wanting to reform this web app since I feel that my ideas for table usage are not on par with those that experienced programmers have.

Here's my seemingly long, hopefully not too convoluted answer to your question. I think I've covered most, if not all of your queries.
For your web app, you could have a table of users called "Users", settings table called "UserSettings" or something equally as descriptive, and messages in "PrivateMessages" table. Then there could be child tables that store extra data that is required.
User security can be a tricky thing to design and implement. Do you want to do it by groups (if you plan on having many users, making it easier to manage their permissions), or just assign individually due to a small user base? For security alone, you'd end up with 4 tables:
Users
UserSettings
UserGroups
UserAssignedGroups
That way you can have user info, settings, groups they can be assigned to and what they ARE assigned to separated properly. This gives you a decent amount of flexibility and conforms to normalization standards (as mentioned above by DrSAR).
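
As a rough sketch of how the group tables fit together, the example below uses Python's sqlite3 module. The column names, the sample rows, and the helper functions are assumptions; only the table names come from the list above (UserSettings is omitted because it plays no part in the check).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, UserName TEXT);
    CREATE TABLE UserGroups (GroupID INTEGER PRIMARY KEY, GroupName TEXT UNIQUE);
    CREATE TABLE UserAssignedGroups (
        UserID  INTEGER REFERENCES Users(UserID),
        GroupID INTEGER REFERENCES UserGroups(GroupID),
        PRIMARY KEY (UserID, GroupID)
    );
    -- Illustrative sample data
    INSERT INTO Users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO UserGroups VALUES (1, 'administrators'), (2, 'members');
    INSERT INTO UserAssignedGroups VALUES (1, 1), (1, 2), (2, 2);
""")


def groups_for_user(conn, user_id):
    """List the names of the groups a user IS assigned to."""
    rows = conn.execute(
        """
        SELECT g.GroupName
        FROM UserGroups AS g
        JOIN UserAssignedGroups AS a ON a.GroupID = g.GroupID
        WHERE a.UserID = ?
        ORDER BY g.GroupName
        """,
        (user_id,),
    )
    return [r[0] for r in rows]


def is_admin(conn, user_id):
    return "administrators" in groups_for_user(conn, user_id)
```

Granting or revoking a role is then a single insert or delete in UserAssignedGroups, without touching the Users table.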
With your messages, don't store them with the username, but rather the User ID. For instance, in your PrivateMessages table, you would have a MessageID, SenderUserID, RecipientUserID, Subject, Body and DateSent to store the most basic info. That way, when a user wants to check their received messages, you can query the table saying:
SELECT * FROM PrivateMessages WHERE RecipientUserID = 123556
A list of tables for your messages could be as such:
PrivateMessages
MessageReplies
The PrivateMessages table can store the parent message, and then the MessageReplies table can store the subsequent replies. You could store it all in one table, but depending on traffic and possibly writing recursive functions to retrieve all messages and replies from one table, a two table approach would be simplest I feel.
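
The two-table layout can be read back without recursion, as in this sketch with Python's sqlite3 module. The MessageReplies columns, the sample rows, and the function name `load_thread` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PrivateMessages (MessageID INTEGER PRIMARY KEY, SenderUserID INTEGER,
                                  RecipientUserID INTEGER, Subject TEXT, Body TEXT,
                                  DateSent TEXT);
    CREATE TABLE MessageReplies (ReplyID INTEGER PRIMARY KEY,
                                 MessageID INTEGER REFERENCES PrivateMessages(MessageID),
                                 SenderUserID INTEGER, Body TEXT, DateSent TEXT);
    -- Illustrative sample data
    INSERT INTO PrivateMessages VALUES
        (1, 7, 123556, 'Hello', 'First message', '2024-01-01 10:00');
    INSERT INTO MessageReplies (MessageID, SenderUserID, Body, DateSent) VALUES
        (1, 123556, 'Reply one', '2024-01-01 11:00'),
        (1, 7, 'Reply two', '2024-01-01 12:00');
""")


def load_thread(conn, message_id, user_id):
    """Return the parent message plus its replies, only for a participant."""
    parent = conn.execute(
        """
        SELECT Subject, Body FROM PrivateMessages
        WHERE MessageID = ? AND (SenderUserID = ? OR RecipientUserID = ?)
        """,
        (message_id, user_id, user_id),
    ).fetchone()
    if parent is None:
        return None  # no such message, or the user is not part of the conversation
    replies = conn.execute(
        "SELECT Body FROM MessageReplies WHERE MessageID = ? ORDER BY DateSent",
        (message_id,),
    ).fetchall()
    return {"subject": parent[0], "bodies": [parent[1]] + [r[0] for r in replies]}
```

One query fetches the parent and one fetches all replies in order, so no recursive function is needed to rebuild the conversation.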
If I were you, I'd sit down with a pencil and paper, and write down/draw what I want to track in my database. That way you can then draw links between what you want to store, and see how it will come together. It helps me when I'm trying to visualise things.

For the scope of your web app you don't need multiple databases. You do need, however, multiple tables to store your data efficiently.
For user settings, always use a separate table. You want your "main" users table as lean as possible, since it will be accessed (= searched) every time a user will try to log in. Store IDs, username, password (hashed, of course) and any other field that you need to access when authenticating. Put all the extra information in a separate table. That way your login will only query a smaller table and once the user is authenticated you can use its ID to get all other information from the secondary table(s).
Messages can be trickier because they're a bigger order of magnitude - you might have tens or hundreds for each user. You need to design your table structure based on your application's logic. A table for each user is clearly not a feasible solution, so go for a general messages table but implement procedures to keep it to a manageable size. An example would be "archiving" messages older than X days, which would move them to another table (which works well if your users aren't likely to access their old messages too often). But like I said, it depends on your application.
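
The archiving idea could be sketched like this with Python's sqlite3 module. The table names `messages` and `messages_archive`, their columns, and the function name are assumptions, and dates are stored as ISO-8601 text so that string comparison orders them correctly.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE messages (id INTEGER PRIMARY KEY, recipient_id INTEGER,
                               body TEXT, date_sent TEXT);
        CREATE TABLE messages_archive (id INTEGER PRIMARY KEY, recipient_id INTEGER,
                                       body TEXT, date_sent TEXT);
    """)
    return conn


def archive_old_messages(conn, cutoff):
    """Move messages sent before `cutoff` into the archive table; return the count."""
    with conn:  # copy and delete succeed or fail together
        conn.execute(
            "INSERT INTO messages_archive SELECT * FROM messages WHERE date_sent < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM messages WHERE date_sent < ?", (cutoff,))
    return cur.rowcount
```

A scheduled job could call `archive_old_messages` with a cutoff of "now minus X days"; running the copy and the delete in one transaction means a message is never lost or duplicated if the job is interrupted.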
Good luck!

Along the lines of Cristian Radu's comments: you need to split your data into different tables. The lean user table will (in fact, should) have one unique ID per user. This (unique) key should be repeated in the secondary tables, where it is called a foreign key. Obviously, you want a key that's unique. If your username can be guaranteed to be unique (i.e. you require users to be identified by their email address), then you can use that. If user names are real names (e.g. Firstname Surname), then you don't have that guarantee and you need to keep a userid, which becomes your key. Similarly, the table containing your posts could (but doesn't have to) have a field with userids indicating who wrote each post, etc.
You might want to read a bit about database design and the concept of normalization: (http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html) No need to get bogged down with the n-th form of normalization but it will help you at this stage where you need to figure out the database design.
Good luck and report back ;-)
