I would like to build an online logbook for truck drivers. The goal is that after a truck driver logs in, he/she immediately sees a snapshot of his/her driving total this year/month/day, together with some other totals also per year/month/day. So the information stored in the database is only relevant per user (truck driver). I personally don't require any statistical data out of the database as a whole (only per user).
Let's assume 10,000 users.
My question relates to the design of the MySQL database.
Since the information stored is only relevant per user and not in aggregate, does it make sense to store the data in a table per user, leading to up to 10,000 tables? Would that result in the most efficient/fastest database? Or should I dump all rows into one big 'Log' table and have it relate to another table, 'Users', even if analysis will only be done per user?
Here's some of the information that needs to be stored per user (it ends up being about 30 columns):
Date - Truck make/model - Truck ID - Route # - From - To - Total time - Stops - Gas consumption - Night time - Crew (2nd driver) - ......
Simplified example here
User
    user_id
    first_name
    last_name

Truck
    truck_id
    truck_make
    truck_model

Route
    route_id
    user_id
    truck_id
    route_from
    route_to
    gas_consumption
Without any more details to go on, this is how I'd roll it.
I would suggest going for separate tables as above, but maybe in your case one large table is a good plan.
Assuming you write efficient MySQL to access the data you shouldn't have a problem with a large dataset such as the one you have described.
I'd take a look at MySQL / Rails Performance: One table, many rows vs. many tables, less rows? for more information on why going down the tables route may be a good idea. Also, Which is more efficient: Multiple MySQL tables or one large table? contains some useful information on the subject.
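For concreteness, here is a minimal MySQL sketch of that layout; the table names, column types, and the extra index are assumptions rather than anything from the original post, but the composite index on (user_id, route_date) is what keeps the per-user day/month/year totals cheap even with all drivers in one table:

CREATE TABLE users (
    user_id    INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name  VARCHAR(50)
) ENGINE=InnoDB;

CREATE TABLE trucks (
    truck_id    INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    truck_make  VARCHAR(50),
    truck_model VARCHAR(50)
) ENGINE=InnoDB;

CREATE TABLE routes (
    route_id           INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    user_id            INT UNSIGNED NOT NULL,
    truck_id           INT UNSIGNED NOT NULL,
    route_date         DATE NOT NULL,
    route_from         VARCHAR(100),
    route_to           VARCHAR(100),
    total_time_minutes INT UNSIGNED,
    gas_consumption    DECIMAL(6,2),
    FOREIGN KEY (user_id)  REFERENCES users (user_id),
    FOREIGN KEY (truck_id) REFERENCES trucks (truck_id),
    KEY idx_user_date (user_id, route_date)   -- drives the per-user year/month/day rollups
) ENGINE=InnoDB;

The snapshot for the logged-in driver then stays a single indexed aggregate, for example per year:

SELECT YEAR(route_date) AS yr,
       SUM(total_time_minutes) AS total_minutes,
       SUM(gas_consumption)    AS total_gas
FROM routes
WHERE user_id = 42
GROUP BY YEAR(route_date);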
You're describing a multi-tenant database. SO has a tag for that; I added it for you.
MSDN has a decent article giving you an overview of the issues involved in multi-tenant databases. The structures range from shared nothing to shared everything. Note carefully that in a "shared everything" structure, it's fairly easy for the database owner (probably you) to write a query that breaks user isolation and exposes one user's data to other users.
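As a small, hypothetical illustration of that isolation point in a shared-everything schema (table and column names follow the sketch above): every query the application runs must be scoped to the authenticated driver, because the database itself won't stop you from returning other tenants' rows.

-- user_id is bound from the logged-in session, never taken from client input;
-- dropping this WHERE clause would expose every driver's log to everyone.
SELECT route_date, route_from, route_to, gas_consumption
FROM routes
WHERE user_id = ?;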
Related
So... assuming I have a database with three tables:
Table clients
Table data
and Table clients_to_data
And I have an API which allows clients to access data from table data. Every client has a record in table clients (with things like IP address etc.). To log who accesses what, I'm logging to the table clients_to_data (which contains the IDs for table clients and table data, plus a timestamp).
Every time a user accesses my API, the access gets logged in the clients_to_data table. (So records in clients and data are not updated, just read.)
I also want to be able to get the number of hits per client. Pretty easy: just query the clients_to_data table with a client_id and count the results. But as my DB grows, I'll have tens of thousands of records in the clients_to_data table.
And here's my question:
Is it better practice to add a field "hits" to table clients that stores the number of hits for that user, and increment it every time the user queries the API?
This would be adding redundancy to the DB, which I've heard is generally a bad thing. But in this case I think it would speed up retrieving the number of hits.
So which method is better and faster in this case? Thanks for your help!
Faster when?
Appending to the table will be faster than finding the record and updating it, and much faster than reading it, incrementing it, and updating it.
However, having hits "precalculated" will be faster than the aggregate query to count them.
What you gain on the swings you lose on the roundabouts; which choice you make depends on your current usage patterns. So, are you prepared to slow down adding a hit in order to gain a significant boost in finding out how many you've had?
Obviously, selecting a single integer column from a table will be faster than selecting a count() of rows from a table.
The complexity trade-off is a bit moot: one way you need to write more complex SQL, the other way you need to update/insert two tables in your code.
How often is the number of hits queried? Do your clients look it up, or do you check it once a month? If you only look now and then, I probably wouldn't be too concerned about the time taken to select count(*).
If your clients look up the hit count with every request, then I would look at storing a hits column.
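To make the trade-off concrete, the two options look roughly like this (the hits column, the index name, and the clients primary-key column id are assumptions; the other names come from the question):

-- Option 1: keep the log append-only and count on demand.
-- An index on client_id keeps the COUNT reasonably cheap.
ALTER TABLE clients_to_data ADD INDEX idx_client (client_id);
SELECT COUNT(*) FROM clients_to_data WHERE client_id = ?;

-- Option 2: denormalize a running counter on clients.
-- One extra write per API call, but the read becomes a single-row lookup.
UPDATE clients SET hits = hits + 1 WHERE id = ?;
SELECT hits FROM clients WHERE id = ?;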
Now that our table structures are all clearly defined, let's get to work.
You want to record something in the DB, namely the number of times every client has accessed the data; in other terms:
Insert a record into the table "clients_to_data" for every client's "impression".
You are worried about two things:
1. Redundancy
2. Performance when retrieving the count
What about the performance when storing the count (the insert statements)?
This is a classic scenario where I would write the data to be inserted into memcache and do a bulk insert at the end of the day.
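A rough sketch of that flush, assuming the buffered accesses are written out in one multi-row statement (the column names follow the question; accessed_at and the sample values are made up):

-- One batched INSERT per flush instead of one INSERT per API call.
INSERT INTO clients_to_data (client_id, data_id, accessed_at)
VALUES
    (17, 204, '2014-03-01 10:02:11'),
    (17, 305, '2014-03-01 10:02:12'),
    (42, 204, '2014-03-01 10:02:13');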
More importantly, I will normalize the data before inserting it to the DB.
As for selects, create indexes. If it's text, install Sphinx.
Thanks.
I'm developing software for conducting online surveys. When a lot of users are filling in a survey simultaneously, I'm experiencing trouble handling the high database write load. My current table (MySQL, InnoDB) for storing survey data has the following columns: dataID, userID, item_1 .. item_n. The item_* columns have different data types corresponding to the type of data acquired with the specific items. Most item columns are TINYINT(1), but there are also some TEXT item columns. Large surveys can have more than a hundred items, leading to a table with more than a hundred columns. The user answers around 20 items in one HTTP POST, and the corresponding row has to be updated accordingly. The user may skip a lot of items, leading to a lot of NULL values in the row.
I'm considering the following solution to my write load problem. Instead of having a single table with many columns, I set up several tables corresponding to the used data types, e.g.: data_tinyint_1, data_smallint_6, data_text. Each of these tables would have only the following columns: userID, itemID, value (the value column has the data type corresponding to its table). For one HTTP POST with e.g. 20 items, I then might have to create 19 rows in data_tinyint_1 and one row in data_text (instead of updating one large row with many columns). However, for every item I need to determine its data type (via two table joins) so I know in which table to create the new row. My Zend Framework based application code will get more complicated with this approach.
My questions:
Will my solution be better for heavy write load?
Do you have a better solution?
Since you're getting to the point of abstracting this schema to mimic actual data types, it might stand to reason that you should simply create new table sets per survey instead. The benefit will be that locking will lessen, and you could isolate heavy loads to separate machines if the load becomes unbearable.
The single-survey database structure can then more accurately reflect your real-world conditions and data input handlers. It ought to make your abstraction headaches go away.
There's nothing wrong with creating tables on the fly. In some configurations, soft sharding is preferable.
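A hedged sketch of what one such on-the-fly table might look like; the survey id, item columns, and types are purely illustrative:

-- Generated when survey 42 is published, with columns matching that survey's items.
CREATE TABLE survey_42_data (
    dataID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    userID INT UNSIGNED NOT NULL,
    item_1 TINYINT(1),
    item_2 TINYINT(1),
    item_3 TEXT,
    KEY idx_user (userID)
) ENGINE=InnoDB;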
The obvious solution here would be to use a document database for fast writes and then bulk-insert answers into MySQL asynchronously, using cron or something like that. You can create a view in the document database for quick statistics, but allow filtering and other complicated stuff only in MySQL if you're not a fan of document DBMSs.
I have two routes:
1) creating sub-tables for each user and storing his individual content
2) creating few tables and store data of all users in them.
For instance:
1) 100,000 tables each with 1000 rows
2) 50 Tables each with 2,000,000 rows
I want to know which route is the better and more efficient one.
Context: like Facebook, with millions of users and their posts, photos, and tags. Is all this information in some giant tables for all users, or does each user have their own sub-tables?
These are some pros and cons of the two approaches in MySQL.
1. Many small tables.
Cons:
More concurrent tables in use means more file descriptors needed (check this)
A database with 100,000 tables is a mess.
Pros:
Small tables mean small indexes. Small indexes can be loaded entirely into memory, which means your queries will run faster.
Also, because of the small indexes, data manipulation like inserts will run faster.
2. Few big tables
Cons:
A huge table implies very big indexes. If your index cannot be loaded entirely into memory, most queries will be very slow.
Pros:
The database (and also your code) is clear and easy to maintain.
You can use partitioning if your tables become too big (check this; a sketch follows at the end of this answer).
From my experience, a table of two million rows (I've worked with 70-million-row tables) is not a performance problem under MySQL if you are able to keep your active indexes in memory.
If you'll have many concurrent users, I'd suggest you evaluate other technologies like Elasticsearch, which seems to fit this kind of scenario better.
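As a rough illustration of the partitioning mentioned above, here is a hedged sketch for a Facebook-style posts table; all names, types, and the partition count are hypothetical:

CREATE TABLE posts (
    post_id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id    INT UNSIGNED NOT NULL,
    body       TEXT,
    created_at DATETIME NOT NULL,
    PRIMARY KEY (post_id, user_id),             -- the partition key must be part of every unique key
    KEY idx_user_created (user_id, created_at)
) ENGINE=InnoDB
PARTITION BY HASH (user_id)
PARTITIONS 16;

Each user's rows then live in one of 16 smaller physical chunks, which keeps the active part of the index smaller without resorting to one table per user.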
Creating a table for each user is the worst design possible. It is one of the first things you are taught in a DB design class.
A table is a strong logical component of a database and hence is used for many maintenance tasks by the RDBMS. E.g. it is customary to set up table file space, limitations, quotas, log space, transaction space, index tree space and many, many other things. If every table gets its own file to put data in, you'll get big round-trip times when joining tables and so on.
When you create many tables, you'll have a really BIG maintenance overhead. Also, you'll be denying the very nature of relational databases. And just suppose you're adding a record to the database: creating a new table each time? It'd be a bit harder on your code.
But then again, you could try and see for yourself.
You should leverage the power of MySQL indexes which will basically provide something similar to having one table per user.
Creating one table called user_data indexed on user_id will (in the big picture) transform queries that have a WHERE clause on user_id, like this one:
SELECT picture FROM user_data WHERE user_id = INT
Into:
Look in the index to find the rows from user_data where user_id = INT
Then, for this batch of rows, load the value of picture
By doing that, MySQL won't scan all rows from user_data, only the relevant ones found through the index.
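In practice that just means declaring the index once (the index name here is assumed):

ALTER TABLE user_data ADD INDEX idx_user_id (user_id);

-- EXPLAIN should then show the query using idx_user_id instead of a full table scan.
EXPLAIN SELECT picture FROM user_data WHERE user_id = 42;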
I'm working on an app in JavaScript, jQuery, PHP & MySQL that consists of ~100 lessons. I am trying to think of an efficient way to store the status of each user's progress through the lessons, without having to query the MySQL database too much.
Right now, I am thinking the easiest implementation is to create a table for each user, and then store each lesson's status in that table. The only problem with that is if I add new lessons, I would have to update every user's table.
The second implementation I considered would be to store each lesson as a table, and record the user ID for each user that completed that lesson there - but then generating a status report (what lessons a user completed, how well they did, etc.) would mean pulling data from 100 tables.
Is there an obvious solution I am missing? How would you store your users' progress through 100 lessons so that it's quick and simple to generate a status report showing their progress?
Cheers!
The table structure I would recommend would be to keep a single table with non-unique fields userid and lessonid, as well as the relevant progress fields. When you want the progress of user x on lesson y, you would do this:
SELECT * FROM lessonProgress WHERE userid=x AND lessonid=y LIMIT 1;
You don't need to worry about performance unless you see that it's actually an issue. Having a table for each user or a table for each lesson are bad solutions because a database isn't meant to have a dynamic number of tables.
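A minimal sketch of that single table, assuming the (userid, lessonid) pair is unique and that status/score are the progress fields you care about:

CREATE TABLE lessonProgress (
    userid     INT UNSIGNED NOT NULL,
    lessonid   INT UNSIGNED NOT NULL,
    status     TINYINT UNSIGNED NOT NULL DEFAULT 0,   -- e.g. 0 = not started, 1 = in progress, 2 = done
    score      DECIMAL(5,2) NULL,
    updated_at DATETIME NOT NULL,
    PRIMARY KEY (userid, lessonid)
) ENGINE=InnoDB;

-- The full status report for one user is then a single indexed query:
SELECT lessonid, status, score FROM lessonProgress WHERE userid = 123;

Adding a new lesson requires no schema change; rows simply appear as users reach it.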
If reporting is restricted to one user at a time - that is, when generating a report, it's for a specific user and not a large clump of users - why not consider JSON stored in a file? If extensibility is key, it would make that a simple matter.
Obviously, if you're going to run reports against an arbitrarily large number of users at once, separate data files would become inefficient.
Discarding the efficiency argument, JSON would also give you a very human-readable and interchangeable format.
Lastly, if the security of the report output isn't a big sticking point, you'd also gain the ability to easily offload view rendering onto the client.
Use relations between two tables. One for users, with user-specific columns like ID, username, email, and whatever else you want to store about them.
Then a status table that has a UID foreign key: ID, UID, status, etc.
It's good to keep datecreated and dateupdated on tables as well.
Then just join the tables ON status.UID = users.ID
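For example, a hedged sketch of that join for one user's report (column names beyond ID/UID are illustrative):

SELECT users.username, status.status, status.datecreated, status.dateupdated
FROM users
JOIN status ON status.UID = users.ID
WHERE users.ID = 123;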
A good option would be to create one table with user_ID as the primary key and a status (int); each row of the table represents a user. Accessing a user's progress would be fast and simple since you have an index on user IDs.
This way, adding new lessons would not make you change the DB.
I have any number of users in a database (this could be 100, 2,000, or 3). What I'm doing is using MySQL "SHOW TABLES" and storing the table names in an array, then running a while loop that takes every table name (the user's name), inserts it into some code, and runs said piece of code for every table name. With 3 users, this script takes around 20 seconds. It uses the Twitter API and does some MySQL inserts. Is this the most efficient way to do it or not?
Certainly not!
I don't understand why you store each user in their own table. You should create a single users table and select from there.
It will run in 0.0001 seconds.
Update:
A table has rows and columns. You can store multiple users in rows, and information about each user in columns.
Please try some database design tutorials/books; they will help you a great deal.
If you're worried about storing multiple entries for each user within the same users table, you can have a separate table for tweets, with each tweet referring back to its user.
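A hedged sketch of that layout; the column names and types are made up, but the point is that the per-table while loop collapses into ordinary queries against two fixed tables:

CREATE TABLE users (
    id           INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    username     VARCHAR(50) NOT NULL,
    twitter_name VARCHAR(50) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE tweets (
    tweet_id   BIGINT UNSIGNED PRIMARY KEY,   -- Twitter's own id, so re-runs don't duplicate rows
    user_id    INT UNSIGNED NOT NULL,
    body       TEXT,
    created_at DATETIME,
    KEY idx_user (user_id),
    FOREIGN KEY (user_id) REFERENCES users (id)
) ENGINE=InnoDB;

-- One query replaces SHOW TABLES plus one pass per table:
SELECT id, twitter_name FROM users;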
I'd certainly go for one users table.
Databases are optimized for processing many rows; some of the techniques used are indexes, the physical layout of data on disk, and so on. Operations on many tables will always be slower - this is just not what an RDBMS was built to do.
There is one exception - sometimes you optimize databases by sharding (partitioning data), but this approach has as many advantages as disadvantages. One of the disadvantages is that queries like the one you described take a lot of time.
You should put all your users in one table because, from a logical point of view, they represent one entity.