I am currently working on a big management system (in PHP using Zend Framework, but that is not really relevant to this question) in which I have to manage multiple entries. Each entry has many fields and spans two tables in a one-to-many relationship (through a single foreign key). There are roughly 50 fields in the first table and 30 in the second.
I am now at the stage of implementing history tracking for the different modifications made by users (and some automated tasks). Each entry might eventually be rolled back, partially or totally, to a previous value.
I was thinking about using a system similar to the one present in the CMS Typo3: one table to manage the whole history, with the following fields:
history_id
entry_id
entry_table
last_modification_timestamp
last_modification_user
data
The data would be serialized in a JSON or XML format.
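For concreteness, here is a minimal sketch of that history table in MySQL (the datatypes are my assumptions):

CREATE TABLE history (
    history_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    entry_id INT UNSIGNED NOT NULL,
    entry_table VARCHAR(64) NOT NULL,
    last_modification_timestamp DATETIME NOT NULL,
    last_modification_user INT UNSIGNED NOT NULL,
    data MEDIUMTEXT NOT NULL, -- serialized JSON or XML snapshot of the entry
    INDEX (entry_table, entry_id, last_modification_timestamp)
) ENGINE=InnoDB;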
My concern with this method is that, over time, the history table would keep growing without bound. To overcome this issue, I was thinking I could create a new database for the history every year and then show history data to users by year.
I am looking for advice about ways to improve this solution and ease the implementation. Any advice or documentation will be welcome.
I'd add a threshold and remove, or dump to an external file, all entries older than a certain period of time.
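For example, assuming a history_archive table with the same structure as history, a periodic job could be as simple as this (the one-year cutoff is only an example):

SET @cutoff = NOW() - INTERVAL 1 YEAR;

CREATE TABLE IF NOT EXISTS history_archive LIKE history;

INSERT INTO history_archive
    SELECT * FROM history WHERE last_modification_timestamp < @cutoff;

DELETE FROM history WHERE last_modification_timestamp < @cutoff;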
So I'm making a web-based game similar to Torn City, where there could potentially be millions of users.
My issue is regarding user inventories. I started out creating dynamic tables based on each user's ID, e.g. table name = [UserID]_Inventory.
From what I've found out, this dynamic creation can open the door to SQL injection and other hacker-friendly holes.
My only other option seems to be creating one giant table holding every item that every player has, along with all the varied details of each item. It seems like this would take longer and longer to query as the user count increases, and a user's inventory will likely be accessed often.
Is there another option?
My only idea so far is to create some kind of temporary inventory that grabs only the active players' inventories. That helps with the database search-time issue but still brings me back to creating dynamic tables.
At this stage I don't really need coding help, rather I need database structure help.
Code is appreciated, though.
Cheers.
Use the big table. Index it optimally. It should not give you trouble until you get well past a billion rows.
Here's a trick for optimizing the use of such a table. Instead of
PRIMARY KEY(id),
INDEX(user_id)
have
PRIMARY KEY(user_id, id),
INDEX(id)
Since the PK is "clustered" with the data and the data is ordered according to the PK, this puts all of one user's rows next to each other. In huge tables, this cuts back significantly on I/O and hence improves overall speed. It also reduces pressure on the buffer_pool. (I assume you are using InnoDB?)
The INDEX(id) is sufficient for AUTO_INCREMENT.
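Concretely, such an inventory table might be sketched like this (column names are illustrative, not your actual schema):

CREATE TABLE inventory (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id INT UNSIGNED NOT NULL,
    item_id INT UNSIGNED NOT NULL,
    quantity INT UNSIGNED NOT NULL DEFAULT 1,
    PRIMARY KEY (user_id, id), -- clusters each user's rows together
    INDEX (id)                 -- satisfies AUTO_INCREMENT
) ENGINE=InnoDB;

-- fetching one user's whole inventory is then a single range scan
SELECT item_id, quantity FROM inventory WHERE user_id = ?;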
There could be more suggestions, but I need more details. Please provide SHOW CREATE TABLE (as it stands now) and the main SELECTs. I am likely to suggest more changes to the indexes, datatypes, and query formulations.
(Dynamic tables are a mistake, and your troubles in that direction have only begun.)
I'm trying to create a Like/Unlike system akin to Facebook's for an existing comments section of a website, and I need help in designing the system.
Currently, every product on the website has a comments section, and members can post and like comments. I need to know how many comments each member has posted and how many likes each of their comments has received. Of course, I also need to know who liked which comments (partly so that I can prevent a user from liking a comment more than once) for analytical purposes.
The naive way of adding a Like system to the current comments module is to create a new table in the database with foreign keys to the CommentID and UserID. Then, for every "like" given to a comment by a user, I would insert a row into this new table with the target comment ID and user ID.
While this might work, the massive number of comments and users is going to cause this table to grow quickly, and retrieving records from and doing counts on this huge table will become slow and inefficient. I could index either one of the columns, but I don't know how effective that would be. The website has over a million comments.
I'm using PHP and MySQL. For a system like this with a huge database, how should I design a Like system so that it is more optimised and stable?
For scalability, do not put the count column in the same table as everything else. This is a rare case where "vertical partitioning" is beneficial. Why? The LIKEs/UNLIKEs will come fast and furious. If the code doing the increment/decrement hits a table used for other things (such as the text of the comment), there will be an unacceptable amount of contention between the two.
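A minimal sketch of that split (names are illustrative):

-- comment text and other cold data
CREATE TABLE comments (
    comment_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id INT UNSIGNED NOT NULL,
    body TEXT NOT NULL
) ENGINE=InnoDB;

-- the hot counter lives in its own narrow table
CREATE TABLE comment_like_counts (
    comment_id INT UNSIGNED NOT NULL PRIMARY KEY,
    likes INT UNSIGNED NOT NULL DEFAULT 0
) ENGINE=InnoDB;

-- an increment touches only the counter table
UPDATE comment_like_counts SET likes = likes + 1 WHERE comment_id = ?;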
This tip is the first of many steps toward being able to scale to Facebook levels. The other tips will come, not from a free forum, but from the team of smart engineers you will have to hire to get to that level. (Hints: Sharding, Buffering, Showing Estimates, etc.)
Your main concern will be a lot of counts, so the easy thing to do is to keep a separate count in your comments table.
Then you can create a TRIGGER that increments/decrements the count based on a like/unlike.
That way you only use the big table to figure out whether a user has already voted.
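A sketch of that approach, assuming a like_count column on the comments table; the composite primary key also prevents double-liking:

CREATE TABLE comment_likes (
    comment_id INT UNSIGNED NOT NULL,
    user_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (comment_id, user_id) -- one like per user per comment
) ENGINE=InnoDB;

CREATE TRIGGER like_added AFTER INSERT ON comment_likes
FOR EACH ROW
UPDATE comments SET like_count = like_count + 1 WHERE comment_id = NEW.comment_id;

CREATE TRIGGER like_removed AFTER DELETE ON comment_likes
FOR EACH ROW
UPDATE comments SET like_count = like_count - 1 WHERE comment_id = OLD.comment_id;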
Here goes a very basic question.
Here is my stroll in trying to create a form for work:
I created the HTML form, added some JavaScript to make it do some things I needed, styled it with CSS, wrote the PHP code, and created a database (I had no idea how to do any of it at first) for the entered data to be saved.
I didn't know how to do any of that, but in the past two weeks I've managed to make it exactly the way I needed, and I'm very pleased with myself. After a lot of work, the form sends the data to the database and displays it on the page after you hit submit, and it looks really good too.
The thing is... what I am creating is an Activities Bank for us to use here at work (I teach English), and the page (base) I have created is only ONE of MANY that are needed in this data bank. Let me explain: say the page I've created is the post and display of Book3 Chapter1 Activities; I need to have many other pages (which will be exact copies of this one).
My question is: will I have to create (actually, copy and paste) new databases/tables manually (more than one hundred of them), or is there a way to automate this process?
I mean, all the pages will share the same variables and the same form... the only things that will differ are the title and the entered data, of course.
Will I have to create a database for each page? Or a new table for each page in the same database?
If you still don't understand what I need, here is how this is supposed to work:
Book1 has 40 chapters, so, 40 copies of the same form (which already works fine);
PLUS
Book2 that has 40 more chapters, etc.
Thanks in advance for any clarification.
Sorry if this is such a basic question. And if it isn't, if what I want to do is actually very complicated, I don't mind that I don't know much about all this; I will take on the challenge, like I did when making this form from scratch without ever having heard of "databases". Any words of help are appreciated.
That isn't how databases or tables work. You should be creating a new row in one or more tables for each form submission. You should almost never be dynamically creating tables, and even less often databases.
It sounds like you want a books table, and a chapters table. Each row in the books table will have many rows in the chapters table "pointing" to it via foreign keys.
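A rough sketch of those tables; the activities table is my assumption about where each form submission would land:

CREATE TABLE books (
    book_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE chapters (
    chapter_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    book_id INT UNSIGNED NOT NULL,
    chapter_number INT UNSIGNED NOT NULL,
    FOREIGN KEY (book_id) REFERENCES books(book_id)
) ENGINE=InnoDB;

CREATE TABLE activities (
    activity_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    chapter_id INT UNSIGNED NOT NULL,
    title VARCHAR(255) NOT NULL,
    entered_data TEXT NOT NULL, -- whatever the form submits
    FOREIGN KEY (chapter_id) REFERENCES chapters(chapter_id)
) ENGINE=InnoDB;

Each form submission then becomes one new row in activities; no new tables or databases are ever created.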
I think in your case you need two tables in total.
As you have already created one, you need another: the first table will contain the common data with a primary key column, and the second will hold the primary key of the first table along with the data that occurs multiple times. Later you can use an SQL JOIN across the two tables to get your data.
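For example, reading the data back is then a single query joining the tables (using the books/chapters/activities sketch above):

SELECT b.title AS book, c.chapter_number, a.entered_data
FROM books b
JOIN chapters c ON c.book_id = b.book_id
JOIN activities a ON a.chapter_id = c.chapter_id
WHERE b.book_id = ? AND c.chapter_number = ?;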
A friend of mine has a catalogue that currently holds about 500 rows, or 500 items. We are looking at ways to provide reports on the catalogue, including the number of times an item was viewed and the dates when it was viewed.
His site is averaging around 25,000 page impressions per month, and if we assume for a minute that half of these are catalogue items, then roughly 12,000 catalogue items are viewed each month.
My question is about the best way to manage item views in the database.
The first option is to insert the catalogue ID into a table and then increment the number of times it's viewed. The advantage of this is its compact nature: there will only ever be as many rows in the table as there are catalogue items.
`catalogue_id`, `views`
The disadvantage is that no date information is held, short of maintaining the last time an item was viewed.
The second option is to insert a new row each time an item is viewed.
`catalogue_id`, `timestamp`
If we continue with the assumed figure of 12,000 item views, that means adding 12,000 rows to the table each month, or 144,000 rows each year. The advantage of this is that we know the number of times each item was viewed, and also the dates when it was viewed.
The disadvantage is the size of the table. Is a table with 144,000 rows becoming too large for MySQL?
Interested to hear any thoughts or suggestions on how to achieve this.
Thanks.
As you have mentioned, the first option is a lot more compact, but limited. However, consider option 2 in more detail: if you wish to store more than just the view count, for instance the entry/exit page, host IP, etc., this information may be invaluable for stats and tracking. The other question is: are these 25,000 impressions unique? If not, and you are able to track by username, IP, or some other unique identifier, this could enable you to use fewer rows. The answer to your question depends on how much detail you wish to store and how important the data is.
Update:
True, limiting the repeats on a given item within a given time interval would be a good solution. Also, knowing whether someone visited the same item could be useful for suggested-item prediction widgets similar to what Amazon does. Knowing that someone visited an item many times also says to me that this is a good item to promote to them or others in a mail-out, newsletter, or popular-products page. Tracking unique views will give a more honest view count, which you can choose to display or store. As for the issue of limiting the value of repeat visitors, this mainly comes into play depending on what information you display. It is all about framing the information in the way that best suits you.
Your problem statement: we want to be able to track the number of views for a particular catalogue item.
Let's review your options.
First Option:
In this option you will be storing the catalogue_id and an integer value holding the number of views of the item.
Advantages:
Since you really have a one-to-one relationship, the new table is going to be small: if you have 500 items, you will have 500 rows. If you choose this route, I would suggest not creating a new table but instead adding another column to the catalogue table holding the number of views.
Disadvantages:
The problem here is that since you are going to be updating this table relatively frequently, it is going to be a very busy little table. For example, say 10 users are viewing the same item: these 10 updates will have to run one after the other. Assuming you are using InnoDB, the first view action would come in, lock the row, update the counter, and release the lock; the other updates would queue behind it. So while the data in the table is small, it could become a bottleneck later on, especially if you start scaling the system.
You are also losing granular data, i.e. you are not keeping track of the raw data. For example, let's say the website starts growing and you have an interested investor who wants to see a breakdown of the views per week over the last 6 months. If you use this option you won't have the data to provide to the investor; essentially you are keeping only a summary.
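For reference, the first option (with the counter column on the catalogue table, as suggested above) boils down to:

ALTER TABLE catalogue ADD COLUMN views INT UNSIGNED NOT NULL DEFAULT 0;

-- run on every item view
UPDATE catalogue SET views = views + 1 WHERE catalogue_id = ?;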
Second Option:
In this option you would create a logging table with at least the following minimal fields: catalogue_id and timestamp. You could expand this by adding a username/IP address or some other information to make it even more granular.
Advantages:
You are keeping granular data, which will allow you to summarise it in a variety of ways. You could, for example, add an IP address column to store the visitor's IP and then produce a monthly report showing products viewed by country (you could do an IP address lookup to get an idea of which country each visitor was from). Another example would be to see which products were viewed the most over the last quarter. This data is pretty essential in helping you make decisions on how to grow your business; if you want to know what is and isn't working as far as products are concerned, this detail is absolutely critical.
Your new table will be a logging table, with insert operations only. Inserts can pretty much happen in parallel, so if you go with this option it will probably scale better as the site grows compared to a constantly updated table.
Disadvantages:
This table will be bigger, probably the biggest table in the database. However, this is not a problem. I regularly deal with tables of 500,000,000+ rows; some of my tables are over 750 GB by themselves and I can still run reporting on them. You just need to understand your queries and how to optimise them. This is really not a problem, as MySQL was designed to handle millions of rows with ease. Keep in mind you could also archive some information into other tables: say you archive every 3 years, you could move data older than 3 years into another table; you don't have to keep all the data in one place. Your estimate of 144,000 rows per year means you could probably safely keep about 15+ years' worth without ever worrying about the performance of the table.
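A minimal sketch of such a logging table and the kind of summary it enables (column names are illustrative):

CREATE TABLE catalogue_views (
    view_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    catalogue_id INT UNSIGNED NOT NULL,
    ip_address VARBINARY(16) NULL, -- optional extra granularity
    viewed_at DATETIME NOT NULL,
    INDEX (catalogue_id, viewed_at)
) ENGINE=InnoDB;

-- e.g. views per item per week over the last 6 months
SELECT catalogue_id, YEARWEEK(viewed_at) AS wk, COUNT(*) AS views
FROM catalogue_views
WHERE viewed_at >= NOW() - INTERVAL 6 MONTH
GROUP BY catalogue_id, wk;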
My suggestion to you is to seriously consider the second option. If you decide to go this route, update your question with the proposed table structures and let us have a look. Don't be scared of big data; rather, be scared of BAD design, which is much more difficult to deal with.
However, as always, the choice is yours.
I'm working on a consumer CRM system for my bootstrapped startup, where we'll use MySQL. We're moving from an old paper-and-pen method of tracking leads and referrals to a digital method for our dealers.
The database will have standard fields, like lead name, spouse, jobs, referral type, referrer, and lead dealer. This is easy, almost child's play.
Now comes the part I'm having a hard time figuring out. I want to track all the attempted contact dates and responses, and the appointments that have been set or reset. The system is going to be web-based, with the front end in PHP.
I thought about doing nested tables, but I don't want to use Oracle or PostgreSQL, as I like the familiar setup of MySQL.
For the sake of feasibility, say I have 4,000 leads, and each lead will be called on average 30 times. So I'll have 120,000 data points to track.
Would it be advisable to:
Make a two-dimensional PHP array in the field, to keep track of these metrics.
Have a contact table with all 120k rows in it, which the application pulls from when these metrics are needed.
Have a contact table for each lead, which keeps track of all needed metrics.
I would make one table for contacts. Add a column to record whether the contact was successful or not.
I would also use MySQL's table partitioning by lead, if many of the queries will be to report on specific leads.
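A sketch of what that could look like (names are illustrative; note that MySQL requires the partitioning column to appear in every unique key, hence the composite primary key):

CREATE TABLE contacts (
    contact_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    lead_id INT UNSIGNED NOT NULL,
    contacted_at DATETIME NOT NULL,
    successful TINYINT(1) NOT NULL DEFAULT 0, -- the success/failure column
    notes TEXT NULL,
    PRIMARY KEY (lead_id, contact_id), -- clusters each lead's history together
    INDEX (contact_id)                 -- satisfies AUTO_INCREMENT
) ENGINE=InnoDB
PARTITION BY HASH(lead_id) PARTITIONS 16; -- 16 is an arbitrary choice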
But I second the comment from @Bryan Agee that you should consider carefully before implementing a CRM system from scratch on your weekends.
Start with a table of just the leads. Ideally, it should be filterable, searchable, and sortable. Look into the jQuery DataTables plugin: you can have a table that's paged and pulls its data from the server via AJAX. That way you only need to query and return a few records at a time.
Then create a second table that pops up when the user clicks on a contact. This one is also AJAX-driven and displays the contact history for that particular contact.
This way you never have to query and return the full list, which would be a pain not only for the server but also for the people using the system, especially once you have 4,000 leads.
Have a contact table for each lead, and add data to it every time an action (contact) is made. It will also give you counts and other metrics, and it will be easy to implement and track.