MySQL Database Structure for Time-Based Chart and Report Generation - PHP

My application will allow users to like or dislike a product and leave short feedback. I have to build functionality that shows a graph and produces a report over different time frames; it will probably be on a yearly, monthly, weekly, and daily basis.
I have to show how many users liked or disliked the product over a particular time period via a chart, and generate the report. So my application should be able to produce the daily graph for August 2018 or the monthly graph for the year 2018 for a particular product. The graph should reveal how many users liked or disliked the product on a daily basis if it is a daily graph, and similarly for the weekly, monthly, or yearly time frames.
I am not sure what the database structure should be for this type of application. Here is what I have thought of so far.
products: id, name, descp...etc // products table
users: id, name, email ...etc // users table
user_reactions: id, user_id(foreign key), product_id(foreign key), action(liked or disliked, tinyint), feedback // user_reactions table
data: id, product_id(foreign key), date(Y-m-d), total_like, total_dislike. // data table, will be used to make graph and report
What I am thinking is that I will run a cron job at 23:59:59 every day to count the likes and dislikes of each product and add the result to the last table, i.e. the data table mentioned above, and then use this data table to make the graph and report. I am not sure if this database structure is correct or whether it has some unseen problem (maybe in the future?).
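In case it helps to make that cron step concrete, here is a minimal sketch of the nightly rollup, assuming the table and column names above and that action stores 1 for a like and 0 for a dislike (that encoding is an assumption):

```sql
-- Hypothetical nightly rollup: snapshot the running like/dislike totals
-- per product into the data table (assumes action = 1 means like, 0 dislike).
INSERT INTO data (product_id, date, total_like, total_dislike)
SELECT product_id,
       CURDATE()       AS date,
       SUM(action = 1) AS total_like,
       SUM(action = 0) AS total_dislike
FROM user_reactions
GROUP BY product_id;
```

Note that this stores running totals, so a daily graph would have to subtract consecutive rows; adding a timestamp column to user_reactions and grouping by day would let you store per-day counts directly.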
Note: My Application will be in PHP and MySQL

Well, there is no right answer to your question, because any answer to it will be opinion based. You and I will get enough downvotes for sure. But still, hear me out, my friend, because I was in your position once.
There is a quote by the famous professor Donald Knuth:
Premature optimization is the root of all evil
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
The idea is that you have to start building. As your application progresses, you will face trouble: you will hit problems with your database, your system might not scale, or it might not handle a million requests. But until you hit that problem you don't have to worry about it.
I am not saying that you should go and build a system blindly, with an infinite loop or a table join that can cause deadlocks. I hope you get my point.
Build a system with your knowledge and understanding, because there is no one straight way to a problem. Build a feature -> you hit an issue -> tweak your app -> rinse and repeat. One day your own experience will show you the right path.
From your description I can't figure out exactly how it will turn out, but I am sure it will suffice for your initial days. As you progress, you might find it hard to add new features or additional constraints, but that is another day. Wait for it and ask another question.
I hope I have answered your question.

Related

MySQL managing catalogue views

A friend of mine has a catalogue that currently holds about 500 rows, or 500 items. We are looking at ways to provide reports on the catalogue, including the number of times an item was viewed and the dates when it was viewed.
His site is averaging around 25,000 page impressions per month, and if we assume for a minute that half of these were catalogue items, then roughly 12,000 catalogue items are viewed each month.
My question is about the best way to manage item views in the database.
The first option is to insert the catalogue ID into a table and then increment the number of times it is viewed. The advantage of this is its compact nature: there will only ever be as many rows in the table as there are catalogue items.
`catalogue_id`, `views`
The disadvantage is that no date information is being held, short of maintaining the last time an item was viewed.
The second option is to insert a new row each time an item is viewed.
`catalogue_id`, `timestamp`
If we continue with the assumed figure of 12,000 item views, that means adding 12,000 rows to the table each month, or 144,000 rows each year. The advantage of this is we know the number of times the item is viewed, and also the dates when it is viewed.
The disadvantage is the size of the table. Is a table with 144,000 rows becoming too large for MySQL?
Interested to hear any thoughts or suggestions on how to achieve this.
Thanks.
As you have mentioned, the first is a lot more compact but limited. However, look at option 2 in more detail: you may wish to store more than just a view count, for instance the entry/exit page, host IP, etc. This information may be invaluable for stats and tracking. The other question is: are these 25,000 impressions unique? If not, you are able to track by username, IP, or some other unique identifier, which could enable you to use fewer rows. The answer to your question depends on how much detail you wish to store and how important the data is.
Update:
True, limiting the repeats on a given item within a time interval would be a good solution. Also, knowing whether someone visited the same item could be useful for suggested-item prediction widgets similar to what Amazon does. Knowing that someone visited an item many times tells me this is a good item to promote to them or others in a mail-out, newsletter, or popular-products page. Tracking unique views will give a more honest view count, which you can choose to display or store. On the issue of limiting the value of repeat visitors, this mainly comes into play depending on what information you display. It is all about framing the information in the way that best suits you.
Your problem statement: we want to be able to track the number of views for a particular catalogue item.
Let's review your options.
First Option:
In this option you will be storing the catalogue_id and an integer value with the number of views of the item.
Advantages:
Well, since you really have a one-to-one relationship, the new table is going to be small. If you have 500 items you will have 500 rows. If you choose this route, I would suggest not creating a new table but adding another column to the catalogue table with the number of views on it.
Disadvantages:
The problem here is that since you are going to be updating this table relatively frequently, it is going to be a very busy little table. For example, say 10 users are viewing the same item. These 10 updates will have to run one after the other. Assuming you are using InnoDB, the first view action would come in, lock the row, update the counter, and release the lock. The other updates would queue behind it. So while the table is small, it could become a bottleneck later on, especially if you start scaling the system.
You are losing granular data, i.e. you are not keeping track of the raw data. For example, let's say the website starts growing and you have an interested investor who wants to see a breakdown of the views per week over the last 6 months. If you use this option you won't have the data to provide to the investor. Essentially you are keeping a summary.
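To make the contention point above concrete, here is a minimal sketch of what the first option boils down to (the table and column names are illustrative, not taken from the question):

```sql
-- Illustrative counter table for the first option.
CREATE TABLE catalogue_views (
    catalogue_id INT UNSIGNED NOT NULL PRIMARY KEY,
    views        INT UNSIGNED NOT NULL DEFAULT 0
) ENGINE=InnoDB;

-- Every page view runs this; concurrent views of the same item
-- queue up behind the row lock taken by the first statement.
INSERT INTO catalogue_views (catalogue_id, views)
VALUES (42, 1)
ON DUPLICATE KEY UPDATE views = views + 1;
```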
Second Option:
In this option you would create a logging table with at least the following minimal fields: catalogue_id and timestamp. You could expand this to add a username/IP address or some other information to make it even more granular.
Advantages:
You are keeping granular data. This will allow you to summarise the data in a variety of ways. You could, for example, add an IP address column, store the visitor's IP, and then do a monthly report showing you products viewed by country (you could do an IP address lookup to get an idea of which country they were from). Another example would be to see which products were viewed the most over the last quarter. This data is pretty essential in helping you make decisions on how to grow your business. If you want to know what is working and what is not working as far as products are concerned, this detail is absolutely critical.
Your new table will be a logging table. It will only ever see insert operations, and inserts can pretty much happen in parallel. If you go with this option it will probably scale better as the site grows, compared to a constantly updated table.
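A rough sketch of the second option, with the IP column mentioned above; the report query is just one example of what the granular rows allow (all names are illustrative):

```sql
-- Illustrative logging table for the second option.
CREATE TABLE catalogue_views_log (
    id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    catalogue_id INT UNSIGNED    NOT NULL,
    ip_address   VARBINARY(16)   NULL,
    viewed_at    DATETIME        NOT NULL,
    KEY idx_item_date (catalogue_id, viewed_at)
) ENGINE=InnoDB;

-- One insert per view; inserts do not block each other the way the
-- single counter row does (INET6_ATON needs MySQL 5.6+).
INSERT INTO catalogue_views_log (catalogue_id, ip_address, viewed_at)
VALUES (42, INET6_ATON('203.0.113.7'), NOW());

-- Example report: views per item per week over the last 6 months.
SELECT catalogue_id,
       YEARWEEK(viewed_at) AS week,
       COUNT(*)            AS views
FROM catalogue_views_log
WHERE viewed_at >= NOW() - INTERVAL 6 MONTH
GROUP BY catalogue_id, YEARWEEK(viewed_at)
ORDER BY catalogue_id, week;
```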
Disadvantages:
This table will be bigger, probably the biggest table in the database. However, this is not a problem. I regularly deal with tables of 500,000,000+ rows. Some of my tables are over 750 GB by themselves and I can still run reporting on them. You just need to understand your queries and how to optimise them. This is really not a problem, as MySQL was designed to handle millions of rows with ease. Just keep in mind that you could archive some information into other tables. Say you archive the data every 3 years: you could move data older than 3 years into another table. You don't have to keep all the data there. Your estimate of 144,000 rows per year means you could probably safely keep 15+ years' worth without ever worrying about the performance of the table.
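If it helps, archiving the older rows can be as simple as the following sketch (it reuses the catalogue_views_log table sketched above):

```sql
-- Move rows older than 3 years into an archive table, then delete them
-- from the live table.
CREATE TABLE IF NOT EXISTS catalogue_views_log_archive LIKE catalogue_views_log;

INSERT INTO catalogue_views_log_archive
SELECT * FROM catalogue_views_log
WHERE viewed_at < NOW() - INTERVAL 3 YEAR;

DELETE FROM catalogue_views_log
WHERE viewed_at < NOW() - INTERVAL 3 YEAR;
```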
My suggestion to you is to seriously consider the second option. If you decide to go this route, update your question with the proposed table structures and let us have a look at them. Don't be scared of big data; rather be scared of BAD design, which is much more difficult to deal with.
However, as always, the choice is yours.

eBay-style "watch this item"

Are there any tutorials out there for adding a "watch this" feature to an online store, so users can see when an item is put on sale, for example?
My current setup is in PHP and MySQL. I do not offer this feature, but I would like to give my customers more control over what they are watching, and the ability to be notified when items are put on sale.
All of the products in the shop are listed in a table with unique prod ids. I was then planning on adding a new table for 'sales' - upon a certain event I would like to automate emails to each user who has added the prod id to their watch-list.
It is very similar functionality to eBay's watch-item feature - but given the potential scale of the work, I want to gauge how easy it will be to implement and maintain before committing too much time/effort to it!
Thanks
JD
No matter what you choose to "watch", the general concept is pretty simple. You have a relational table with the user id and the item id. This table is added to on an event (click or form submit, however you choose to do so). Either way it is very easy to implement. If you know how to insert into the database, then I need not explain the process.
Then you can either run a search and notify script when a table is modified, or run a cron that checks every minute.
This is so vague it hurts my soul to post it, but ultimately the general concept is simple. Log a relation. Check every minute to see if the user needs to be notified. End of story.
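A minimal sketch of that relation and the per-minute check, assuming a sales table keyed by product id (every name here is illustrative rather than taken from your schema):

```sql
-- Hypothetical watch-list relation: one row per (user, product) pair.
CREATE TABLE watches (
    user_id    INT UNSIGNED NOT NULL,
    product_id INT UNSIGNED NOT NULL,
    notified   TINYINT(1)   NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, product_id)
) ENGINE=InnoDB;

-- The per-minute cron: find watchers of products that are now on sale
-- and have not been notified yet (the sales table is assumed to exist).
SELECT w.user_id, w.product_id
FROM watches w
JOIN sales s ON s.product_id = w.product_id
WHERE w.notified = 0;

-- After sending the emails, mark those rows as handled.
UPDATE watches w
JOIN sales s ON s.product_id = w.product_id
SET w.notified = 1
WHERE w.notified = 0;
```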
Now, for a more detailed description or actual code, you may want to narrow down your request. It would help to know if you're using a framework or running straight MySQL queries. Can you set up a cron? Etc.
I'll update if you provide info sufficient for a clear response.

Implementing a simple review database/application scheme

I'm new to web development and database design, and I'm kind of stumped as to how best to accomplish a simple review system for items.
In the current database schema I have a table, call it tbl_item, that has columns for different properties of items. I want users to be able to review items and associate each review in the tbl_reviews to a particular item.
Of course I have a foreign key set up referencing an id column in tbl_item but I do not know where to go from here. Basically my question is: What should calculate the review average?
Should the application make a SQL call every time a review score is requested for a particular item, where the DB would have to then search through all the tbl_reviews rows to find those with a particular item_id?
(That seems wrong.) Should the DB get involved and have some type of calculated field or view or stored procedure that does the same?
Should I have a new column in tbl_item that has the average score in it and is updated whenever any new review corresponding to a particular item is CRUD'ded?
If it matters, I'm using Yii (PHP) and MySQL.
Basically you're asking about efficiency and math.
Here's what I would do:
Your DB is relational. Good, you got that. Each review has a numerical value? Like 1 - 10?
Say it does for this example.
I would say that upon each review, the review itself is stored in the DB, and an entry is also queued in an action table: something that has the item id and a type of action, in this case 'review'.
You then have a cron running in the background every minute or so checking that action queue. In the event of a new review or set of reviews, you run an algorithm for each applicable item that collects all of the data available on the reviews and returns an educated number, based on the standard deviation of the collective data.
This way the math is not run in real time by the user or when a review is sent. For all we know you have tons of items and tons of reviews, so real time would be bad if your intelligence script is heavy.
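A hedged sketch of that queue and the recalculation the cron could run; it uses a plain average for brevity (the column names, and any weighting or outlier filtering, are assumptions on top of your tbl_item/tbl_reviews tables):

```sql
-- Hypothetical action queue: one row per event awaiting processing.
CREATE TABLE action_queue (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    item_id INT UNSIGNED NOT NULL,
    action  VARCHAR(20)  NOT NULL   -- e.g. 'review'
) ENGINE=InnoDB;

-- The cron recomputes the cached score for each queued item
-- (assumes tbl_item.avg_score and tbl_reviews.score columns exist).
UPDATE tbl_item i
JOIN (
    SELECT r.item_id, AVG(r.score) AS avg_score
    FROM tbl_reviews r
    WHERE r.item_id IN (SELECT item_id FROM action_queue WHERE action = 'review')
    GROUP BY r.item_id
) x ON x.item_id = i.id
SET i.avg_score = x.avg_score;

-- Clear the processed queue entries.
DELETE FROM action_queue WHERE action = 'review';
```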
As for standard deviation, I check a large variety of things for anti-spam. I store all userdata, IP, datetime, and anything else I can to make sure it's not just one guy logging in with different accounts reviewing his own things with a 10 rating each time. Can't fall for that.
Plus, if you get 100 ten-point reviews that look legitimate and one review with a score of 1, you can discount the outlier as a hater and just ignore it in the results.
You have to understand your request is enormous, so code snippets are out of the question here.
What I just explained was like 4 months of work for a huge client and a serious anti-spam calculator.
good luck though

Personalized Search Results based on History

What are some of the techniques for providing personalized search results to a logged-in user? One way I can think of is analyzing the user's browsing history.
Tracking: a log of a user's activities, like pages viewed and 'like' buttons clicked, can be used to bias search results.
Question 1: How do you track a user's browsing history? A table with columns user_id, number_of_hits, page_id? If I have 1,000 daily visitors, each browsing 10 pages on average, won't there be a large number of records to select each time a personalized recommendation is required? The table will grow at 300K rows a month! It will take longer and longer to select the rows each time a search is made. I guess the table for recording 'likes' will take the same table design.
Question 2: How do you bias the results of a search? For example, if a user has been searching for apple products, how does the search engine realise that the user likes apple products and subsequently bias the search towards them? Tag the pages and accumulate a record of tags on the pages visited?
You probably don't want to use a relational database for this type of thing; take a look at MongoDB or Cassandra. That's because you basically want to add a new column to the user's history, so a column-oriented database makes more sense.
300K rows per month is not really that much; in fact, that's almost nothing. It doesn't matter whether you use a relational or non-relational database for this.
A straightforward approach is the following (a sketch follows the list):
put entries into the table/collection like this:
timestamp, user, action, misc information
(make sure that you put as much information as possible, such that you don't need to join this data warehousing table with any other table)
partition by timestamp (one partition per month)
never go against this table directly; instead have, say, daily report jobs run over all the data, compute the necessary statistics, and write them to a summary table
reflect on your report queries and add appropriate partition-local indexes
only go against the summary table from your web frontend
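A minimal sketch of that layout in MySQL, since the question mentions MySQL; the columns, partition boundaries, and summary shape are all illustrative:

```sql
-- Hypothetical raw activity log, partitioned by month.
CREATE TABLE user_activity (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    ts      DATETIME        NOT NULL,
    user_id INT UNSIGNED    NOT NULL,
    action  VARCHAR(32)     NOT NULL,
    misc    VARCHAR(255)    NULL,
    PRIMARY KEY (id, ts),
    KEY idx_user_ts (user_id, ts)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(ts)) (
    PARTITION p201801 VALUES LESS THAN (TO_DAYS('2018-02-01')),
    PARTITION p201802 VALUES LESS THAN (TO_DAYS('2018-03-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- Summary table read by the web frontend.
CREATE TABLE user_activity_summary (
    user_id INT UNSIGNED NOT NULL,
    day     DATE         NOT NULL,
    actions INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, day)
) ENGINE=InnoDB;

-- Daily report job: roll yesterday's raw rows up into the summary.
INSERT INTO user_activity_summary (user_id, day, actions)
SELECT user_id, DATE(ts), COUNT(*)
FROM user_activity
WHERE ts >= CURDATE() - INTERVAL 1 DAY
  AND ts <  CURDATE()
GROUP BY user_id, DATE(ts);
```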
If you stored only the last X results as opposed to everything, it would probably be doable. It might slow things down, but it would work. Any time you're writing more data and reading more data, there's going to be an impact. Proper DBA methods such as indexing and query optimization can help, but no matter what you use there's going to be an effect.
I'd personally look at storing just a default view for the user in a DB and using the session to keep track of the rest. Sure, when you log in there'd be no history, but you could take advantage of that to highlight a set of special pages that you think are important or relevant, to steer the user towards them - a highlight system of sorts. Faster, easier, and more user-friendly.
As for bias, you could write a set of keywords for each record and array sort them accordingly. Wouldn't be terribly difficult using PHP.
I use MySQL with over 2M records (page views) a month, and we run reports on that table daily and often.
The table is partitioned by month (as already suggested) and indexed where needed.
I also clear out data that is over 6 months old by moving it into new tables called "page_view_YYMM" (YY=year, MM=month) and using some UNIONs when necessary.
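For illustration, querying across the live table and one archived month might look roughly like this (the table names follow the YYMM pattern above, but the actual schema is assumed):

```sql
-- Combine the current table with an archived month; '1805' is just an
-- example of the page_view_YYMM naming, and the columns are assumed.
SELECT page_id, COUNT(*) AS views
FROM (
    SELECT page_id FROM page_view
    UNION ALL
    SELECT page_id FROM page_view_1805
) combined
GROUP BY page_id;
```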
For the second question, the way I would approach it is by creating a table with the list of your products that is a simple:
url, description
The description would be a tag-stripped copy of the content of your page or item (depending on how you want to influence the search). Then add a full-text index on description and search against that table, adding possible extra terms that you have been collecting while the user was surfing your site and that you think are relevant (for example a category name or brand).
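A small sketch of that idea, with assumed table and column names and an assumed pair of search terms:

```sql
-- Hypothetical product search table with a full-text index on description.
-- (InnoDB full-text indexes need MySQL 5.6+; older versions would use MyISAM.)
CREATE TABLE product_search (
    url         VARCHAR(255) NOT NULL PRIMARY KEY,
    description TEXT         NOT NULL,
    FULLTEXT KEY ft_description (description)
) ENGINE=InnoDB;

-- Bias the search by appending terms collected while the user browsed;
-- here 'laptop' is the query and 'apple' an assumed collected interest term.
SELECT url,
       MATCH(description) AGAINST('laptop apple' IN NATURAL LANGUAGE MODE) AS relevance
FROM product_search
WHERE MATCH(description) AGAINST('laptop apple' IN NATURAL LANGUAGE MODE)
ORDER BY relevance DESC;
```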

Database design for a booking application e.g. hotel

I've built one, but I'm convinced it's wrong.
I had a table for customer details, and another table with each date of the stay (i.e. a week's holiday would have seven records).
Is there a better way?
I code in PHP with MySQL
Here you go
I found it at this page:
A list of free database models.
WARNING: Currently (November '11), Google is reporting that site as containing malware: http://safebrowsing.clients.google.com/safebrowsing/diagnostic?client=Firefox&hl=en-US&site=http://www.databaseanswers.org/data_models/hotels/hotel_reservations_popkin.htm
I work in the travel industry and have worked on a number of different PMSs. The last one I designed had the row-per-guest-per-night approach, and it is the best approach I've come across yet.
Quite often in the industry there are particular pieces of information to each night of the stay. For example you need to know the rate for each night of the stay at the time the booking was made. The guest may also move room over the duration of their stay.
Performance-wise it's quicker to do an equality lookup than a range lookup in MySQL, so the startdate/enddate approach would be slower. To do a lookup for a range of dates, use "where date in (dates)".
Roughly the schema I used is (a fuller sketch follows the list):
Bookings (id, main-guest-id, arrivaltime, departtime,...)
BookingGuests (id, guest-id)
BookingGuestNights (date, room, rate)
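A slightly fuller sketch of that layout; the column types, key choices, and the date lookup are illustrative rather than the exact production schema:

```sql
-- Illustrative version of the row-per-guest-per-night schema above.
CREATE TABLE Bookings (
    id            INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    main_guest_id INT UNSIGNED NOT NULL,
    arrivaltime   DATETIME     NOT NULL,
    departtime    DATETIME     NOT NULL
) ENGINE=InnoDB;

CREATE TABLE BookingGuests (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    booking_id INT UNSIGNED NOT NULL,
    guest_id   INT UNSIGNED NOT NULL
) ENGINE=InnoDB;

CREATE TABLE BookingGuestNights (
    booking_guest_id INT UNSIGNED  NOT NULL,
    date             DATE          NOT NULL,
    room             INT UNSIGNED  NOT NULL,
    rate             DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (booking_guest_id, date),
    KEY idx_date_room (date, room)
) ENGINE=InnoDB;

-- Equality lookups over a set of dates, as described above.
SELECT room, COUNT(*) AS occupied_guest_nights
FROM BookingGuestNights
WHERE date IN ('2018-08-01', '2018-08-02', '2018-08-03')
GROUP BY room;
```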
Some questions you need to ask yourself:
Is there a reason you need a record for each day of the stay?
Could you not just have a table for the stay and have an arrival date and either a number of nights or a departure date?
Are there specific bits of data that differ from day to day relating to one customer's stay?
Some things that may break your model. These may not be a problem, but you should check with your client to see if they may occur.
Less than 1 day stays (short midday stays are common at some business hotels, for example)
Late check-outs/early check-ins. If you are just measuring the nights, and not dates/times, you may find it hard to arrange these, or to see potential clashes. One of our clients wanted a four-hour gap, not always 10am-2pm.
Wow, thanks for all the answers.
I had thought long and hard about the schema, and went with a record-per-night approach after trying the other way and having difficulty converting it to HTML.
I used CodeIgniter with the built in Calendar Class to display the booking info. Checking if a date was available was easier this way (at least after trying), so I went with it. But I'm convinced that it's not the best way, which is why I asked the question.
And thanks for the DB answers link, too.
Best,
Mei
What's wrong with that? Logging each date that the customer is staying allows for what I'd imagine are fairly standard reports, such as being able to display the number of booked rooms on any given day.
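For example, with a row per night that report is a single GROUP BY (table and column names are assumed here, not taken from the question):

```sql
-- Number of booked rooms on each day of a month, assuming a
-- row-per-night table like stay_nights(room_id, stay_date).
SELECT stay_date, COUNT(DISTINCT room_id) AS booked_rooms
FROM stay_nights
WHERE stay_date BETWEEN '2018-08-01' AND '2018-08-31'
GROUP BY stay_date
ORDER BY stay_date;
```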
The answer heavily depends on your requirements... but I would expect that only storing a record with the start and stop dates for their stay is needed. If you explain your question more, we can give you more details.
A tuple-per-day is a bit overkill, I think. A few columns on a "stay" table should suffice.
stay.check_in_time_scheduled
stay.check_in_time_actual
stay.check_out_time_scheduled
stay.check_out_time_actual
Is creating a record for each day a person stays necessary? It should only be necessary if each day is significant; otherwise have a Customer/Guest table to contain the customer details and a Booking table to contain bookings for guests. The Booking table would contain room, start date, end date, guest (or guests), etc.
If you need to record other things such as activities paid for, or meals, add those in other tables as required.
One possible way to reduce the number of entries for each stay is to store the time frame, e.g. start date and end date. I need to know the operations you run against the data to give more specific advice.
Generally speaking, if you need to check how many customers are staying on a given date you can do so with a stored procedure.
For some specific operations your design might be good. Even if that's the case I would still hold a "visits" table linking a customer to a unique stay, and a "days-of-visit" table where I would resolve each client's stay to its days.
Asaf.
You're trading off database size against query simplicity (and probably performance).
Your current model gives simple queries, as it's pretty easy to query for the number of guests, vacancies in room X on night n, and so on, but the database size will increase fairly rapidly.
Moving to a start/stop or start/num nights model will make for some ... interesting queries at times :)
So a lot of the choice is to do with your SQL skill level :)
I don't care for the schema in the diagram. It's rather ugly.
Schema Abstract
Table: Visit
The Visit table contains one row for each night stayed in a hotel.
Note: Visit contains
ixVisit
ixCustomer
dt
sNote
Table: Customer
ixCustomer
sFirstName
sLastName
Table: Stay
The Stay table includes one row that describes the entire visit. It is updated every time Visit is updated.
ixStay
dtArrive
dtLeave
sNote
Notes
A web app is two things: SELECT actions and CRUD actions. Most web apps are 99% SELECT and 1% CRUD. Normalization tends to help CRUD much more than SELECT. You might look at my schema and panic, but it's fast. You will have to do a small amount of extra work for any CRUD activity, but your SELECTs will be so much faster because all of your SELECTs can hit the Stay table.
I like how Jeff Atwood puts it: "Normalize until it hurts, denormalize until it works"
For a website used by a busy hotel manager, how well it works is just as important as how fast it works.
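As a hedged illustration of that trade-off, using the table and column names from the schema abstract above (the trigger or application code that keeps Stay in sync is left out, and the values are illustrative):

```sql
-- Reads hit the denormalised Stay table: one row per whole visit.
SELECT ixStay, dtArrive, dtLeave, sNote
FROM Stay
WHERE dtArrive <= '2018-08-15' AND dtLeave >= '2018-08-15';

-- Writes go to the per-night Visit table, after which Stay is refreshed.
INSERT INTO Visit (ixCustomer, dt, sNote)
VALUES (7, '2018-08-15', 'late check-in');
```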
