Database design for a booking application, e.g. a hotel - PHP

I've built one, but I'm convinced it's wrong.
I had a table for customer details, and another table with a record for each date of the stay (i.e. a week's holiday would have seven records).
Is there a better way?
I code in PHP with MySQL

Here you go
I found it at this page:
A list of free database models.
WARNING: Currently (November '11), Google is reporting that site as containing malware: http://safebrowsing.clients.google.com/safebrowsing/diagnostic?client=Firefox&hl=en-US&site=http://www.databaseanswers.org/data_models/hotels/hotel_reservations_popkin.htm

I work in the travel industry and have worked on a number of different PMSes (property management systems). The last one I designed used the row-per-guest-per-night approach, and it is the best approach I've come across yet.
Quite often in the industry there are particular pieces of information attached to each night of the stay. For example, you need to know the rate for each night of the stay at the time the booking was made. The guest may also move rooms over the duration of their stay.
Performance-wise, an equality lookup is quicker than a range lookup in MySQL, so the start-date/end-date approach would be slower. To look up a range of dates, use "WHERE date IN (dates)".
Roughly the schema I used is:
Bookings (id, main-guest-id, arrivaltime, departuretime,...)
BookingGuests (id, guest-id)
BookingGuestNights (date, room, rate)
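A rough sketch of that schema as MySQL DDL (all column types, and the booking_id/booking_guest_id link columns, are my assumptions rather than part of the original design; hyphens are replaced with underscores for valid SQL):

CREATE TABLE Bookings (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    main_guest_id INT UNSIGNED NOT NULL,
    arrival_time DATETIME NOT NULL,
    departure_time DATETIME NOT NULL
);

CREATE TABLE BookingGuests (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    booking_id INT UNSIGNED NOT NULL,   -- assumed link back to Bookings
    guest_id INT UNSIGNED NOT NULL
);

CREATE TABLE BookingGuestNights (
    booking_guest_id INT UNSIGNED NOT NULL,   -- assumed link back to BookingGuests
    date DATE NOT NULL,
    room INT UNSIGNED NOT NULL,
    rate DECIMAL(8,2) NOT NULL,
    PRIMARY KEY (booking_guest_id, date)
);

-- The equality-style lookup over a range of dates mentioned above:
SELECT * FROM BookingGuestNights
WHERE date IN ('2011-11-14', '2011-11-15', '2011-11-16');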

Some questions you need to ask yourself:
Is there a reason you need a record for each day of the stay?
Could you not just have a table for the stay and have an arrival date and either a number of nights or a departure date?
Are there specific bits of data that differ from day to day relating to one customer's stay?

Some things that may break your model. These may not be a problem, but you should check with your client to see if they may occur.
Stays of less than one day (short midday stays are common at some business hotels, for example)
Late check-outs/early check-ins. If you are just recording the nights, and not dates/times, you may find it hard to arrange for these, or to see potential clashes. One of our clients wanted a four-hour gap, not always 10am-2pm.

Wow, thanks for all the answers.
I had thought long and hard about the schema, and went with a record-per-night approach after trying the other way and having difficulty converting it to HTML.
I used CodeIgniter with the built-in Calendar Class to display the booking info. Checking whether a date was available was easier this way (at least in my attempts), so I went with it. But I'm convinced that it's not the best way, which is why I asked the question.
And thanks for the DB answers link, too.
Best,
Mei

What's wrong with that? Logging each date that the customer is staying allows for what I'd imagine are fairly standard reports, such as displaying the number of booked rooms on any given day.

The answer depends heavily on your requirements... but I would expect that storing a single record with the start and stop dates for the stay is all that's needed. If you explain your requirements more, we can give you more details.

A tuple-per-day is a bit overkill, I think. A few columns on a "stay" table should suffice.
stay.check_in_time_scheduled
stay.check_in_time_actual
stay.check_out_time_scheduled
stay.check_out_time_actual
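As a minimal sketch in MySQL DDL (the guest_id and room_id columns and all types are illustrative additions):

CREATE TABLE stay (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    guest_id INT UNSIGNED NOT NULL,             -- assumed link to a guests table
    room_id INT UNSIGNED NOT NULL,              -- assumed link to a rooms table
    check_in_time_scheduled DATETIME NOT NULL,
    check_in_time_actual DATETIME NULL,         -- NULL until the guest actually arrives
    check_out_time_scheduled DATETIME NOT NULL,
    check_out_time_actual DATETIME NULL         -- NULL until the guest actually leaves
);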

Is creating a record for each day a person stays necessary? It should only be necessary if each day is significant; otherwise, have a Customer/Guest table to contain the customer details, and a Booking table to contain bookings for guests. The Booking table would contain room, start date, end date, guest (or guests), etc.
If you need to record other things such as activities paid for, or meals, add those in other tables as required.

One possible way to reduce the number of entries for each stay is to store the time frame, e.g. start date and end date. I'd need to know the operations you run against the data to give more specific advice.
Generally speaking, if you need to check how many customers are staying on a given date, you can do so with a stored procedure.
For some specific operations your design might be good. Even if that's the case, I would still keep a "visits" table linking a customer to a unique stay, and a "days-of-visit" table where I would resolve each client's stay to its days.
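For example, a sketch of that check as a plain query (the visits table and its start_date/end_date columns are assumptions for illustration):

SELECT COUNT(*) AS customers_staying
FROM visits
WHERE start_date <= '2011-11-15'
  AND end_date   >  '2011-11-15';   -- stays that span the given date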
Asaf.

You're trading off database size against query simplicity (and probably performance).
Your current model gives simple queries, as it's pretty easy to query for the number of guests, vacancies in room X on night n, and so on, but the database size will increase fairly rapidly.
Moving to a start/stop or start/num nights model will make for some ... interesting queries at times :)
So a lot of the choice is to do with your SQL skill level :)
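For example, checking whether room X is free for a requested period under the start/stop model means testing for overlapping bookings; a sketch, assuming a bookings table with room_id, start_date and end_date columns:

SELECT COUNT(*) AS clashes
FROM bookings
WHERE room_id = 42                 -- the room being requested
  AND start_date < '2011-11-20'    -- existing stay begins before the requested end
  AND end_date   > '2011-11-14';   -- ...and ends after the requested start; 0 = vacant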

I don't care for the schema in the diagram. It's rather ugly.
Schema Abstract
Table: Visit
The Visit table contains one row for each night stayed in a hotel.
Visit contains:
ixVisit
ixCustomer
dt
sNote
Table: Customer
ixCustomer
sFirstName
sLastName
Table: Stay
The Stay table includes one row that describes the entire visit. It is updated every time Visit is updated.
ixStay
dtArrive
dtLeave
sNote
Notes
A web app is two things: SELECT actions and CRUD actions. Most web apps are 99% SELECT, and 1% CRUD. Normalization tends to help CRUD much more than SELECT. You might look at my schema and panic, but it's fast. You will have to do a small amount of extra work for any CRUD activity, but your SELECTS will be so much faster because all of your SELECTS can hit the Stay table.
I like how Jeff Atwood puts it: "Normalize until it hurts, denormalize until it works"
For a website used by a busy hotel manager, how well it works is just as important as how fast it works.
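For instance, a typical read (who is in the hotel on a given night) can be answered from Stay alone; a sketch using the column names above:

SELECT ixStay, dtArrive, dtLeave, sNote
FROM Stay
WHERE dtArrive <= '2011-11-15'
  AND dtLeave  >  '2011-11-15';   -- no join against the night-by-night Visit table needed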

Related

MySQL Database Structure for Time Based Chart and Report Generation

My application will allow users to like or dislike a product and leave short feedback. I have to build functionality that will show a graph and produce a report based on different time frames: probably yearly, monthly, weekly, and daily.
I have to show, via a chart, how many users liked or disliked the product over a particular time frame, and generate a report. So my application should be able to produce a daily graph for August 2018, or a monthly graph for the year 2018, for a particular product. A daily graph should reveal how many users liked or disliked the product each day, and similarly for the weekly, monthly, or yearly time frames.
I am not sure what the database structure should be for this type of application. Here is what I have thought of so far.
products: id, name, descp...etc // products table
users: id, name, email ...etc // users table
user_reactions: id, user_id(foreign key), product_id(foreign key), action(liked or disliked, tinyint), feedback // user_reactions table
data: id, product_id(foreign key), date(Y-m-d), total_like, total_dislike. // data table, will be used to make graph and report
What I am thinking is that I will run a cron job at 23:59:59 every day to count the likes and dislikes of each product and add the data to the last table, i.e. the data table mentioned above, and then use that table to build the graphs and reports. I am not sure if this database structure is correct or whether it has some unseen problem (maybe in the future?).
Note: My Application will be in PHP and MySQL
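For reference, the nightly rollup described above could be a single INSERT ... SELECT; a sketch against the tables listed, assuming user_reactions also has a created_at timestamp column and that action stores 1 for liked and 0 for disliked:

INSERT INTO data (product_id, date, total_like, total_dislike)
SELECT product_id,
       CURDATE(),
       SUM(action = 1),                  -- assumed: action = 1 means liked
       SUM(action = 0)                   -- assumed: action = 0 means disliked
FROM user_reactions
WHERE DATE(created_at) = CURDATE()       -- only today's reactions
GROUP BY product_id;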
Well, there is no right answer to your question, because any answer would be opinion-based. You and I will get enough downvotes for sure. But still, hear me out, my friend, because I was in your position once.
There is a quote by the famous professor Donald Knuth:
Premature optimization is the root of all evil
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
The idea is that you have to start building. As your application progresses, you will face trouble: you will face problems with your database, your system might not scale, or it might not handle a million requests. But until you hit that problem you don't have to worry about it.
I am not saying that you should go and build a system blindly with an infinite loop, or create a table join which can cause deadlocks. I hope you get my point.
Build a system with your knowledge and understanding, because there is no single straight path to a solution: build a feature -> hit an issue -> tweak your app -> rinse and repeat. One day your own experience will show you the right path.
From your given description I can't figure out exactly how it will come out, but I am sure it will suffice for your initial days. As you progress, you might find it hard to add new features or additional constraints, but that's a matter for another day; wait for it and ask another question.
I hope I have answered your question.

MySQL managing catalogue views

A friend of mine has a catalogue that currently holds about 500 rows, or 500 items. We are looking at ways to provide reports on the catalogue, including the number of times an item was viewed and the dates when it was viewed.
His site is averaging around 25,000 page impressions per month, and if we assume for a minute that half of these were catalogue items, we'd estimate roughly 12,000 catalogue item views each month.
My question is about the best way to manage item views in the database.
The first option is to insert the catalogue ID into a table and then increment the number of times it's viewed. The advantage of this is its compact nature. There will only ever be as many rows in the table as there are catalogue items.
`catalogue_id`, `views`
The disadvantage is that no date information is being held, short of maintaining the last time an item was viewed.
The second option is to insert a new row each time an item is viewed.
`catalogue_id`, `timestamp`
If we continue with the assumed figure of 12,000 item views, that means adding 12,000 rows to the table each month, or 144,000 rows each year. The advantage of this is that we know the number of times the item is viewed, and also the dates when it's viewed.
The disadvantage is the size of the table. Is a table with 144,000 rows becoming too large for MySQL?
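Sketched as MySQL DDL, the two options might look like this (table names, types, and keys are my assumptions):

-- Option 1: one counter row per item
CREATE TABLE catalogue_views (
    catalogue_id INT UNSIGNED PRIMARY KEY,
    views INT UNSIGNED NOT NULL DEFAULT 0
);

-- Option 2: one log row per view
CREATE TABLE catalogue_view_log (
    catalogue_id INT UNSIGNED NOT NULL,
    `timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    KEY idx_item_time (catalogue_id, `timestamp`)
);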
Interested to hear any thoughts or suggestions on how to achieve this.
Thanks.
As you have mentioned, the first is a lot more compact but limited. However, look at option 2 in more detail: you may wish to store more than just the view count, for instance the entry/exit page, host IP, etc. This information may be invaluable for stats and tracking. The other question is: are these 25,000 impressions unique? If not, you are able to track by username, IP, or some other unique identifier, which could enable you to use fewer rows. The answer to your question depends on how much detail you wish to store and how important the data is.
Update:
True, limiting the repeats on a given item within a time interval would be a good solution. Also, knowing whether someone visited the same item could be useful for suggested-item prediction widgets, similar to what Amazon does. Also, knowing that someone visited an item many times says to me that this is a good item to promote to them or others in a mail-out, newsletter, or popular-products page. Tracking unique views will give a more honest view count, which you can choose to display or store. On the issue of limiting the value of repeat visitors, this mainly only comes into play depending on what information you display. It is all about framing the information in the way that best suits you.
Your problem statement: we want to be able to track the number of views for a particular catalogue item.
Let's review your options.
First Option:
In this option you will be storing the catalogue_id and an integer value of the number of views of the item.
Advantages:
Since you really have a one-to-one relationship, the new table is going to be small: if you have 500 items, you will have 500 rows. If you choose this route, I would suggest not creating a new table but adding another column to the catalogue table holding the number of views.
Disadvantages:
The problem here is that since you are going to be updating this table relatively frequently, it is going to be a very busy little table. For example, say 10 users are viewing the same item: these 10 updates will have to run one after the other. Assuming you are using InnoDB, the first view action would come in, lock the row, update the counter, and release the lock; the other updates would queue behind it. So while the table's data is small, it could become a bottleneck later on, especially if you start scaling the system.
You are also losing granular data, i.e. you are not keeping track of the raw data. For example, let's say the website starts growing and you have an interested investor who wants to see a breakdown of the views per week over the last 6 months. If you use this option you won't have the data to provide to the investor; essentially you are keeping only a summary.
Second Option:
In this option you would create a logging table with at least the following minimal fields: catalogue_id and timestamp. You could expand this to add a username/IP address or some other information to make it even more granular.
Advantages:
You are keeping granular data, which will allow you to summarise it in a variety of ways. You could, for example, add an IP address column, store the visitor's IP, and then do a monthly report showing products viewed by country (you could do an IP address lookup to get an idea of which country they were from). Another example would be to see which products were viewed the most over the last quarter. This data is pretty essential in helping you make decisions on how to grow your business: if you want to know what is and isn't working as far as products are concerned, this detail is absolutely critical.
Your new table will be a logging table with only insert operations, and inserts can pretty much happen in parallel. If you go with this option it will probably scale better as the site grows, compared to a constantly updated table.
Disadvantages:
This table will be bigger, probably the biggest table in the database, but this is not a problem. I regularly deal with tables of 500,000,000+ rows; some of my tables are over 750GB by themselves and I can still run reporting on them. You just need to understand your queries and how to optimise them; MySQL was designed to handle millions of rows with ease. Keep in mind you could also archive some information into other tables, say moving data older than 3 years into an archive table; you don't have to keep all the data in one place. Your estimate of 144,000 rows a year means you could probably safely keep 15+ years' worth without ever worrying about the performance of the table.
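As an illustration of the summaries the logging table allows (a sketch; catalogue_view_log is a hypothetical name for the option 2 table):

-- Views per item per month
SELECT catalogue_id,
       DATE_FORMAT(`timestamp`, '%Y-%m') AS month,
       COUNT(*) AS views
FROM catalogue_view_log
GROUP BY catalogue_id, month;

-- Most-viewed items over the last quarter
SELECT catalogue_id, COUNT(*) AS views
FROM catalogue_view_log
WHERE `timestamp` >= NOW() - INTERVAL 3 MONTH
GROUP BY catalogue_id
ORDER BY views DESC
LIMIT 10;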
My suggestion is to seriously consider the second option. If you decide to go this route, update your question with the proposed table structures and let us have a look. Don't be scared of big data; rather, be scared of BAD design, which is much more difficult to deal with.
However, as always, the choice is yours.

How can I create statistics from e-commerce data without having huge tables?

Say I was building an e-commerce website. How would I go about recording the number of products sold each day, to show later on? I know I could save the number of items in stock in a database and then see whether it has decreased the next day, which I suppose is inevitably the solution. But imagine a store owner who wanted to see how much of any product he had sold in the last year, bearing in mind he has 1,000 products: would this require 1,000 columns with 365 rows? Am I thinking about this wrong, or is this really the case? I know there are extensions you can download for things such as osCommerce and Magento, among others, which have this kind of functionality, and I was wondering whether they shared a common approach or came up with something else.
So basically I'm looking to generate reports and statistics. How is this usually done? Does it require huge tables with every daily change for every product?
My first inclination would be to treat each transaction as a row, with the name of the product sold in one of the columns. To meet your other requirements, a date column would be necessary to query if the item was sold between two dates.
You can then do any number of manipulations to this table.
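A sketch of that row-per-transaction design in MySQL (table and column names are illustrative):

CREATE TABLE sales (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    product_id INT UNSIGNED NOT NULL,            -- or the product name, as described above
    quantity INT UNSIGNED NOT NULL,
    sold_at DATE NOT NULL,
    KEY idx_product_date (product_id, sold_at)
);

-- How much of one product was sold in the last year
SELECT SUM(quantity) AS units_sold
FROM sales
WHERE product_id = 123
  AND sold_at >= CURDATE() - INTERVAL 1 YEAR;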

Personalized Search Results based on History

What are some of the techniques for providing personalized search results to a logged-in user? One way I can think of is analyzing the user's browsing history.
Tracking: A log of a user's activities, like pages viewed and 'like' buttons clicked, can be used to bias search results.
Question 1: How do you track a user's browsing history? A table with columns user_id, number_of_hits, page_id? If I have 1,000 daily visitors, each browsing 10 pages on average, won't there be a large number of records to select each time a personalized recommendation is required? The table will grow at 300K rows a month! It will take longer and longer to select the rows each time a search is made. I guess the table for recording 'likes' will need the same design.
Question 2: How do you bias the results of a search? For example, if a user has been searching for Apple products, how does the search engine realise that the user likes Apple products and subsequently bias the search towards them? Tag the pages and accumulate a record of tags on the pages visited?
You probably don't want to use a relational database for this type of thing, take a look at mongodb or cassandra. That's because you basically want to add a new column to the user's history so a column-oriented database makes more sense.
300K rows per month is not really that much; in fact, that's almost nothing. It doesn't matter whether you use a relational or non-relational database for this.
Straightforward approach is the following:
put entries into the table/collection like this:
timestamp, user, action, misc information
(make sure that you put as much information as possible, such that you don't need to join this data warehousing table with any other table)
partition by timestamp (one partition per month)
never query this table directly; instead, have (say) daily report jobs run over all the data, compute the necessary statistics, and write them to a summary table
reflect on your report queries and add appropriate partition-local indexes
only query the summary table from your web frontend
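A sketch of that logging table as MySQL DDL with monthly partitions (names, types, and partition boundaries are illustrative):

CREATE TABLE activity_log (
    ts TIMESTAMP NOT NULL,
    user_id INT UNSIGNED NOT NULL,
    action VARCHAR(32) NOT NULL,
    misc VARCHAR(255) NULL,                      -- as much extra information as possible
    KEY idx_user_ts (user_id, ts)                -- partition-local index for the report jobs
)
PARTITION BY RANGE (UNIX_TIMESTAMP(ts)) (
    PARTITION p201101 VALUES LESS THAN (UNIX_TIMESTAMP('2011-02-01')),
    PARTITION p201102 VALUES LESS THAN (UNIX_TIMESTAMP('2011-03-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);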
If you stored only the last X results as opposed to everything, it would probably be doable. It might slow things down, but it'd work. Any time you're writing more data and reading more data, there's going to be an impact. Proper DBA methods such as indexing and query optimization can help, but no matter what you use there's going to be an effect.
I'd personally look at storing just a default view for the user in a DB and use the session to keep track of the rest. Sure, when you login there'd be no history. But you could take advantage of that to highlight a set of special pages that you think are important or relevant to steer the user to. A highlight system of sorts. Faster, easier, and more user-friendly.
As for bias, you could write a set of keywords for each record and array sort them accordingly. Wouldn't be terribly difficult using PHP.
I use MySQL with over 2M records (page views) a month, and we run reports on that table daily and often.
The table is partitioned by month (as already suggested) and indexed where needed.
I also clear out data that is over 6 months old by moving it into new tables called "page_view_YYMM" (YY=year, MM=month), and use some UNIONs when necessary.
For the second question, the way I would approach it is by creating a table with the list of your products that is a simple:
url, description
The description will be the tag-stripped content of your page or item (depending on how you want to influence the search). Then add a full-text index on description and search on that table, adding any extra terms you have been collecting while the user was surfing your site that you think are relevant (for example, a category name or brand).
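A sketch of that lookup table and search, assuming MySQL with a MyISAM full-text index (InnoDB only gained FULLTEXT support in MySQL 5.6):

CREATE TABLE product_search (
    url VARCHAR(255) NOT NULL,
    description TEXT NOT NULL,                   -- tag-stripped page content
    FULLTEXT KEY ft_desc (description)
) ENGINE=MyISAM;

-- Search with extra terms collected while the user surfed (e.g. a brand name)
SELECT url,
       MATCH(description) AGAINST('mp3 player sony') AS relevance
FROM product_search
WHERE MATCH(description) AGAINST('mp3 player sony')
ORDER BY relevance DESC;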

Splitting 20 million record database

I have a US company database which is 20 million records. Firstly, there is no budget for a massive-RAM database server, so I think I am going to have to split the DB into parts: 4 parts grouped by state.
My question is: what is the best way to handle this with PHP? I am thinking: take the user's query, find the state, and then point to the relevant DB. Any thoughts?
I think you need to look at MySQL partitioning.
Sounds like you might want to consider sharding.
Not sure if you are using an ORM for data access, but some of them support sharding. Some info on sharding for php and mySQL here:
http://highscalability.com/database-sharding-netlog-mysql-and-php
just realised - link missing to the actual article in last url... try here: http://www.jurriaanpersyn.com/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/
You don't need PHP for all these operations, except maybe to generate SQL code. It's better to write SQL scripts that copy data from the original tables into the new ones. See "INSERT ... SELECT ..." and "CREATE TABLE ... AS SELECT ..." if you are not familiar with them yet.
If you have MySQL >= 5.1, then try partitioning the table so that any request hits only one partition.
If users need information on only one state, partition by state; there can be a lot of partitions without extra work for you. If users can see only a certain time frame, like the monthly graphs in Webalizer, partition by month, and so on.
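A sketch of LIST partitioning by state (assuming a numeric state code column, since MySQL 5.1 cannot partition on a VARCHAR directly; the table, codes, and groupings are illustrative):

CREATE TABLE companies (
    id INT UNSIGNED NOT NULL,
    state_id TINYINT UNSIGNED NOT NULL,   -- assumed numeric code, e.g. 1 = AL, 2 = AK, ...
    name VARCHAR(255) NOT NULL,
    PRIMARY KEY (id, state_id)            -- the partitioning column must be in every unique key
)
PARTITION BY LIST (state_id) (
    PARTITION p1 VALUES IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13),
    PARTITION p2 VALUES IN (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25),
    PARTITION p3 VALUES IN (26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37),
    PARTITION p4 VALUES IN (38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50)
);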
Also consider creating aggregate tables. Let me elaborate: in data warehouses there is a distinction between metrics and attributes.
An attribute is a column that tells where, when, what, what kind of.
A metric tells how much, how many.
An aggregate table has a lower level of detail: either fewer attributes (no geographical info, or no product info), or attributes rolled up a few steps from those in the full table (just state instead of city+state, year-month instead of date, and so on).
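For example, an aggregate table can be built with CREATE TABLE ... AS SELECT (a sketch; sales_detail and its columns are hypothetical):

-- Roll daily, city-level rows up to month and state
CREATE TABLE sales_by_state_month AS
SELECT state,                                    -- attribute, a step up from city+state
       DATE_FORMAT(sale_date, '%Y-%m') AS sale_month,
       SUM(amount) AS total_amount,              -- metric
       COUNT(*)    AS num_sales                  -- metric
FROM sales_detail
GROUP BY state, sale_month;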
And lastly: make sure your users really need the detailed old data. Some data becomes irrelevant within a couple of years. For instance, website referrers have no meaning after 1.5-2 years, since most websites change. Two-year-old website traffic data can be reduced to a set of daily/monthly graphs.
