Database structure for a system with multisite (Database & PHP)

The system I'm working on is structured as below, given that I'm planning to use Joomla as the base.
a (www.a.com), b (www.b.com), c (www.c.com) are search portals which allow users to search for reservations.
x (www.x.com), y (www.y.com), z (www.z.com) are hotels where bookings are made by users.
www.a.com's users can only search for bookings in www.x.com.
www.b.com's users can only search for bookings in www.x.com and www.y.com.
www.c.com's users can search for all bookings in www.x.com, www.y.com and www.z.com.
All of a, b, c, x, y and z run the same system, but they should have separate domains. So according to my findings and research, the architecture should be as above, where an API integrates all database calls.
Only 6 instances are shown here (a, b, c, x, y, z); there can be up to 100, with different search combinations.
My problems:
Should I maintain a single database for the whole system? If so, how can I unplug one instance if required (e.g. removing www.a.com or www.z.com from the system)? And since I'm using MySQL, won't the number of records become cumbersome for the system?
If I maintain a separate database for each instance, how can I do the search? How can I bring the required records together and search across them?
Is there a different database approach I should use instead of the ones mentioned above?

The problem you describe is "multitenancy" - it's a fairly tricky problem to solve, but luckily, others have written up some useful approaches. (Though the link is to Microsoft, it applies to most SQL environments except in the details).
The trade-offs in your case are:
Does the data from your hotels fit into a single schema? Do their "vacancy" records have the same fields?
How many hotels will there be? 3 separate databases is kinda manageable; 30 is probably not; 300 is definitely not.
How large will the database grow? How many vacancy records?
How likely is it that the data structures will change over time? How likely is it that one hotel will need a change that the others don't?
By far the simplest to manage and develop against is the "single database" model, but only if the data is moderately homogenous in schema, and as long as you can query the data with reasonable performance. I'd not worry about putting a lot of records in MySQL - it scales very well.
In such a design, you'd map "portal" to "hotel" in a lookup table:
PortalHotelAccess
PortalID  HotelID
-----------------
A         X
B         X
B         Y
C         X
C         Y
C         Z
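For example, a portal-scoped search in this single-database model is just a join against that mapping table. The sketch below is only an illustration: the Vacancy table and its columns are assumptions, and only PortalHotelAccess comes from the design above.

<?php
// Minimal sketch of a portal-scoped search in the single-database model.
// Table and column names other than PortalHotelAccess are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=booking', 'user', 'pass');

function searchVacancies(PDO $pdo, string $portalId, string $date): array
{
    // The join against PortalHotelAccess restricts results to the hotels
    // this portal is allowed to see (A -> X; B -> X, Y; C -> X, Y, Z).
    $sql = 'SELECT v.*
              FROM Vacancy v
              JOIN PortalHotelAccess pha ON pha.HotelID = v.HotelID
             WHERE pha.PortalID = :portal
               AND v.AvailableDate = :date';

    $stmt = $pdo->prepare($sql);
    $stmt->execute([':portal' => $portalId, ':date' => $date]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$rooms = searchVacancies($pdo, 'B', '2024-07-01'); // portal B sees hotels X and Y only

"Unplugging" an instance is then mostly a matter of data: removing www.a.com means deleting its rows from PortalHotelAccess (and its users), without touching the hotels' records.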

I can suggest 2 approaches. Which one to choose depends on some additional information about the whole system. In fact, the main question is whether your system can impersonate (substitute for, in the legal sense) any of the data providers (x, y, z, etc.) from the point of view of the consumers (a, b, c, etc.) or not.
Centralized DB
The first one is actually based on your original scheme with a centralized API. It implies a single search engine collecting the required data from the data sources, aggregating it in its own DB, and providing it to the data consumers.
This is most likely the preferable solution if the data sources differ in their data representation, so you need to preprocess it for uniformity. This variant also protects your clients from possible connectivity problems: if one of the source sites goes offline for a short period (I think this may even be up to several hours without a great impact on how current the booking data is), you can still handle requests for the offline site and store all new documents in the central DB until the problem is solved. On the other hand, this means that you have to provide some sort of two-way synchronization between your DB and every data source site. The centralized DB should also be designed with reliability in mind from the start, so it should probably be distributed (preferably over different data centers).
As a result, this approach will probably give the best user experience, but it will require significant effort to implement robustly.
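To make the centralized variant more concrete, here is a rough sketch of the pull half of the synchronization: a periodic job that copies recent bookings from each source site into the central DB. The endpoint URLs, payload fields and the Booking table are all assumptions, and a robust implementation would also need the push direction and proper error handling.

<?php
// Rough sketch of a periodic pull keeping the central DB in sync with the
// data sources (x, y, z). Endpoints, payload shape and table names are
// assumptions; a real setup also pushes documents created while a source
// site was offline (the two-way synchronization mentioned above).
$pdo = new PDO('mysql:host=localhost;dbname=central', 'user', 'pass');
$lastSync = '1970-01-01 00:00:00'; // in practice, read from a sync-state table

$sources = [
    'x' => 'https://www.x.com/api/bookings',
    'y' => 'https://www.y.com/api/bookings',
    'z' => 'https://www.z.com/api/bookings',
];

foreach ($sources as $hotelId => $url) {
    $json = @file_get_contents($url . '?since=' . urlencode($lastSync));
    if ($json === false) {
        // Source is offline: keep serving its last known data from the
        // central DB and retry on the next run.
        continue;
    }
    $stmt = $pdo->prepare(
        'REPLACE INTO Booking (HotelID, BookingRef, GuestName, CheckIn, CheckOut)
         VALUES (:hotel, :ref, :guest, :in, :out)');
    foreach (json_decode($json, true) ?: [] as $b) {
        $stmt->execute([
            ':hotel' => $hotelId,
            ':ref'   => $b['ref'],
            ':guest' => $b['guest'],
            ':in'    => $b['check_in'],
            ':out'   => $b['check_out'],
        ]);
    }
}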
Multiple Local DBs
If every data provider runs its own DB, but all of them (including the backend APIs) follow a single standard, you can eliminate the need to copy their data into a central DB. Of course, the central point should remain, but it will host only middle-layer logic, without a DB. That layer is essentially an API which binds (x, y, z) to the appropriate (a, b, c) - in other words, configuration, nothing more. Every consumer site will host a widget (it can be just JavaScript or a fully-fledged web application) loaded from your central point with the appropriate settings embedded into it.
The widget will query all of the specified backends directly and aggregate their results into a single list.
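A minimal sketch of that thin central layer, under the assumption that every backend exposes the same search endpoint and returns JSON; the URLs, query parameters and response format are made up for illustration.

<?php
// Sketch of the "multiple local DBs" variant: the central point only knows
// which backends belong to which consumer site and merges their responses.
// URLs, parameters and the JSON format are assumptions.
$portalBackends = [
    'www.a.com' => ['https://www.x.com/api/search'],
    'www.b.com' => ['https://www.x.com/api/search', 'https://www.y.com/api/search'],
    'www.c.com' => ['https://www.x.com/api/search', 'https://www.y.com/api/search',
                    'https://www.z.com/api/search'],
];

function aggregateSearch(array $backends, array $query): array
{
    $results = [];
    foreach ($backends as $url) {
        $json = @file_get_contents($url . '?' . http_build_query($query));
        if ($json !== false) {
            // A backend that is down simply contributes no results.
            $results = array_merge($results, json_decode($json, true) ?: []);
        }
    }
    return $results;
}

$rooms = aggregateSearch($portalBackends['www.b.com'], ['date' => '2024-07-01']);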
This variant is much like how most of today's web applications work; it's simpler to implement, but it is more error-prone.

Related

Web project: Multiple instances vs single instance

Team,
We are building a web project (like an IT ticketing system), and we expect to have some big clients as soon as we release the product. There should be three ways to raise a ticket: 1) via the web application (forms), 2) via email, or 3) via a phone call to an agent. According to our research, 99% of tickets come via email, which means we will be storing a lot of long messages, etc.
The project is scoped so that we have two interfaces: agents (IT folks handling queries) and clients (people who ask for help).
The question is what you would suggest we do, considering the expected data and storage growth:
1. Centralize everything so that we have one app with a single huge database (easy to back up etc., unless we get stuck with e.g. data corruption or similar)...
2. Separate the app into two parts: one for IT agents and another one for clients. The idea is to split the application in two: a centralized interface and back-end for IT agents, and another one for clients. For each client we would create a separate database along with a copy of the PHP project (code syncing is easy to automate). Multiple client instances could be hosted on one or many servers. They would communicate via APIs. For example: an IT agent opens a dashboard and the list of outstanding tickets is displayed. If that agent is working on 10 big clients, the back-end would need to contact 10 instances via API and request their outstanding tickets. We can ensure only a certain number of queries would be displayed...
Please feel free to add a third option as well.
I am not quite sure that I understood everything correctly, but from what I understood I can point out the following key points about your system requirements:
You are dealing with a lot of data, and the data will grow fast
Most of the traffic comes from the email ticketing system
You have a multi-client system
You have an agent which can view data from multiple clients.
The question is: can this agent also manipulate (create, update, delete) data belonging to multiple clients? This is quite an important point for the future limitations of the architecture. I will assume that it can only read data from multiple clients.
Your 2 suggestions:
Regarding the first one (one app with a single huge database): I would not recommend that approach, as many other problems could arise as the database grows. For example, you will be forced to add indexes to speed up queries on your DB, which will help in the beginning but will later come back to haunt you, especially if you have to add a lot of non-clustered indexes. You could make it a little better by using read-only replicas, but even with those you will at some point have issues. The problem will still remain in your one main database, which will keep growing.
Quote:
Separate the app into two parts: one for IT agents and another one for clients. The idea is to split the application in two: a centralized interface and back-end for IT agents, and another one for clients. For each client we would create a separate database along with a copy of the PHP project (code syncing is easy to automate). Multiple client instances could be hosted on one or many servers. They would communicate via APIs. For example: an IT agent opens a dashboard and the list of outstanding tickets is displayed. If that agent is working on 10 big clients, the back-end would need to contact 10 instances via API and request their outstanding tickets. We can ensure only a certain number of queries would be displayed...
You can split it into 2 separate apps, as you said:
Centralized interface + back-end, which would call the one or multiple databases
Client application + back-end (monolith or multiple services), which would call the same database as the centralized interface, but only for the current client
As far as I understood, your problem is not scaling the web servers (your back-end) but the DB? If your problem is scaling the back-end as well, then you can consider either scaling to multiple instances, or splitting your domain into micro-services and scaling that architecture at the micro-service level, for each service independently.
My Suggestion:
1. Scaling your back-end:
You can keep everything in one service (monolithic approach), deploy it on multiple servers, and scale the whole service together. There is nothing wrong with this. Like everything, it depends on your business/domain requirements and what works best for you. Although it is very popular these days to use micro-services, they are not the best solution for every problem. I have worked with both types of architecture and they have worked fine for different scenarios.
You can even have a middle-ground solution between them: take a specific part which has high scaling demand and extract it into a separate service (like a tickets sub-system service), while the rest of the application, which has low demand, remains one big service.
2. Scaling your database:
Considering the above points, I would suggest you use data sharding or data partitioning. You can read about data sharding here. In general, it is a way to logically and physically split your data from one database into multiple databases, based on some partitioning or shard key.
This means that you can take one specific concept in your domain as the shard key and split the data based on it. In your case this could be CustomerId. This can only be done if business operations which span more than one customer are not a case for your business, i.e. if all your operations are done within one customer. The only exception here would be reading/viewing multiple customers together; this is fine, as it does not need any transactional behavior.
This really depends on your business scenarios and logic.
If splitting your database into multiple databases based on the shard key CustomerId is not enough, you can take a shard key which is even more specific, inside the customer scope. Again, it depends on whether your domain allows this. In this case, for example, CustomerA could have a CustomerA-Europe shard, a CustomerA-USA shard, a CustomerA-Africa shard, and so on. This would represent the logical shard; the physical shard would be the physical database. The important point is that you pick your logical shard key at the beginning, so that you can easily migrate your data to different physical databases later, when you need to, based on that shard key.
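As an illustration (only a sketch, not the only way to do it), the code below resolves a physical database connection from a CustomerId shard key, so moving a shard later only means changing the map; the DSNs and the Ticket table are assumptions.

<?php
// Sketch of routing by a CustomerId shard key. The shard map is the only
// place that knows where each logical shard physically lives.
$shardMap = [
    'shard_0' => 'mysql:host=db1.internal;dbname=tickets_shard_0',
    'shard_1' => 'mysql:host=db1.internal;dbname=tickets_shard_1',
    'shard_2' => 'mysql:host=db2.internal;dbname=tickets_shard_2',
];

function shardFor(int $customerId): string
{
    return 'shard_' . ($customerId % 3); // deterministic logical shard choice
}

function connectionFor(int $customerId, array $shardMap): PDO
{
    return new PDO($shardMap[shardFor($customerId)], 'user', 'pass');
}

// All single-customer operations stay inside one shard:
$pdo  = connectionFor(42, $shardMap);
$stmt = $pdo->prepare('SELECT * FROM Ticket WHERE CustomerId = :c AND Status = :s');
$stmt->execute([':c' => 42, ':s' => 'open']);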
In addition to this, you could introduce historization for some heavy tables, to separate the up-to-date data from your historical data. You can read more about this here.

Multiple databases, or always limit with query

I have a question regarding databases and performances, so let me explain the situation.
The application - to be built - has the following set-up:
A group, with users under that group.
Data / file locations (which are used to search through); it is estimated that one group can easily reach one million "search" terms.
Now, groups can never look at each other's data, and users can only look at the data which belongs to their group.
The only thing they should have in common is some place to send error logs to (and maybe not even that).
Now, in this situation, would you create a new database per group, or always limit your search results with a query that takes the user's group id into account?
My idea was to just create a new database per group, because then you do not need to limit your query every single time and the result set to search through stays smaller (?). But is that really necessary, or is a "where groupid = 1" fast enough, even on over a million records, that you would not notice a decrease in performance?
This is the regular multi-tenant SaaS architecture problem, which has been discussed at length, and the solution always varies according to your own situation. Here is one example of this discussion that I will just link to instead of copy-pasting, since all of it is worth a read: Multi-tenant PHP SaaS - Separate DB's for each client, or group them?
In addition to that I would like to add some more high level considerations:
Are there any legal requirements regarding the storage of your user's data? Some businesses operate in a regulatory environment where they are not allowed to store their data in a shared environment, quite common in the financial and medical industries.
Will you offer the same security (login method, data storage encryption), backup/restore service, geolocation redundancy and up-time guarantee to all users?
Are there any users who are willing to pay extra to have their data stored in a separate environment?
Are there any users who will potentially have requirements that are not compatible with the standard product that you will be offering? If so will you try to accommodate them? Note that occasionally there is some big customer that comes along and offers a lot of cash for a special treatment.
What is a separate environment? Is it a separate database, a separate virtual machine, a separate physical machine, a machine managed by the customer?
Which parts of your application are part of each environment (hardware configuration, network config, database, source code, binaries, encryption certificates, etc.)?
Will there be some heavy users that may produce loads on your application that will negatively impact the performance for the smaller users?
If you go for all users in one environment, is there a possibility that you will in the future create a separate environment for some customer? If so, this will impact where you put shared data, e.g. configuration data like tax rates, exchange rate data, etc.
I hope this helps.
Performance isn't really your problem; maintenance and data security are. If you have a lot of databases, you will have more to maintain: not only backups but also connection strings, patches, schema updates on release, and so on. Multiple databases also suggest that you will have multiple PHP sites too. That will gradually get more expensive as the number of groups grows.
If you have one database then you need to ensure that every query contains the group id before it can run.
Database tables can be very, very large if you choose your indexes and constraints carefully. If you are performing joins against very large tables it will be slow, but a simple lookup, where you have an index on the group column, should be fast enough.
If you were to share a single database, would you ever move a group out of it? If that's a possibility then split the databases now. If you are going to have one PHP site then I would recommend a single database with a group column.
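A minimal sketch of that single-database, group-column approach, where the data access layer always injects the caller's group id so no query can run unscoped; the table and column names are assumptions.

<?php
// Sketch of group scoping in one shared database. Every query issued through
// this repository is automatically limited to the current user's group.
class GroupScopedRepository
{
    public function __construct(private PDO $pdo, private int $groupId) {}

    public function searchTerms(string $needle): array
    {
        // An index on (group_id, term) keeps this lookup fast even with
        // millions of rows per group.
        $stmt = $this->pdo->prepare(
            'SELECT * FROM search_term
              WHERE group_id = :gid AND term LIKE :needle');
        $stmt->execute([':gid' => $this->groupId, ':needle' => $needle . '%']);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}

$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$repo = new GroupScopedRepository($pdo, 1); // group id taken from the session in practice
$rows = $repo->searchTerms('invoice');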

Use multiple general-purpose queries or one specific query

I have a web app that controls the selling and buying of merchandise, as well as stock and prices...
It is built with AngularJS, PHP and MySQL (PDO).
My model has many general-purpose query functions such as:
getShops()
getShopInfo(shopId)
GetItems(shopId)
GetSuppliers()
and many more...
Now I am developing a dashboard page, to show statistics and a top-level view of things, containing for example:
Number of active items
Number of sold items + total sum
Current debt to suppliers
and many more aggregations on the data.
My question is, which of the two options should I choose:
Should I use many basic queries in my model and aggregate the data client-side?
This may be a bit more maintainable.
OR
Should I create a specific query to get exactly the data this dashboard needs?
Performance will probably be better.
You should without a doubt create a specific query to get exactly what data your dashboard needs. In fact, you should create a view for it so that the actual select statement is plain simple. It has several advantages:
Aggregation is what databases are good at: indexes are used (if you design it well), and results are cached for all clients to benefit from, resulting in a performance that can hardly be beaten;
The SQL language is quite suitable for formulating aggregations, certainly when you compare it to client-side JavaScript;
It will help you in debugging. Without running the web app, you can run the select directly on your database to verify the result, and so better isolate any problems;
If you create a database view, you can even decide to one day change the internal definition of the view without having to touch your Web App code;
The volume of data transferred between server and client is kept minimal: only the needed data is transferred. This can be important for users connecting over mobile networks.
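As a hedged illustration of the view idea: define the aggregation once in the database and let the dashboard run a single trivial select. The underlying table and column names (item, sale, supplier_debt) are assumptions, since the real schema isn't shown.

<?php
// Sketch: the aggregation lives in a database view, the PHP side stays trivial.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

// One-off migration: define the view once.
$pdo->exec(
    'CREATE OR REPLACE VIEW dashboard_stats AS
     SELECT (SELECT COUNT(*) FROM item WHERE active = 1)          AS active_items,
            (SELECT COUNT(*) FROM sale)                           AS sold_items,
            (SELECT COALESCE(SUM(price), 0)  FROM sale)           AS total_sales,
            (SELECT COALESCE(SUM(amount), 0) FROM supplier_debt)  AS current_debt');

// The dashboard endpoint then needs nothing more than:
$stats = $pdo->query('SELECT * FROM dashboard_stats')->fetch(PDO::FETCH_ASSOC);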

Using SQLite in a web application

I'm currently developing an application which allows doctors to dynamically generate invoices. The thing is, each doctor requires 6 different database tables, and there could be around 50 doctors connected at the same time, working with the database (writing and reading) simultaneously.
What I wanted to know is whether the design of my application is suitable. For each doctor, I create a personal SQLite3 database (all databases are secured) which only that doctor can connect to. I'll have around 200 SQLite databases - is there any problem with that? I thought it could be better than using one big MySQL database for everyone.
Is this solution viable? Will I run into problems? I have never built an application with so many users, but I thought this could be the best solution.
Firstly, to answer your question: no, you probably will not have any significant problems if a single SQLite database is only ever used by one person (user) at a time. If you highly value certain edge cases, like the ability to move some users/databases to another server, this might be a very good solution.
But it is not a terribly good design. The usual way is to have all data in the same database, with tables having a field which identifies which rows belong to which users. The application code is responsible for maintaining security (i.e. not letting users see data which doesn't belong to them), and indexes in the database (which you should use in all cases, even in your own design) are responsible for making it fast.
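A minimal sketch of that design, with an owner column on each table and an index on it; the invoice table and its columns are assumptions made up for illustration.

<?php
// Sketch of the single shared database: one set of tables, every row tagged
// with the doctor it belongs to, and an index on that column for speed.
$pdo = new PDO('mysql:host=localhost;dbname=invoicing', 'user', 'pass');

$pdo->exec(
    'CREATE TABLE IF NOT EXISTS invoice (
         id        INT AUTO_INCREMENT PRIMARY KEY,
         doctor_id INT NOT NULL,
         patient   VARCHAR(255) NOT NULL,
         total     DECIMAL(10,2) NOT NULL,
         INDEX idx_invoice_doctor (doctor_id)
     )');

// The application enforces isolation by always filtering on doctor_id.
$currentDoctorId = 7; // taken from the authenticated session in practice
$stmt = $pdo->prepare('SELECT * FROM invoice WHERE doctor_id = :doc');
$stmt->execute([':doc' => $currentDoctorId]);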
There are a large number of tutorials which could help you to make a better database design; a random google result is http://www.profsr.com/sql/sqless02.htm .

Approaches to gathering large visiting statistics

I have a website where users can post their articles, and I would like to give each article's author full stats about its visits and referrers. The implementation seems quite straightforward: just store a database record for every visit and then use aggregate functions to draw graphs and so on.
The problem is that articles receive about 300k views in 24 hours, so in just a month the stats table would gather about 9 million records, which is a very big number, because my server isn't particularly powerful.
Is there a solution to this kind of task? Is there an algorithm or caching mechanism that allows storing long-term statistics without losing accuracy?
P.S. Here is my original stats table:
visitid INT
articleid INT
ip INT
datetime DATETIME
Assuming a home-brewed usage-tracking solution (as opposed to, say, GA as suggested in the other response), a two-database setup may be what you are looking for:
a "realtime" database which captures the visit events as they come in.
an "offline" database where the data from the "realtime" database is collected on a regular basis, to be [optionally] aggregated and indexed.
The purpose of this setup is mostly driven by operational concerns. The "realtime" database is not indexed (or only minimally indexed), for fast insertion, and it is regularly emptied, typically each night when the traffic is lighter, as the "offline" database picks up the events collected through the day.
Both databases can have the very same schema, or the "offline" database may introduce various forms of aggregation. The specific aggregation details applied to the offline database can vary greatly depending on the desire to keep the database's size in check and depending on the data which is deemed important (most statistics/aggregation functions introduce some information loss, and one needs to decide which losses are acceptable and which are not).
Because of the "half-life" nature of the value of usage logs, whereby the relative value of details decays with time, a common strategy is to aggregate the info in multiple tiers: the data collected in the last, say, X days remains mostly untouched, the data collected between X and Y days ago is partially aggregated, and finally, data older than Y days keeps only the most salient info (say, the number of hits).
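For illustration, a nightly rollup between the two databases could look roughly like the sketch below, using the columns from the question's stats table; the table names (stats, article_daily_hits) and the per-article-per-day granularity are assumptions.

<?php
// Sketch of the nightly job: aggregate yesterday's raw visits into the
// offline DB, then empty the realtime table so inserts stay cheap.
// Assumes article_daily_hits has a unique key on (articleid, day).
$realtime = new PDO('mysql:host=localhost;dbname=stats_realtime', 'user', 'pass');
$offline  = new PDO('mysql:host=localhost;dbname=stats_offline', 'user', 'pass');

$rows = $realtime->query(
    "SELECT articleid, DATE(datetime) AS day, COUNT(*) AS hits
       FROM stats
      WHERE datetime >= CURDATE() - INTERVAL 1 DAY AND datetime < CURDATE()
      GROUP BY articleid, DATE(datetime)")->fetchAll(PDO::FETCH_ASSOC);

$insert = $offline->prepare(
    'INSERT INTO article_daily_hits (articleid, day, hits)
     VALUES (:a, :d, :h)
     ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits)');
foreach ($rows as $r) {
    $insert->execute([':a' => $r['articleid'], ':d' => $r['day'], ':h' => $r['hits']]);
}

$realtime->exec('DELETE FROM stats WHERE datetime < CURDATE()');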
Unless you're particularly keen on storing your statistical data yourself, you might consider using Google Analytics or one of its modern counterparts, which are much better than the old remotely hosted hit counters of the 90s. You can find the API to the Google Analytics PHP interface at http://code.google.com/p/gapi-google-analytics-php-interface/
