I have a web app that controls the selling and buying of merchandise, as well as stock and prices.
It is built with AngularJS, PHP and MySQL (PDO).
My model has many general-purpose query functions, such as:
getShops()
getShopInfo(shopId)
GetItems(shopId)
GetSuppliers()
and many more...
Now I am developing a dashboard page to show statistics and a top-level view of things, containing, for example:
Number of active items
Number of Sold items + total sum
Current Debt to suppliers
and many more aggregations on the data.
My question is which of these two options is better:
Should I use many basic queries in my model and aggregate the data client-side?
Would this be a bit more maintainable?
OR
Should I create a specific query to get exactly the data this dashboard needs?
Performance will probably be better.
You should without a doubt create a specific query to get exactly the data your dashboard needs. In fact, you should create a view for it so that the actual select statement stays simple. This has several advantages:
Aggregation is what databases are good at: indexes are used (if you design it well), and results are cached for all clients to benefit from, resulting in a performance that can hardly be beaten;
The SQL language is quite suitable for formulating aggregations, certainly when you compare it to client-side JavaScript;
It will help you in debugging. Without running the web app, you can run the select directly on your database to verify the result, and so better isolate any problems;
If you create a database view, you can even decide to one day change the internal definition of the view without having to touch your Web App code;
The volume of data transferred between server and client is kept minimal: only the needed data is sent. This can be important for users connecting over mobile networks.
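As a minimal sketch of the view idea, here is a runnable example. It uses Python's built-in sqlite3 as a stand-in for MySQL/PDO, and the table, column and view names (items, sales, shop_dashboard) are assumptions made up for illustration, not taken from the question. The point is only that the aggregation lives in the view, so the application runs a plain SELECT.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, shop_id INTEGER, active INTEGER);
CREATE TABLE sales (id INTEGER PRIMARY KEY, item_id INTEGER, quantity INTEGER, unit_price REAL);

INSERT INTO items VALUES (1, 1, 1), (2, 1, 1), (3, 1, 0);
INSERT INTO sales VALUES (1, 1, 2, 10.0), (2, 2, 1, 5.5);

-- The view hides the aggregation; the app only ever runs a plain SELECT on it.
CREATE VIEW shop_dashboard AS
SELECT i.shop_id,
       COUNT(DISTINCT CASE WHEN i.active = 1 THEN i.id END) AS active_items,
       COALESCE(SUM(s.quantity), 0)                         AS sold_items,
       COALESCE(SUM(s.quantity * s.unit_price), 0)          AS sold_total
FROM items i
LEFT JOIN sales s ON s.item_id = i.id
GROUP BY i.shop_id;
""")

# One small row travels to the client instead of every item and sale.
dashboard_row = conn.execute(
    "SELECT active_items, sold_items, sold_total FROM shop_dashboard WHERE shop_id = ?",
    (1,),
).fetchone()
print(dashboard_row)  # (2, 3, 25.5)
```

The same SELECT can be pasted into a database console to check the numbers without running the web app.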
I'm developing a PHP dashboard with statistics from a lot of different MySQL tables. There is a need for many cumulative figures and other totals for building charts, etc. Some queries can be very simple; others join and compare many tables at once.
To do this properly I am considering 3 approaches:
1) Creating a master table which is constantly updated after each front-end (website) and back-end (CRM) interaction and contains only totals for different purposes. So the data is easily accessible with a simple select statement for building the main dashboard.
2) Using many and/or complex queries each time the dashboard is generated. This will take less developing time but more loading time. Maybe there is a better way to manage the order and execution of each query.
3) Creating cron jobs for updating the totals in the background. This is my least favourite approach because it feels outdated for multiple reasons.
Could someone advise me and explain what the best approach would be for the long term?
Thanks in advance.
Fred.
First of all, there is no "correct approach", only one that fits your needs.
To fit your needs, two things must be considered:
Your business requirements (real time, updated daily, etc.)
Scalability and maintenance of the code
In your case, I would combine the options.
First, I would create a dedicated database for the dashboard, which is good for performance and for keeping historical data (which might change if you take your 2nd approach).
Given that, the question of whether to update the dashboard online or via a cron job depends heavily on the business need. I think a cron job is better:
First, it's scalable: you can ditch it in the future and update only the dashboard.
Second, you can time it to run during "slow hours", which avoids overloading your production servers. Different cron jobs can update different tables at different frequencies, etc.
This is, of course, my opinion. Hope it helps.
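To make the cron idea a bit more concrete, here is a rough sketch of what such a job could run. It is written in Python with sqlite3 as a stand-in for PHP/MySQL, and the orders and dashboard_totals tables and the metric names are assumptions for illustration only.

```python
import sqlite3

def refresh_totals(conn):
    """Recompute the dashboard totals in one transaction; meant to be run by a scheduled job."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM dashboard_totals")
        conn.execute("""
            INSERT INTO dashboard_totals (metric, value)
            SELECT 'orders_count', COUNT(*) FROM orders
            UNION ALL
            SELECT 'orders_revenue', COALESCE(SUM(amount), 0) FROM orders
        """)

# Hypothetical schema: a production table and a small totals table for the dashboard.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE dashboard_totals (metric TEXT PRIMARY KEY, value REAL);
INSERT INTO orders (amount) VALUES (10.0), (15.0);
""")

refresh_totals(conn)
totals = dict(conn.execute("SELECT metric, value FROM dashboard_totals"))
print(totals)
```

The dashboard itself then only reads dashboard_totals, and separate jobs with different schedules could each refresh their own set of metrics.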
I have an activity records table named revisions (shown in the following image), built for a big learning management system, which mainly keeps a record of CRUD operations on tables (e.g. who has done what to which object at what time).
This table may contain up to 3M records of data. I want to build a search functionality for this on the front-end with PHP/Laravel.
Now my question is: what should I consider when building high-performance search functionality for tables with millions of records? What can be done at the code level or the database level, and are there third-party tools that help with this kind of problem?
I am experienced in building systems with PHP/Laravel, Python/Django, Ruby, etc., but I have never encountered a case like this, dealing with millions of records. So please keep my knowledge and experience level in mind: I have NO experience at this scale.
Note: The search will be an advanced search, letting users search by different criteria and parameters: the object that was changed, who changed it, when it was changed, etc.
Let me know if my question still isn't clear.
I would recommend taking a look at Elasticsearch (https://www.elastic.co/products/elasticsearch) and saving your activity records to it whenever you save to the main database. Then you can easily search any field. Elasticsearch stores schema-free JSON documents; if you prefer a more SQL-like approach, there is another search engine, Sphinx (http://sphinxsearch.com/).
There is no problem inserting a zillion rows into a table. Performance problems come when you try to do non-trivial SELECTs on the table. You mentioned "search"; you will have to limit what the 'users' can search for. But at least make a stab at what they might want to search for.
You mentioned "searching for an object", but I don't see a column called object. How many rows might there be for a given object? Do you need all the rows? Or selected ones? (An INDEX on object is likely to make the query efficient, regardless of table size.)
Third-party software sometimes gets in the way of dealing with really large tables. Beware.
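As a rough illustration of "limit what users can search for, and index for it", here is a sketch in Python with sqlite3 standing in for MySQL/Laravel. The column names (object_type, object_id, user_id, created_at) and the index names are guesses, since the actual revisions schema is not shown. Only whitelisted columns can be filtered on, every value is bound as a parameter, and each allowed combination has an index that starts with the filtered columns.

```python
import sqlite3

# Hypothetical column names; only these may be searched, so every query can use an index.
ALLOWED_FILTERS = {"object_type", "object_id", "user_id"}

def search_revisions(conn, filters, since=None, limit=50):
    """Return matching revision ids, newest first, for a whitelisted set of filters."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_FILTERS:
            raise ValueError(f"unsupported filter: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    if since is not None:
        clauses.append("created_at >= ?")
        params.append(since)
    where = " AND ".join(clauses) or "1 = 1"
    sql = f"SELECT id FROM revisions WHERE {where} ORDER BY created_at DESC LIMIT ?"
    return [row[0] for row in conn.execute(sql, params + [limit])]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revisions (
    id INTEGER PRIMARY KEY, object_type TEXT, object_id INTEGER,
    user_id INTEGER, action TEXT, created_at TEXT
);
-- Composite indexes matching the supported search patterns.
CREATE INDEX idx_revisions_object ON revisions (object_type, object_id, created_at);
CREATE INDEX idx_revisions_user ON revisions (user_id, created_at);
INSERT INTO revisions (object_type, object_id, user_id, action, created_at) VALUES
    ('course', 7, 1, 'update', '2020-01-01'),
    ('course', 7, 2, 'delete', '2020-02-01'),
    ('lesson', 3, 1, 'create', '2020-03-01');
""")

hits = search_revisions(conn, {"object_type": "course", "object_id": 7})
print(hits)  # [2, 1]
```

The LIMIT keeps each result page small no matter how large the table grows; whether these particular indexes are the right ones depends on which searches users actually run.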
The system I'm working on is structured as below, given that I'm planning to use Joomla as the base.
a (www.a.com), b (www.b.com) and c (www.c.com) are search portals which allow users to search for reservations.
x (www.x.com), y (www.y.com) and z (www.z.com) are hotels where bookings are made by users.
www.a.com's users can only search for bookings which are in
www.x.com
www.b.com's users can only search for bookings which are in
www.x.com, www.y.com
www.c.com's users can search for all bookings which are in
www.x.com, www.y.com, www.z.com
All of a, b, c, x, y and z run the same system, but they should have separate domains. According to my research, the architecture should be as above, where an API integrates all database calls.
Only 6 instances are shown here (a, b, c, x, y, z). There can be up to 100, with different search combinations.
My problems:
Should I maintain a single database for the whole system? If so, how can I unplug one instance if required (e.g. removing www.a.com or www.z.com from the system)? Since I'm using MySQL, will the number of records not become cumbersome for the system?
If I maintain a separate database for each instance, how can I do the search? How can I integrate the required records into one place and search them?
Is there a different database approach I should use instead of those mentioned above?
The problem you describe is "multitenancy" - it's a fairly tricky problem to solve, but luckily, others have written up some useful approaches. (Though the link is to Microsoft, it applies to most SQL environments except in the details).
The trade-offs in your case are:
Does the data from your hotels fit into a single schema? Do their "vacancy" records have the same fields?
How many hotels will there be? 3 separate databases is kinda manageable; 30 is probably not; 300 is definitely not.
How large will the database grow? How many vacancy records?
How likely is it that the data structures will change over time? How likely is it that one hotel will need a change that the others don't?
By far the simplest to manage and develop against is the "single database" model, but only if the data is moderately homogeneous in schema, and as long as you can query the data with reasonable performance. I'd not worry about putting a lot of records in MySQL: it scales very well.
In such a design, you'd map "portal" to "hotel" in a lookup table:
PortalHotelAccess
PortalID   HotelID
------------------
A          X
B          X
B          Y
C          X
C          Y
C          Z
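A small sketch of how a search could use that lookup table, in Python with sqlite3 standing in for MySQL. The bookings table and its columns are assumptions for illustration; only the portal-to-hotel mapping comes from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical single-database schema: every booking row carries its hotel.
CREATE TABLE bookings (id INTEGER PRIMARY KEY, hotel_id TEXT, guest TEXT);
CREATE TABLE portal_hotel_access (
    portal_id TEXT, hotel_id TEXT,
    PRIMARY KEY (portal_id, hotel_id)
);
INSERT INTO portal_hotel_access VALUES
    ('A','X'), ('B','X'), ('B','Y'), ('C','X'), ('C','Y'), ('C','Z');
INSERT INTO bookings (hotel_id, guest) VALUES ('X','Ann'), ('Y','Bob'), ('Z','Cy');
""")

def bookings_for_portal(conn, portal_id):
    """Return the guests of all bookings the given portal is allowed to see."""
    rows = conn.execute("""
        SELECT b.guest
        FROM bookings b
        JOIN portal_hotel_access a ON a.hotel_id = b.hotel_id
        WHERE a.portal_id = ?
        ORDER BY b.id
    """, (portal_id,))
    return [row[0] for row in rows]

print(bookings_for_portal(conn, "A"))  # ['Ann']
print(bookings_for_portal(conn, "B"))  # ['Ann', 'Bob']
```

With this layout, which portal sees which hotel is just data: adding or removing a row in the lookup table changes the search scope without touching the bookings.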
I can suggest 2 approaches. Which one to choose depends on some additional information about the whole system. In fact, the main question is whether your system can impersonate (substitute for, in the legal sense) any of the data providers (x, y, z, etc.) from the consumers' point of view (a, b, c, etc.) or not.
Centralized DB
The first one is based on your original scheme with a centralized API. It implies a single search engine that collects the required data from the data sources, aggregates it in its own DB, and provides it to the data consumers.
This is most likely the preferable solution if the data sources differ in their data representation, so that you need to preprocess the data for uniformity. This variant also protects your clients from possible connectivity problems: if one of the source sites goes offline for a short period (I think this may even be up to several hours without a great impact on how current the booking service is), you can still handle requests to the offline site, and store all new documents in the central DB until the problem is solved. On the other hand, this means that you have to provide some sort of two-way synchronization between your DB and every data source site. The centralized DB should also be built with reliability in mind from the start, so it seems that it should be distributed (preferably over different data centers).
As a result, this approach will probably give the best user experience, but will require significant effort to implement robustly.
Multiple Local DBs
If every data provider runs its own DB, but all of them (including backend APIs) are based on a single standard, you can eliminate the need to copy their data into central DB. Of course, the central point should remain, but it will host a middle-layer logic only without DB. The layer is actually an API which binds (x, y, z) with appropriate (a, b, c) - that is a configuration, nothing more. Every consumer site will host a widget (can be just a javascript or fully-fledged web-application) loaded from your central point with appropriate settings embedded into it.
The widget will request all specified backends directly, and aggregate their results in a single list.
This variant is much like how most of today's web applications work. It's simpler to implement, but it is more error-prone.
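To illustrate the second variant, here is a minimal sketch of the fan-out the widget would do, written in Python for brevity (a real widget would be JavaScript making HTTP calls). The PORTAL_BACKENDS mapping and the stub backend functions are made up for the example. It also shows where the "error-prone" part comes from: any backend can fail, and the aggregator has to decide what to do about it.

```python
# Which hotel backends each portal may query: pure configuration, no central database.
PORTAL_BACKENDS = {"a": ["x"], "b": ["x", "y"], "c": ["x", "y", "z"]}

def search_all(portal, query, backends):
    """Query every backend configured for `portal`; collect results and note failures."""
    results, failed = [], []
    for name in PORTAL_BACKENDS[portal]:
        try:
            results.extend(backends[name](query))
        except Exception:
            # One hotel being offline should not break the whole search.
            failed.append(name)
    return results, failed

# Stub backends standing in for real HTTP calls to each hotel's API.
def hotel_x(query):
    return [f"x:{query}"]

def hotel_y(query):
    raise ConnectionError("y is offline")

results, failed = search_all("b", "double room", {"x": hotel_x, "y": hotel_y})
print(results, failed)  # ['x:double room'] ['y']
```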
I am developing a project at work for which I need to create and maintain Summary Tables for performance reasons. I believe the correct term for this is Materialized Views.
I have 2 main reasons to do this:
Denormalization
I normalized the tables as much as possible. So there are situations where I would have to join many tables to pull data. We work with MySQL Cluster, which has pretty poor performance when it comes to JOIN's.
So I need to create Denormalized Tables that can run faster SELECT's.
Summarize Data
For example, I have a Transactions table with a few million records. The transactions come from different websites. The application needs to generate a report that will display the daily or monthly transaction counts and total revenue amounts per website. I don't want the report script to calculate this every time, so I need to generate a Summary Table that will have a breakdown by [site,date].
That is just one simple example. There are many different kinds of summary tables I need to generate and maintain.
In the past I have done these things by writing several cron scripts to keep each summary table updated. But in this new project, I am hoping to implement a more elegant and proper solution.
I would prefer a PHP based solution, as I am not a server administrator, and I feel the most comfortable when I can control everything through my application code.
Solutions that I have considered:
Copying VIEW's
If the resulting table can be represented as a single SELECT query, I can generate a VIEW. Since they are slow, there can be a cronjob that copies this VIEW into a real table.
However, some of these SELECT queries can be so slow that it's not acceptable even for cronjobs. It is not very efficient to recreate the whole summary data, if older rows are not even being updated much.
Custom Cronjobs for each Summary Table
This is the solution I have used before, but now I am trying to avoid it if possible. If there will be many summary tables, it can be messy to maintain.
MySQL Triggers
It is possible to add triggers to the main tables so that every time there is an INSERT, UPDATE or DELETE, the summary tables get updated accordingly.
There would be no cronjobs and the summaries would be in real time. However if there is ever a need to rebuild a summary table from scratch, it would have to be done with another solution (probably #1 above).
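For reference, a sketch of what the trigger approach could look like. It runs on SQLite through Python's sqlite3 so that it is self-contained; MySQL trigger syntax differs (FOR EACH ROW, and INSERT ... ON DUPLICATE KEY UPDATE instead of the INSERT OR IGNORE plus UPDATE pair used here). The table and column names are assumptions based on the [site,date] example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER PRIMARY KEY, site TEXT, day TEXT, amount REAL);
CREATE TABLE daily_summary (
    site TEXT, day TEXT, tx_count INTEGER, revenue REAL,
    PRIMARY KEY (site, day)
);

-- Keep the summary row in step with every inserted transaction.
CREATE TRIGGER trg_tx_insert AFTER INSERT ON transactions
BEGIN
    INSERT OR IGNORE INTO daily_summary VALUES (NEW.site, NEW.day, 0, 0);
    UPDATE daily_summary
       SET tx_count = tx_count + 1, revenue = revenue + NEW.amount
     WHERE site = NEW.site AND day = NEW.day;
END;

-- Undo the contribution of a deleted transaction.
CREATE TRIGGER trg_tx_delete AFTER DELETE ON transactions
BEGIN
    UPDATE daily_summary
       SET tx_count = tx_count - 1, revenue = revenue - OLD.amount
     WHERE site = OLD.site AND day = OLD.day;
END;
""")

conn.execute("INSERT INTO transactions (site, day, amount) VALUES ('a.com', '2020-01-01', 10.0)")
conn.execute("INSERT INTO transactions (site, day, amount) VALUES ('a.com', '2020-01-01', 5.0)")
conn.execute("DELETE FROM transactions WHERE amount = 5.0")

summary = conn.execute("SELECT tx_count, revenue FROM daily_summary").fetchone()
print(summary)  # (1, 10.0)
```

An UPDATE trigger would follow the same pattern, subtracting the OLD values and adding the NEW ones.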
Using ORM Hooks/Triggers
I am using Doctrine as my ORM. There is a way to add event listeners that will trigger stuff on INSERT/UPDATE/DELETE, which in turn can update the summary tables. In a sense this solution is similar to #3 above, but I will have better control over these triggers since they will be implemented in PHP.
Implementation Considerations:
Complete Rebuilds
I want to avoid having to rebuild the summary tables, for efficiency, and only update for new data. But in case something goes wrong, I need the capability to rebuild the summary table from scratch using existing data on the main tables.
Ignoring UPDATE/DELETE on Old Data
Some summaries can assume that older records will never be updated or deleted, but only new records will be inserted. The summary process can save a lot of work by making the assumption that it doesn't need to check for updates on older data.
But of course this won't apply to all tables.
Keeping a Log
Let's assume that I won't have access to, or do not want to use the binary MySQL logs.
For summarizing new data, the summary process just needs to remember the last primary key id's for the last records it summarized. Next time it runs, it can summarize everything after that id. However, to keep track of older records that have been updated/deleted, it needs another log so it can go back and re-summarize that data.
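The "remember the last id" part could look roughly like this. It is a sketch in Python with sqlite3 rather than PHP/Doctrine, and the summary_state table, the daily_summary layout and the function name are assumptions. It only covers newly inserted rows; updates and deletes of older rows would still need the separate log mentioned above.

```python
import sqlite3

def summarize_new_rows(conn):
    """Fold transactions with id above the stored watermark into daily_summary."""
    with conn:  # one transaction: summary rows and watermark move together
        (last_id,) = conn.execute("SELECT last_id FROM summary_state").fetchone()
        (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM transactions").fetchone()
        rows = conn.execute(
            "SELECT site, day, COUNT(*), SUM(amount) FROM transactions "
            "WHERE id > ? AND id <= ? GROUP BY site, day",
            (last_id, max_id),
        ).fetchall()
        for site, day, count, revenue in rows:
            conn.execute("INSERT OR IGNORE INTO daily_summary VALUES (?, ?, 0, 0)", (site, day))
            conn.execute(
                "UPDATE daily_summary SET tx_count = tx_count + ?, revenue = revenue + ? "
                "WHERE site = ? AND day = ?",
                (count, revenue, site, day),
            )
        conn.execute("UPDATE summary_state SET last_id = ?", (max_id,))

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER PRIMARY KEY, site TEXT, day TEXT, amount REAL);
CREATE TABLE daily_summary (site TEXT, day TEXT, tx_count INTEGER, revenue REAL,
                            PRIMARY KEY (site, day));
CREATE TABLE summary_state (last_id INTEGER);
INSERT INTO summary_state VALUES (0);
INSERT INTO transactions (site, day, amount) VALUES
    ('a.com', '2020-01-01', 10.0), ('a.com', '2020-01-01', 5.0);
""")

summarize_new_rows(conn)
first_pass = conn.execute("SELECT tx_count, revenue FROM daily_summary").fetchone()
print(first_pass)  # (2, 15.0)
```

Running the function again without new rows changes nothing, because the id range above the watermark is empty.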
I would appreciate any kind of strategies, suggestions or links that can help. Thank you!
As noted above materialized views in Oracle are different than indexed views in SQL Server. They are very cool and useful. See http://download.oracle.com/docs/cd/B10500_01/server.920/a96567/repmview.htm for details
MySql does not have support for these however.
One thing you mention several times is poor performance. Have you checked your database design for proper indexing and run explain plans on the queries to see why they are slow? See http://dev.mysql.com/doc/refman/5.1/en/using-explain.html. This of course assumes that your server is tuned properly and that you have MySQL set up and tuned, e.g. buffer caches, etc.
To your direct question: what you describe is something we do often in a data warehouse situation. We have a production database and a DW that pulls in all sorts of information, then aggregates and pre-calculates it to speed up querying. This may be overkill for you, but you can decide. Depending on the latency you define for your reports, i.e. how fresh they need to be, we normally run an ETL (extract, transform, load) process periodically (daily, weekly, etc.) to populate the DW from the production system. This keeps the impact on the production system low and moves all reporting to another set of servers, which also lessens the load. On the DW side, I would normally design my schemas differently, i.e. using star schemas (http://www.orafaq.com/node/2286). Star schemas have fact tables (things you want to measure) and dimensions (things you want to aggregate the measures by: time, geography, product categories, etc.). SQL Server also includes an additional engine called SQL Server Analysis Services (SSAS) that looks at fact tables and dimensions, and pre-calculates and builds OLAP data cubes. In these data cubes you can drill down and look at all types of patterns, and do data analysis and data mining. Oracle does things slightly differently, but the outcome is the same.
Whether you want to go the above route really depends on the business need and how much value you get from data analysis. As I said, it is likely overkill if you just have a few summary tables, but you may find some of the concepts helpful as you think things through. If your business is moving toward a business intelligence solution, then this is something to consider.
PS You can actually set a DW up to work in "real-time" using something called ROLAP if that is the business need. Microstrategy has a good product that works well for this.
PPS You also may want to look at PowerPivot from MS (http://www.powerpivot.com/learn.aspx) I have only played with it so I cannot tell you how it works on very large datasets.
Flexviews (http://flexvie.ws) is an open source PHP/MySQL based project. Flexviews adds incrementally refreshable materialized views (like the materialized views in Oracle) to MySQL, using PHP and stored procedures.
It includes FlexCDC, a PHP based change data capture utility which reads binary logs, and the Flexviews MySQL stored procedures which are used to define and maintain the views.
Flexviews supports joins (inner join only) and aggregation so it can be used to create summary tables. Moreover, you can use Flexviews in combination with Mondrian's (a ROLAP server) aggregation designer to create summary tables that the ROLAP tool can automatically use.
If you don't have access to the logs (it can read them remotely, by the way, so you don't need server access, but you do need SUPER privileges), then you can use a 'COMPLETE' refresh with Flexviews. This automates creating a new table with 'CREATE TABLE ... AS SELECT' under a new table name. It then uses RENAME TABLE to swap the new table for the old one, renaming the old one with an _old postfix. Finally, it drops the old table. The advantage here is that the SQL to create the view is stored in the database (flexviews.mview) and can be refreshed with a simple API call which automates the swapping process.
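The build-then-swap idea behind a complete refresh can be sketched by hand as follows. This is not the Flexviews API, only an illustration of the same steps in Python with sqlite3, where ALTER TABLE ... RENAME TO plays the role of MySQL's RENAME TABLE. The function name and the daily_summary table are assumptions, and both arguments are expected to be developer-defined strings, never user input.

```python
import sqlite3

def complete_refresh(conn, name, select_sql):
    """Rebuild summary table `name` from scratch, then swap it in place of the old one."""
    conn.executescript(f"""
        BEGIN;
        DROP TABLE IF EXISTS {name}_new;
        CREATE TABLE {name}_new AS {select_sql};
        ALTER TABLE {name} RENAME TO {name}_old;
        ALTER TABLE {name}_new RENAME TO {name};
        DROP TABLE {name}_old;
        COMMIT;
    """)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER PRIMARY KEY, site TEXT, day TEXT, amount REAL);
INSERT INTO transactions (site, day, amount) VALUES
    ('a.com', '2020-01-01', 10.0), ('a.com', '2020-01-01', 5.0);
-- A stale summary that the refresh will replace.
CREATE TABLE daily_summary (site TEXT, day TEXT, tx_count INTEGER, revenue REAL);
INSERT INTO daily_summary VALUES ('a.com', '2020-01-01', 99, 999.0);
""")

complete_refresh(
    conn,
    "daily_summary",
    "SELECT site, day, COUNT(*) AS tx_count, SUM(amount) AS revenue "
    "FROM transactions GROUP BY site, day",
)
refreshed = conn.execute("SELECT * FROM daily_summary").fetchall()
print(refreshed)  # [('a.com', '2020-01-01', 2, 15.0)]
```

Readers keep querying the old table while the new one is being built; only the final rename makes the fresh data visible.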
I'm building an analytics system for a mobile application and have had some difficulty deciding how to store and process large amounts of data.
Each row will represent a 'view' (like a web page) and store some fixed attributes, like user agent and date. Additionally, each view may have a varying number of extra attributes, which relate to actions performed or content identifiers.
I've looked at Amazon SimpleDb which handles the varying number of attributes well, but has no support for GROUP BY and doesn't seem to perform well when COUNTing rows either. Generating a monthly graph with 30 data points would require a query for each day per dataset.
MySQL handles the COUNT and GROUP modifiers much better but additional attributes require storage in a link table and a JOIN to retrieve views where attributes match a given value, which isn't very fast. 5.1's partitioning feature may help speed things up a bit.
What I have gathered from a lot of reading and profiling queries on the aforementioned systems is that ultimately all of the data needs to be aggregated and stored in tables for quick report generation.
Have I missed anything obvious in my research, and is there a better way to do this than using MySQL? It doesn't feel like the right tool for the job, but I can't find anything capable of both GROUP/COUNT queries and a flexible table structure.
This is a case where you want to store the data once and read it over and over. Further I think that you'd wish the queries to be preprocessed instead of needing to be calculated on every go.
My suggestion for you is to store your data in CouchDB for the following reasons:
Its tables are structureless
Its queries are pre-processed
Its support for map-reduce allows your queries to handle group by
It has a REST service access model which lets you connect from pretty much anything that handles HTTP requests
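To show how map-reduce can stand in for GROUP BY, here is a plain Python sketch of the idea. It is not CouchDB's API (CouchDB views are normally written as JavaScript map and reduce functions), and the document fields are invented for the example. Note that the documents have different sets of attributes, which the map function simply ignores.

```python
from collections import defaultdict

def map_view(doc):
    # Emit one (key, value) pair per document, keyed by the day of the view.
    yield doc["date"], 1

def reduce_counts(values):
    # Combine all values emitted for one key.
    return sum(values)

def run_view(docs):
    """Group emitted values by key, then reduce each group (a GROUP BY + COUNT)."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_view(doc):
            grouped[key].append(value)
    return {key: reduce_counts(values) for key, values in grouped.items()}

# Schema-free documents: fixed attributes plus a varying set of extras.
docs = [
    {"date": "2010-05-01", "user_agent": "iPhone", "action": "play"},
    {"date": "2010-05-01", "user_agent": "Android"},
    {"date": "2010-05-02", "user_agent": "iPhone", "content_id": 9},
]

views_per_day = run_view(docs)
print(views_per_day)  # {'2010-05-01': 2, '2010-05-02': 1}
```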
You may find this suggestion a little out there considering how new CouchDB is. However, I'd suggest you read about it, because personally I think running a CouchDB database is sweet and lightweight. More lightweight than MySQL.
Keeping it in MySQL: If the number of writes is limited and reads are more common, and the data is relatively simple (i.e. you can predict the possible characters), you could try using a text/blob column in the main table, which is updated with comma-separated values or key/value pairs by an AFTER INSERT / UPDATE trigger on the join table. You keep the actual data in a separate table, so searching for MAX's or specific 'extra' attributes can still be done relatively fast, but retrieving the complete dataset for one of your 'views' means reading a single row in the main table, which you can split into separate values in the script or application you're using, relieving much of the stress on the database itself.
The downside of this is a tremendous increase in cost of updates / inserts in the join table: every alteration of data would require a query on all related data for a record, and a second insert into the 'normal' table, something like
UPDATE main_table
JOIN (
    -- aggregate first: GROUP BY is not allowed directly in a multi-table UPDATE
    SELECT main_id, GROUP_CONCAT(CONCAT(`key`, '=', `value`) SEPARATOR ';') AS cache
    FROM join_table
    WHERE main_id = 'foo'
    GROUP BY main_id
) AS agg ON agg.main_id = main_table.id
SET main_table.cache = agg.cache;
However, analytics data usually trails somewhat, so possibly not every update has to trigger an update of the cache; a daily cron script filling the cache with yesterday's data could do.
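Here is a small end-to-end sketch of the cache column idea, run on SQLite through Python's sqlite3 instead of MySQL. SQLite's group_concat(X, sep) and || replace MySQL's GROUP_CONCAT ... SEPARATOR and CONCAT; the function names and sample attributes are made up. It shows both halves: refreshing the cache for one row, and splitting it back into key/value pairs in the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_table (id INTEGER PRIMARY KEY, cache TEXT);
CREATE TABLE join_table (main_id INTEGER, key TEXT, value TEXT);
INSERT INTO main_table (id) VALUES (1);
INSERT INTO join_table VALUES (1, 'action', 'click'), (1, 'content', '42');
""")

def refresh_cache(conn, main_id):
    """Rebuild the denormalized key=value;key=value string for one main row."""
    conn.execute("""
        UPDATE main_table
           SET cache = (SELECT group_concat(key || '=' || value, ';')
                          FROM join_table
                         WHERE join_table.main_id = main_table.id)
         WHERE id = ?
    """, (main_id,))

def read_attributes(conn, main_id):
    """Read one row and split the cached string into a dict in the application."""
    (cache,) = conn.execute(
        "SELECT cache FROM main_table WHERE id = ?", (main_id,)
    ).fetchone()
    if not cache:
        return {}
    return dict(pair.split("=", 1) for pair in cache.split(";"))

refresh_cache(conn, 1)
attributes = read_attributes(conn, 1)
print(attributes)
```

This only works as long as the separator characters (';' and '=') cannot appear in the keys or values, which is the "predictable characters" condition mentioned above. A daily job could call refresh_cache for each row touched the previous day.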