Save database storage or save queries accessing multiple tables? - php

I am considering the database design for the following problem:
I have two different tables in the same database. The first table stores the detailed data of different objects, where the id column identifies each object.
The second table stores every single change made to the objects in the first table. Each row in the second table stores both the id of the object it refers to and a version_id that identifies one state version of the object, that is, one individual change.
Now let's say the eliminated column is set to true in a row of the objects table to mark an object as not visible on the object manager site. On our display site, the version table is queried to show a linked object's version; however, the system shouldn't display it if the object referred to by id is marked as eliminated.
To solve this, I have two possible solutions: either increase database storage by adding an eliminated column to the version table, or add a query in PHP that reads the eliminated column from the objects table after getting the object id from the version table.
I want to know the advantages and disadvantages of both solutions: whether saving storage is preferable to running more queries that access multiple tables to get the data, or whether, on the contrary, sacrificing storage and copying the eliminated column into the version table leads to better response times for the site by avoiding extra queries against other tables.
CREATE TABLE `objects` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`eliminated` tinyint(1) DEFAULT NULL,
...
PRIMARY KEY (`id`),
KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE `version` (
`version_id` int(11) NOT NULL AUTO_INCREMENT,
`object_id` int(11) NOT NULL,
`eliminated` tinyint(1) DEFAULT NULL, -- optional
...
PRIMARY KEY (`version_id`),
KEY `id` (`version_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

The advantage of adding an eliminated column to the version table is that it lets you store details of the object's elimination with each version, so the display site can filter on the version table alone.
The drawback is that you are adding an extra column to every row of the version table, which can create an overhead if there are a lot of rows in the table, and the flag has to be kept in sync with the objects table.
Which solution you use depends on how much data is being stored in your tables and also on what data needs to be displayed to the user.
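A minimal sketch of both options, using Python's built-in sqlite3 module as a stand-in for MySQL. Table and column names follow the question; the sample rows and the helper functions are hypothetical. It also shows that the "extra query" option does not need a second round trip from PHP: a single JOIN can filter out eliminated objects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (
        id INTEGER PRIMARY KEY,
        eliminated INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE version (
        version_id INTEGER PRIMARY KEY,
        object_id INTEGER NOT NULL REFERENCES objects(id),
        eliminated INTEGER NOT NULL DEFAULT 0  -- only used by option B
    );
    -- Index the join column so the lookup stays cheap as version grows.
    CREATE INDEX version_object_id ON version(object_id);
""")

# Hypothetical data: object 2 is eliminated, object 1 is not.
conn.executemany("INSERT INTO objects (id, eliminated) VALUES (?, ?)", [(1, 0), (2, 1)])
conn.executemany(
    "INSERT INTO version (version_id, object_id, eliminated) VALUES (?, ?, ?)",
    [(10, 1, 0), (11, 1, 0), (20, 2, 1)],
)

def visible_versions_join(conn):
    """Option A: keep the flag only in objects and filter with one JOIN."""
    rows = conn.execute("""
        SELECT v.version_id
        FROM version AS v
        JOIN objects AS o ON o.id = v.object_id
        WHERE o.eliminated = 0
        ORDER BY v.version_id
    """)
    return [r[0] for r in rows]

def visible_versions_denormalized(conn):
    """Option B: read the copied flag from version; no join needed."""
    rows = conn.execute(
        "SELECT version_id FROM version WHERE eliminated = 0 ORDER BY version_id"
    )
    return [r[0] for r in rows]

def eliminate_object(conn, object_id):
    """Option B's cost: every elimination must update both tables together."""
    with conn:  # one transaction, so the two flags cannot drift apart
        conn.execute("UPDATE objects SET eliminated = 1 WHERE id = ?", (object_id,))
        conn.execute("UPDATE version SET eliminated = 1 WHERE object_id = ?", (object_id,))

print(visible_versions_join(conn))          # [10, 11]
print(visible_versions_denormalized(conn))  # [10, 11]
```

With an index on version.object_id, the join is typically inexpensive; the copied flag trades that join for extra storage plus the synchronization work shown in eliminate_object.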

Related

Same MySql Query Long execution time but short on archive table with 6million more records

I am a bit stumped by this weirdness.
I have a GPS tracking app that logs GPS points into a track_log table.
When I do a basic query on the running log table it takes about 50 seconds to complete:
SELECT * FROM track_log WHERE node_id = '26' ORDER BY time_stamp DESC LIMIT 1
To keep the running table small, I copied most of the logs to an archive table, which reduced the running table to about 1.2 million records.
The archive table is 7.5 million records big.
The exact same query on the archive table runs for 0.1 seconds on the same server even though it's six times bigger!
What's going on?
Here's the full Create Table schema:
CREATE TABLE `track_log` (
`id_track_log` INT(11) NOT NULL AUTO_INCREMENT,
`node_id` INT(11) DEFAULT NULL,
`client_id` INT(11) DEFAULT NULL,
`time_stamp` DATETIME NOT NULL,
`latitude` DOUBLE DEFAULT NULL,
`longitude` DOUBLE DEFAULT NULL,
`altitude` DOUBLE DEFAULT NULL,
`direction` DOUBLE DEFAULT NULL,
`speed` DOUBLE DEFAULT NULL,
`event_code` INT(11) DEFAULT NULL,
`event_description` VARCHAR(255) DEFAULT NULL,
`street_address` VARCHAR(255) DEFAULT NULL,
`mileage` INT(11) DEFAULT NULL,
`run_time` INT(11) DEFAULT NULL,
`satellites` INT(11) DEFAULT NULL,
`gsm_signal_status` DOUBLE DEFAULT NULL,
`hor_pos_accuracy` double DEFAULT NULL,
`positioning_status` char(1) DEFAULT NULL,
`io_port_status` char(16) DEFAULT NULL,
`AD1` decimal(10,2) DEFAULT NULL,
`AD2` decimal(10,2) DEFAULT NULL,
`AD3` decimal(10,2) DEFAULT NULL,
`battery_voltage` decimal(10,2) DEFAULT NULL,
`ext_power_voltage` decimal(10,2) DEFAULT NULL,
`rfid` char(8) DEFAULT NULL,
`pic_name` varchar(255) DEFAULT NULL,
`temp_sensor_no` char(2) DEFAULT NULL,
PRIMARY KEY (`id_track_log`),
UNIQUE KEY `id_track_log_UNIQUE` (`id_track_log`),
KEY `client_id_fk_idx` (`client_id`),
KEY `track_log_node_id_fk_idx` (`node_id`),
KEY `track_log_event_code_fk_idx` (`event_code`),
KEY `track_log_time_stamp_index` (`time_stamp`),
CONSTRAINT `track_log_client_id` FOREIGN KEY (`client_id`) REFERENCES `clients` (`client_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `track_log_event_code_fk` FOREIGN KEY (`event_code`) REFERENCES `event_codes` (`event_code`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `track_log_node_id_fk` FOREIGN KEY (`node_id`) REFERENCES `nodes` (`id_nodes`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=8632967 DEFAULT CHARSET=utf8
TL;DR
Make sure the indexes are defined in both tables; for this query, node_id and time_stamp are good index columns.
Defragment your table: https://dev.mysql.com/doc/refman/5.5/en/innodb-file-defragmenting.html (This could help, but should not make this much of a difference).
Make sure your query is not being blocked by other queries. If data is being inserted into the track_log table continuously, those queries might block your query. You can prevent this by changing the transaction isolation level; see https://dev.mysql.com/doc/refman/5.5/en/set-transaction.html for more information. Caution: be careful with this!
Indexes
I'm guessing this has something to do with the indexes you defined on the tables. Could you post the SHOW CREATE TABLE track_log output, and the same output for your archive table? The query you are executing would require an index on node_id and time_stamp for optimal performance.
Defragmentation
Besides the indexes you defined on the table, this might have something to do with data fragmentation. I'm assuming you are now using InnoDB as your table engine. Depending on your settings (the innodb_file_per_table variable), each table in a database is stored in a separate file, or all tables are stored in a single shared file. Those files do not shrink when rows are deleted. If your track_log table has grown to 8.7 million records, it still takes up space on disk for all those 8.7 million records.
If you have moved records from your track_log table to your archive table, the remaining data might be spread between the beginning and the end of the physical file for track_log. If no index is defined on time_stamp, a full table scan is still required to order by the timestamp, which means reading the complete file from disk. Because the deleted records still take up space in the file, this could make a difference.
Edit:
Transactions
Other transactions might be blocking your SELECT query. This can happen with the InnoDB engine. If you continuously insert a lot of data into your track_log table, those queries might block your query, which then has to wait until no other transactions are being performed on the table.
There is a way around this, but you should be careful with it. You can change the transaction isolation level of your query. By setting it to READ UNCOMMITTED, you can read data while the other inserts are running, but you might not always get the latest data. Whether that sacrifice is acceptable depends on your situation. If you are going to alter the data and update it later, you generally do not want to change the isolation level. But, for example, when showing statistics that need not always be accurate and up to date, this could really speed up your query.
I use this myself sometimes when I need to show statistics from large tables which are updated regularly.
This is almost certainly because your archive table has superior indexing to your track_log table.
To satisfy this query efficiently, you need a compound index on (node_id, time_stamp). Why does this work? Because InnoDB and MyISAM indexes are so-called B-tree indexes, which means our intuition about searching them in order will work. Your query looks for a specific value of node_id, which means it can jump to that value in the index efficiently. The query then calls for the highest possible value of time_stamp related to that node_id value. That value is in the same index, and in the right order to access it quickly too. So the row you need can be accessed directly, and MySQL doesn't have to hunt for it by scanning the table row by row. That scanning is almost certainly what's taking the time in your query.
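As a minimal sketch of this, the example below uses Python's built-in sqlite3 module as a stand-in for MySQL, with a trimmed-down hypothetical track_log table, and compares the query plan under a single-column index with the plan under the compound index. SQLite's plan text differs from MySQL's EXPLAIN output; in MySQL, the equivalent sign of the extra sort is "Using filesort" in the Extra column, and the index could be added with ALTER TABLE track_log ADD INDEX node_time (node_id, time_stamp) (the index name is hypothetical).

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN output as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)

LATEST = "SELECT * FROM track_log WHERE node_id = ? ORDER BY time_stamp DESC LIMIT 1"

def make_db(index_sql):
    """Create an in-memory track_log table with the given index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE track_log (
            id_track_log INTEGER PRIMARY KEY,
            node_id INTEGER,
            time_stamp TEXT NOT NULL,
            latitude REAL,
            longitude REAL
        )
    """)
    conn.execute(index_sql)
    return conn

# Single-column index: the node's rows are found, but must then be sorted.
single = make_db("CREATE INDEX idx_node ON track_log(node_id)")
print(query_plan(single, LATEST, (26,)))

# Compound index: the node's rows are already in time_stamp order,
# so the newest one can be read from the end of that index range.
compound = make_db("CREATE INDEX idx_node_time ON track_log(node_id, time_stamp)")
print(query_plan(compound, LATEST, (26,)))
```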
Three things to keep in mind:
One: lots of indexes on single columns can't help a query as much as well-chosen compound indexes. Read this http://use-the-index-luke.com/
Two: SELECT * is usually harmful on a table with as many columns as the one you have shown. Instead, you should enumerate the columns you actually need in your SELECT query. That way MySQL doesn't have to sling as much data.
Three: The DOUBLE datatype is overkill for commercial-grade GPS data. FLOAT is plenty of precision.
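To put rough numbers on that claim: MySQL's FLOAT (without precision arguments) is a 4-byte single-precision value, while DOUBLE uses 8 bytes. The sketch below, assuming IEEE-754 single precision and a hypothetical sample coordinate, rounds a coordinate through a 4-byte float with Python's struct module and converts the error to meters using the approximate length of one degree at the equator.

```python
import struct

def as_float32(x):
    """Round-trip a Python float (64-bit) through a 4-byte IEEE-754 float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

METERS_PER_DEGREE = 111_320  # approximate length of one degree at the equator

def float32_error_m(coord):
    """Approximate ground error, in meters, from storing coord as FLOAT."""
    return abs(as_float32(coord) - coord) * METERS_PER_DEGREE

# Hypothetical sample point (latitude, longitude) in decimal degrees.
lat, lon = 37.774929, -122.419416
print(f"latitude error:  {float32_error_m(lat):.3f} m")
print(f"longitude error: {float32_error_m(lon):.3f} m")
```

For any valid longitude the rounding error stays under about a meter, which is below the few-meter accuracy typical of consumer GPS receivers; whether that margin is acceptable depends on the application.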
Let us analyze your query:
SELECT * FROM track_log WHERE node_id = '26' ORDER BY time_stamp DESC LIMIT 1
The query above first sorts all the data in the table by time_stamp and then returns the top row.
But when this query is executed on the archived table, the ORDER BY clause might be ignored (depending on compression and system settings), and hence it returns the first row it encounters in the table.
You may verify the output from the archived table by comparing the result with the actual latest row.

Blending Relational with Non-Relational (MySQL and MongoDB)

This has been talked about before, but I have yet to come across a clear answer; from my research, it's just roughly described as fire and water and left at that.
Relational and non-relational databases are very different, but they both pull data. For my project I plan to use a non-relational database; however, it will be installed in many places, and some only have access to MySQL (they will be moved later).
So is it possible to force MySQL into a kind of non-relational mode? I have used a schema that sort of mimics it, but it still holds aspects of a relational database that so far I have not been able to overcome (it is overly dependent on IDs and such, which makes the syntax and data structure messy).
So is there a magic library that will do this?
Here is a rough outline of my database schema:
One table, "meta", contains an id along with type, date and similar commonly searched fields that are basically universal.
One table, "data", has multiple rows per record, one for each "column". This cannot be done through a join, so getting the data takes 2 queries.
CREATE TABLE `meta` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`type` varchar(255) NOT NULL,
`state` tinyint(3) NOT NULL DEFAULT '0',
`created` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;
CREATE TABLE `data` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`meta_id` int(11) unsigned NOT NULL DEFAULT '0',
`index` varchar(255) NOT NULL,
`value` longtext NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;
As you can see, it's not easily searchable unless it's by id, date or something similar, and it also requires PHP to take up a lot of the slack for ordering and such. That isn't what I am really worried about, though; to do an actual search, it would need to dump the entire database and chew through it.
What kind of MySQL schema (or concept) could best replicate a non-relational model (and still handle search reasonably)?
First, there is no such thing as magic.
You have reinvented the Entity-Attribute-Value design. This is a non-relational design. I've written about this before, but in brief: you end up having to implement in application code many features that you take for granted in an RDBMS, like constraints and data types.
This is related to the concept of the Inner-Platform Effect:
The Inner-Platform Effect is a result of designing a system to be so customizable that it ends becoming a poor replica of the platform it was designed with. This "customization" of this dynamic inner-platform becomes so complicated that only a programmer (and not the end user) is able to modify it.
If that's the type of work that you would like to spend your time doing, then go for it.
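To make the EAV point concrete, here is a minimal sketch of the question's meta/data design (simplified, with hypothetical sample values), using Python's built-in sqlite3 module as a stand-in for MySQL (which would use backticks around `index`). A JOIN with conditional aggregation can rebuild one logical record in a single query, but every value still comes back as text, so type conversion and validation fall to the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meta (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
    CREATE TABLE data (
        id INTEGER PRIMARY KEY,
        meta_id INTEGER NOT NULL,
        "index" TEXT NOT NULL,   -- attribute name
        value TEXT NOT NULL      -- every value is stored as text
    );
""")
conn.execute("INSERT INTO meta (id, type) VALUES (1, 'product')")
conn.executemany(
    'INSERT INTO data (meta_id, "index", value) VALUES (?, ?, ?)',
    [(1, "name", "Widget"), (1, "price", "9.50"), (1, "stock", "12")],
)

# Reassembling one logical row takes a pivot over the attribute rows.
row = conn.execute("""
    SELECT m.id,
           MAX(CASE WHEN d."index" = 'name'  THEN d.value END) AS name,
           MAX(CASE WHEN d."index" = 'price' THEN d.value END) AS price,
           MAX(CASE WHEN d."index" = 'stock' THEN d.value END) AS stock
    FROM meta AS m
    JOIN data AS d ON d.meta_id = m.id
    WHERE m.type = 'product'
    GROUP BY m.id
""").fetchone()
print(row)  # (1, 'Widget', '9.50', '12')

# The database cannot enforce types or constraints on these values,
# so the application has to convert (and validate) them itself.
product = {"id": row[0], "name": row[1], "price": float(row[2]), "stock": int(row[3])}
```

Every new attribute means another CASE branch, and searching by a typed value (for example, price above some amount) requires casting the text column, which is part of why EAV designs are hard to search efficiently.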
My preference is to use MySQL for relational data, and use a non-relational data store for non-relational data. One can access both databases from the same application.
it's just roughly described as fire and water and left at that (from my research).
I think of it more like fire and marshmallows. If you know what you're doing, you can make one of the best treats in the world. Or you could end up holding a stick covered in a charred, sticky mess.

Alternatives for a many-to-many link table

I am working on a project where I want to allow the end user to basically add an unlimited amount of resources when creating a hardware device listing.
In this scenario, they can store both the quantity and types of hard-drives. The hard-drive types are already stored in a MySQL Database table with all of the potential options, so they have the options to set quantity, choose the drive type (from dropdown box), and add more entries as needed.
As I don't want to create a table with columns like "drive1amount", "drive1typeid", "drive2amount", "drive2typeid", and so on, what would be the best way to do this?
I've seen similar questions answered with a many-to-many link table, but can't think of how I could pull this off with that.
Something like this?
CREATE TABLE `hardware` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(256) NOT NULL,
`quantity` int(11) NOT NULL,
`hardware_type_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `type_id` (`hardware_type_id`),
CONSTRAINT `hardware_ibfk_1` FOREIGN KEY (`hardware_type_id`) REFERENCES `hardware_type` (`id`)
) ENGINE=InnoDB
hardware_type_id is a foreign key to your existing table
This way the table doesn't care what kind of hardware it is.
The answer depends a bit on your long-term goals for this project. If you want a data repository that holds profiles of all different types of hardware devices with their specifications, I suggest you maintain a separate table for each type of hardware. For example, you would have a hard-disk table consisting of all the different models and types of hard disks out there. Then you can assign a record from that table to the host configuration table, building up the dataset from user input as you go.
If this is not clear, let me know and I will create a diagram and upload it for you.
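For comparison, a minimal sketch of the many-to-many link table the question asks about, using Python's built-in sqlite3 module as a stand-in for MySQL. The table names device, drive_type and device_drive and the sample rows are hypothetical. Each drive entry the user adds on the form becomes one row in the link table, with the quantity stored on that row, so no drive1amount/drive2amount columns are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE drive_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE device (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per (device, drive type) pair, however many pairs there are.
    CREATE TABLE device_drive (
        device_id     INTEGER NOT NULL REFERENCES device(id),
        drive_type_id INTEGER NOT NULL REFERENCES drive_type(id),
        quantity      INTEGER NOT NULL CHECK (quantity > 0),
        PRIMARY KEY (device_id, drive_type_id)
    );
""")
conn.executemany("INSERT INTO drive_type VALUES (?, ?)", [(1, "1TB SATA"), (2, "500GB SSD")])
conn.execute("INSERT INTO device VALUES (1, 'File server')")
# Each dropdown row in the form becomes one INSERT into the link table.
conn.executemany("INSERT INTO device_drive VALUES (?, ?, ?)", [(1, 1, 4), (1, 2, 2)])

rows = conn.execute("""
    SELECT dt.name, dd.quantity
    FROM device_drive AS dd
    JOIN drive_type AS dt ON dt.id = dd.drive_type_id
    WHERE dd.device_id = ?
    ORDER BY dt.name
""", (1,)).fetchall()
print(rows)  # [('1TB SATA', 4), ('500GB SSD', 2)]
```

The composite primary key prevents the same drive type from being listed twice for one device; if you want to allow that, a surrogate id column would be needed instead.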

What's a good DB schema to store high volume logging data?

I'm adding an "activity log" to a busy website, which should show the user the last N actions relevant to them and allow going to a dedicated page to view all the actions, search them, etc.
The DB is MySQL, and I'm wondering how the log should be stored. I've started with a single MyISAM table used for FULLTEXT searches, and to avoid extra SELECT queries, on every action: 1) a row is inserted into that table; 2) the APC cache for each affected user is updated, so on the next page request MySQL is not used. The cache has a long lifetime, and if it's missing, the first AJAX request from the user creates it.
I'm caching 3 last events for each user, so when a new event happens, I grab the current cache, add the new event to the beginning and remove the oldest event, so there's always 3 of those in the cache. Every page of the site has a small box displaying those.
Is this a proper setup? How would you recommend implementing this sort of feature?
The schema I have is:
CREATE DATABASE `audit`;
CREATE TABLE `audit`.`event` (
`eventid` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`userid` INT UNSIGNED NOT NULL ,
`createdat` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ,
`message` VARCHAR( 255 ) NOT NULL ,
`comment` TEXT NOT NULL
) ENGINE = MYISAM CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER DATABASE `audit` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE `audit`.`event` ADD FULLTEXT `search` (
`message` ( 255 ) ,
`comment` ( 255 )
);
Based on your schema, I'm guessing that (caching aside) you'll be inserting many records per second, and running fairly infrequent queries along the lines of select * from event where userid = ? order by createdat desc, probably with a paging strategy (thus requiring "limit x" at the end of the query) to show the user their history.
You probably also want to find all users affected by a particular type of event, though more likely in an offline process (e.g. a nightly mail to all users who have updated their password); that might require a query along the lines of select userid from event where message like 'password_updated'.
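The per-user history query with paging could be sketched as follows, using Python's built-in sqlite3 module as a stand-in for MySQL. Column names follow the question's schema; the composite index, the history_page function and page_size are assumptions for illustration. MySQL accepts the same LIMIT ? OFFSET ? syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (
        eventid   INTEGER PRIMARY KEY,
        userid    INTEGER NOT NULL,
        createdat TEXT NOT NULL,
        message   TEXT NOT NULL
    );
    -- Matches the WHERE userid = ? ORDER BY createdat DESC access pattern.
    CREATE INDEX event_user_created ON event(userid, createdat);
""")
conn.executemany(
    "INSERT INTO event (userid, createdat, message) VALUES (?, ?, ?)",
    [(7, f"2024-01-0{d} 12:00:00", f"event {d}") for d in range(1, 8)],
)

def history_page(conn, userid, page, page_size=3):
    """Return one page of a user's events, newest first (page is 1-based)."""
    offset = (page - 1) * page_size
    rows = conn.execute(
        "SELECT message FROM event WHERE userid = ? "
        "ORDER BY createdat DESC LIMIT ? OFFSET ?",
        (userid, page_size, offset),
    )
    return [r[0] for r in rows]

print(history_page(conn, 7, 1))  # ['event 7', 'event 6', 'event 5']
print(history_page(conn, 7, 3))  # ['event 1']
```

Large offsets still make the database read and discard the skipped rows, so very deep pages get slower as the history grows.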
Are there likely to be many cases where you want to search the body text of the comment?
You should definitely read the MySQL Manual on tuning for inserts. If you don't need to search the free-text "comment", I'd leave the index off; I'd also consider a regular index on the "message" column.
It might also make sense to introduce the concept of "message_type" so you can introduce relational consistency (rather than relying on your code to correctly spell "password_updat3"). For instance, you might have an "event_type" table, with a foreign key relationship to your event table.
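A minimal sketch of the event_type idea, using Python's built-in sqlite3 module as a stand-in for MySQL (MySQL enforces foreign keys on InnoDB tables but not on MyISAM, the engine in the question's schema). The event_type_id column and the log_event helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE event_type (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE event (
        eventid       INTEGER PRIMARY KEY,
        userid        INTEGER NOT NULL,
        event_type_id INTEGER NOT NULL REFERENCES event_type(id),
        comment       TEXT NOT NULL DEFAULT ''
    );
""")
conn.execute("INSERT INTO event_type (id, name) VALUES (1, 'password_updated')")

def log_event(conn, userid, type_name):
    """Insert an event, resolving the type name through the lookup table."""
    row = conn.execute("SELECT id FROM event_type WHERE name = ?", (type_name,)).fetchone()
    if row is None:
        raise ValueError(f"unknown event type: {type_name!r}")
    conn.execute("INSERT INTO event (userid, event_type_id) VALUES (?, ?)", (userid, row[0]))

log_event(conn, 42, "password_updated")      # accepted
try:
    log_event(conn, 42, "password_updat3")   # the misspelling is caught
except ValueError as exc:
    print(exc)

# The foreign key also rejects raw ids that do not exist in event_type.
try:
    conn.execute("INSERT INTO event (userid, event_type_id) VALUES (42, 99)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```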
As for caching, I'm guessing users would only visit their history page infrequently. Populating the cache when they visit the site, on the off-chance they might visit their history (if I've understood your design), immediately limits the scalability of your solution to how many history records you can fit into your cache; as the history table will grow very quickly for your users, this could quickly become a significant factor.
For data like this, which moves quickly and is rarely visited, caching may not be the right solution.
This is how Prestashop does it:
CREATE TABLE IF NOT EXISTS `ps_log` (
`id_log` int(10) unsigned NOT NULL AUTO_INCREMENT,
`severity` tinyint(1) NOT NULL,
`error_code` int(11) DEFAULT NULL,
`message` text NOT NULL,
`object_type` varchar(32) DEFAULT NULL,
`object_id` int(10) unsigned DEFAULT NULL,
`id_employee` int(10) unsigned DEFAULT NULL,
`date_add` datetime NOT NULL,
`date_upd` datetime NOT NULL,
PRIMARY KEY (`id_log`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=6 ;
My advice would be to use a schemaless storage system; they perform better with high-volume logging data.
Consider:
Redis
MongoDB
Riak
Or any other NoSQL system

Single mysql table for private messaging

I'm trying to create a single table for private messaging on a website. I created the following table which I think is efficient but I would really appreciate some feedback.
CREATE TABLE IF NOT EXISTS `pm` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`to` int(11) NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`subject` varchar(255) DEFAULT NULL,
`message` text NOT NULL,
`read` tinyint(1) NOT NULL DEFAULT '0',
`deleted` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
FOREIGN KEY (`user_id`) REFERENCES `User`(`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
I have 2 columns that determine the status of the message: read and deleted
If read = 1, the message has been read by the receiver. If deleted = 1, either the sender or the receiver has deleted the message from their sent or received box. If deleted = 2, both users have deleted the message, so the row is deleted from the database table.
I see that you don't have any indexes explicitly stated. Having the appropriate indexes on your table could improve your performance significantly. I also believe that for your message column you may want to consider making it a varchar with an explicitly stated maximum size. Other than those two items, which you may already have taken care of, your table looks pretty good to me.
MySQL Table Performance Guidelines:
Add appropriate indexes to tables. Indexes aren't just for primary/unique keys; add them to frequently referenced columns.
Explicitly state maximum lengths. Fixed-length tables are faster than their variable-length counterparts.
Always have an id column.
Add NOT NULL wherever you can. NULLs still take up space.
Know your data types. Knowledge is power and can save on performance and space.
Interesting Articles:
VarChar/TEXT Benchmarks
Similar Question
Some Best Practices
Data Type Storage Requirements
The articles and some of the items I have listed may not be 100% correct or reliable so make sure you do a bit of your own research if you are interested in further tuning your performance.
A few comments:
Charset=latin1 is going to piss some people off; I'd suggest charset=utf8.
I'd suggest putting a foreign key check not only on user_id, but on to as well.
Also I'd put an index on date, as you will be doing a lot of sorting on that field.
You need to split deleted into two fields; otherwise you will not know which user has deleted the message (deleted_by_user, deleted_by_recipient).
Note that date is a MySQL keyword (and read and to are reserved words), so consider renaming it to message_date, or `backtick` these names in your queries.
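The per-party deletion suggested above could be sketched like this, using Python's built-in sqlite3 module as a stand-in for MySQL. The column names (from_user_id, to_user_id, message_date, is_read, deleted_by_sender, deleted_by_recipient) and the helper functions are hypothetical; the renames also avoid the reserved words. The row is removed only once both parties have deleted it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE private_message (
        id                   INTEGER PRIMARY KEY,
        from_user_id         INTEGER NOT NULL,
        to_user_id           INTEGER NOT NULL,
        message_date         TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        body                 TEXT NOT NULL,
        is_read              INTEGER NOT NULL DEFAULT 0,
        deleted_by_sender    INTEGER NOT NULL DEFAULT 0,
        deleted_by_recipient INTEGER NOT NULL DEFAULT 0
    );
""")

def inbox(conn, user_id):
    """Message ids the user received and has not deleted."""
    return [r[0] for r in conn.execute(
        "SELECT id FROM private_message "
        "WHERE to_user_id = ? AND deleted_by_recipient = 0 ORDER BY id", (user_id,))]

def outbox(conn, user_id):
    """Message ids the user sent and has not deleted."""
    return [r[0] for r in conn.execute(
        "SELECT id FROM private_message "
        "WHERE from_user_id = ? AND deleted_by_sender = 0 ORDER BY id", (user_id,))]

def delete_for(conn, message_id, user_id):
    """Hide a message for one party; drop the row once both have deleted it."""
    with conn:
        conn.execute("UPDATE private_message SET deleted_by_sender = 1 "
                     "WHERE id = ? AND from_user_id = ?", (message_id, user_id))
        conn.execute("UPDATE private_message SET deleted_by_recipient = 1 "
                     "WHERE id = ? AND to_user_id = ?", (message_id, user_id))
        conn.execute("DELETE FROM private_message WHERE id = ? "
                     "AND deleted_by_sender = 1 AND deleted_by_recipient = 1", (message_id,))

conn.execute("INSERT INTO private_message (id, from_user_id, to_user_id, body) "
             "VALUES (1, 10, 20, 'hi')")
delete_for(conn, 1, 20)  # the recipient deletes it
after_recipient_delete = (inbox(conn, 20), outbox(conn, 10))
print(after_recipient_delete)  # ([], [1])
delete_for(conn, 1, 10)  # the sender deletes it too, so the row goes away
remaining = conn.execute("SELECT COUNT(*) FROM private_message").fetchone()[0]
print(remaining)  # 0
```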
some comments:
not bad.
I would name the table something that other people might guess out of context, so maybe private_message instead of pm.
I would be explicit with the user column names, so maybe from_user_id and to_user_id instead of user_id and to.
I would consider pulling the status out into a new table with status, user_id and date; this should give you a lot more flexibility in tracking who is doing what to the message over time.
For displaying both the receiver's inbox and the sender's outbox (and being able to delete messages from each respectively), you will probably need more information than what you currently have encoded. I would suggest a "deleted" field for each party. (This works as long as there is only 1 user on each end and no broadcast messages. It does not scale to broadcast messages, however, which would require more than 1 table to do efficiently.)
You may also want to enforce key relationships with ON DELETE and ON UPDATE:
FOREIGN KEY (`user_id`) REFERENCES `User`(`user_id`) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (`to`) REFERENCES `User`(`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
The removal or modification of a user will propagate changes or deletions to the messages table.
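A minimal sketch of what those cascading actions do, using Python's built-in sqlite3 module as a stand-in for MySQL (SQLite only enforces foreign keys after PRAGMA foreign_keys = ON). The users table is a hypothetical stand-in for User, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for FK actions
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE pm (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL
            REFERENCES users(user_id) ON DELETE CASCADE ON UPDATE CASCADE,
        "to"    INTEGER NOT NULL
            REFERENCES users(user_id) ON DELETE CASCADE ON UPDATE CASCADE,
        message TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.execute("""INSERT INTO pm (id, user_id, "to", message) VALUES (1, 1, 2, 'hello')""")

# ON UPDATE CASCADE: changing bob's id rewrites the message's "to" column.
conn.execute("UPDATE users SET user_id = 20 WHERE user_id = 2")
to_after_update = conn.execute('SELECT "to" FROM pm WHERE id = 1').fetchone()[0]
print(to_after_update)  # 20

# ON DELETE CASCADE: deleting alice also deletes the message she sent.
conn.execute("DELETE FROM users WHERE user_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM pm").fetchone()[0]
print(remaining)  # 0
```

Note the trade-off: with ON DELETE CASCADE on both columns, deleting either user also removes messages the other party may still want to keep.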
I think you may need to add a column called Parent_Message_ID holding the parent message's ID, so that replies can also be included, if you think you will add replies to your private messages in the future.
