Big amounts of data in Amazon RDS - php

I'm currently trying to store a large number of e-mails (100M+) in MySQL on Amazon RDS. I've made a separate emails_bodies table, but it's getting way too big.
With around 40k e-mails, the table size just went over 1GB. The original e-mail files are saved on Amazon S3, and the text-only bodies are kept in the DB just for searching. With more users (which would easily mean over 100M e-mails), I would need terabytes of MySQL storage.
CREATE TABLE `emails` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`accounts_id` int(10) unsigned NOT NULL,
`ehash` varchar(32) NOT NULL,
`subject` text NOT NULL,
`body` longtext NOT NULL,
`html` tinyint(1) unsigned NOT NULL,
`size` int(10) unsigned NOT NULL,
`datetime` datetime NOT NULL,
`created` datetime NOT NULL,
`last_updated` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `ehash` (`ehash`),
KEY `accounts_id` (`accounts_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
CREATE TABLE `bodies` (
`bodies_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`bodies_emails_id` int(10) unsigned NOT NULL,
`bodies_body` longtext NOT NULL,
PRIMARY KEY (`bodies_id`),
UNIQUE KEY `bodies_emails_id` (`bodies_emails_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

By my calculation, each body takes up about 25KB on average (1GB spread over roughly 40k e-mails). That's a fair amount for an e-mail body. If your only goal is search, though, you can cut it down by extracting just the text part of each multipart body. I'd expect the average size to drop to around 1KB or less.
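To illustrate the "text part only" idea, here is a minimal sketch in Python using the standard-library `email` package. The function name `extract_plain_text` is made up for this example, and it assumes the raw message is available as a string; the same approach should carry over to whatever mail parser you use from PHP.

```python
from email import message_from_string


def extract_plain_text(raw_email: str) -> str:
    """Return only the text/plain parts of a (possibly multipart) message."""
    msg = message_from_string(raw_email)
    chunks = []
    for part in msg.walk():
        # Skip multipart containers, HTML alternatives, images, and so on.
        if part.get_content_type() != "text/plain":
            continue
        # Skip plain-text files that were sent as attachments.
        if part.get_content_disposition() == "attachment":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer-decoded
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to UTF-8 (an assumption).
            chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks).strip()
```

Storing only this output in the searchable column leaves the HTML alternative and attachments in the original file on S3.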

Related

Creation of a "temporary" table for daily operations?

I have a mysql table MAINLIST.
CREATE TABLE `MAINLIST` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`NAME` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`email` tinyint(1) unsigned DEFAULT NULL,
`contact` tinyint(1) unsigned DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=17 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Every day I select a subset of these rows and perform some operations on them. Right now I do this within the MAINLIST table, but I think it would help with organization, readability and debugging to create a second table each day, import the selected records, do the operations, then send the records back to the MAINLIST table and destroy the daily table.
What is the best way to do this with MySQL, or are there other ways to approach this problem? Perhaps I should not be doing this at all. I am wondering what the best practices are, since I'm not experienced with DB design. I am using the RedBean ORM and PHP.

Jstree table definition

I have integrated jsTree into my application, and now I want to understand the different columns in its table:
CREATE TABLE IF NOT EXISTS `tree` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`parent_id` bigint(20) unsigned NOT NULL,
`position` bigint(20) unsigned NOT NULL,
`left` bigint(20) unsigned NOT NULL,
`right` bigint(20) unsigned NOT NULL,
`level` bigint(20) unsigned NOT NULL,
`title` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`type` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=13 ;
This is the default table provided by the site.
Now, if I want to add a node, how do I know the values for left, right and level?
This looks like a mix of an adjacency list and nested sets.
Nested sets are often a better way of storing trees in a relational database when reads dominate, because a whole subtree can be fetched with a single range query on left and right.
The principle is hard to explain briefly; you have to look here and here.
When you use nested sets, you don't need parent_id.
I think jsTree provides a sample table that lets you choose for yourself which technique to use.
Another way of storing trees in a database is a closure table.
It's my personal favourite: simple but powerful. You will hardly find anything about it on the net, though.
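For the left, right and level values specifically, the usual nested-set rule for appending a node as the last child of a parent can be sketched as follows. This is plain Python over a list of dicts rather than SQL, the helper name `add_last_child` is invented for the example, and it is the textbook nested-set insert, not necessarily what jsTree's own server code does.

```python
def add_last_child(nodes, parent_id, new_id, title):
    """Insert a node as the last child of parent_id in a nested-set tree.

    `nodes` is a list of dicts with the same columns as the `tree` table.
    """
    parent = next(n for n in nodes if n["id"] == parent_id)
    boundary = parent["right"]      # the new node takes over this slot
    level = parent["level"] + 1     # one level deeper than the parent
    position = sum(1 for n in nodes if n["parent_id"] == parent_id)

    # Open a gap of two numbers at the boundary. In SQL this would be two
    # UPDATE statements with the same conditions.
    for n in nodes:
        if n["left"] > boundary:
            n["left"] += 2
        if n["right"] >= boundary:  # includes the parent itself
            n["right"] += 2

    new_node = {
        "id": new_id, "parent_id": parent_id, "position": position,
        "left": boundary, "right": boundary + 1, "level": level,
        "title": title,
    }
    nodes.append(new_node)
    return new_node
```

A leaf always has right = left + 1, and a node's descendants are exactly the rows whose left lies strictly between its own left and right.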

Execute php script at particular time (time will be fetched from database)

I have a database structure like this.
CREATE TABLE IF NOT EXISTS `addreminde` (
`SMSId` int(11) NOT NULL AUTO_INCREMENT,
`UserId` int(11) NOT NULL,
`SendFrom` varchar(20) NOT NULL,
`SendTo` varchar(400) NOT NULL,
`Message` varchar(400) NOT NULL,
`ReminderTime` datetime NOT NULL,
`Status` varchar(400) DEFAULT NULL,
`datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`SMSId`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=15;
So I am storing the ReminderTime in the database.
Now I want to know how I can send an e-mail (or, let's say, execute a PHP script) to "SendTo" at the "ReminderTime".
Any help will be appreciated.
PHP cannot do this on its own from a web page: a PHP script only comes to life when a user makes a request to the web server.
As mentioned, a cron job is the way to go. Running a query once a minute and sending some e-mail will not be a big load for your server.
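As a rough sketch of what the cron-driven script would do each minute: select the reminders that are due and not yet handled, send them, and mark them as sent. The example below is Python with an in-memory SQLite database standing in for MySQL; `send_due_reminders`, `make_demo_db`, the `'sent'` status value and the sample rows are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime


def make_demo_db():
    """Small stand-in for the addreminde table (only the columns used here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE addreminde (
               SMSId INTEGER PRIMARY KEY AUTOINCREMENT,
               SendTo TEXT NOT NULL,
               Message TEXT NOT NULL,
               ReminderTime TEXT NOT NULL,
               Status TEXT DEFAULT NULL)"""
    )
    conn.executemany(
        "INSERT INTO addreminde (SendTo, Message, ReminderTime) VALUES (?, ?, ?)",
        [
            ("a@example.com", "Call Bob", "2024-01-01 11:59:00"),
            ("b@example.com", "Pay rent", "2024-01-01 12:30:00"),
        ],
    )
    conn.commit()
    return conn


def send_due_reminders(conn, now, send):
    """Send every reminder whose time has passed and that is still unsent.

    `send` is any callable taking (recipient, message). Returns the count sent.
    """
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    due = conn.execute(
        "SELECT SMSId, SendTo, Message FROM addreminde "
        "WHERE ReminderTime <= ? AND Status IS NULL "
        "ORDER BY ReminderTime",
        (stamp,),
    ).fetchall()
    for sms_id, send_to, message in due:
        send(send_to, message)
        # Mark the row so the next run does not send it again.
        conn.execute(
            "UPDATE addreminde SET Status = 'sent' WHERE SMSId = ?", (sms_id,)
        )
    conn.commit()
    return len(due)
```

Using `ReminderTime <= now` rather than an exact match means a reminder is still picked up if one cron run is skipped or starts a few seconds late.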

Game resources storing

I have the following table, townResources, in which I store every resource value for every town ID. I am a bit concerned about the performance impact with a large number of users. I am thinking of moving the resource balances to the towns table and storing the general value of each resource in a .php file.
Here is the townresources table:
CREATE TABLE IF NOT EXISTS `townresources` (
`townResourcesId` int(10) NOT NULL AUTO_INCREMENT,
`userId` int(10) NOT NULL,
`resourceId` int(10) NOT NULL,
`townId` int(10) NOT NULL,
`balance` decimal(8,2) NOT NULL,
`resourceRate` decimal(6,2) NOT NULL,
`lastUpdate` datetime NOT NULL,
PRIMARY KEY (`resourceId`,`townId`,`townResourcesId`,`userId`),
KEY `townResources_userId_users_userId` (`userId`),
KEY `townResources_townId_towns_townId` (`townId`),
KEY `townResourcesId` (`townResourcesId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Stores Town Resources' AUTO_INCREMENT=9 ;
What is the best option in my case?
Your best option is to test first. How many users and towns do you want to support? Triple that, create the test data, and see whether the performance is within bounds.
If you run into performance trouble, you should look into caching the data with Redis or Memcached.
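A minimal sketch of the "create test data and measure" step, written in Python against an in-memory SQLite table so it runs anywhere. The helper names, the one-town-per-user assumption and the random value ranges are invented for the example, and the numbers it prints say nothing about MySQL itself; for a real decision you would point the same idea at a MySQL instance with your own schema and indexes.

```python
import random
import sqlite3
import time


def build_test_db(towns, resources_per_town, seed=0):
    """Fill a townresources-like table with synthetic rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE townresources (
               townResourcesId INTEGER PRIMARY KEY AUTOINCREMENT,
               userId INTEGER NOT NULL,
               resourceId INTEGER NOT NULL,
               townId INTEGER NOT NULL,
               balance REAL NOT NULL,
               resourceRate REAL NOT NULL,
               lastUpdate TEXT NOT NULL)"""
    )
    conn.execute("CREATE INDEX idx_town ON townresources (townId)")
    rows = [
        # Assumption: one town per user, so userId == townId.
        (town, res, town,
         round(rng.uniform(0, 1000), 2), round(rng.uniform(0, 10), 2),
         "2024-01-01 00:00:00")
        for town in range(1, towns + 1)
        for res in range(1, resources_per_town + 1)
    ]
    conn.executemany(
        "INSERT INTO townresources "
        "(userId, resourceId, townId, balance, resourceRate, lastUpdate) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def average_lookup_seconds(conn, town_ids):
    """Average wall-clock time to load all resources of one town."""
    start = time.perf_counter()
    for town_id in town_ids:
        conn.execute(
            "SELECT resourceId, balance, resourceRate "
            "FROM townresources WHERE townId = ?",
            (town_id,),
        ).fetchall()
    return (time.perf_counter() - start) / len(town_ids)


if __name__ == "__main__":
    db = build_test_db(towns=30000, resources_per_town=8)
    sample = random.Random(1).sample(range(1, 30001), 1000)
    print(f"{average_lookup_seconds(db, sample) * 1e6:.1f} microseconds per town")
```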

PHP model (MySQL) design problem

I'm looking for the most efficient solution to the problem I'm running into. I'm designing a shift calendar for our employees. This is the table I'm working with so far:
CREATE TABLE IF NOT EXISTS `Shift` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`accountId` smallint(6) unsigned NOT NULL,
`grpId` smallint(6) unsigned NOT NULL,
`locationId` smallint(6) unsigned NOT NULL,
`unitId` smallint(6) unsigned NOT NULL,
`shiftTypeId` smallint(6) unsigned NOT NULL,
`startDate` date NOT NULL,
`endDate` date NOT NULL,
`needFlt` bit(1) NOT NULL DEFAULT b'1',
`needBillet` bit(1) NOT NULL DEFAULT b'1',
`fltArr` varchar(10) NOT NULL,
`fltDep` varchar(10) NOT NULL,
`fltArrMade` bit(1) NOT NULL DEFAULT b'0',
`fltDepMade` bit(1) NOT NULL DEFAULT b'0',
`billetArrMade` bit(1) NOT NULL DEFAULT b'0',
`billetDepMade` bit(1) NOT NULL DEFAULT b'0',
`FacilityId` smallint(6) unsigned NOT NULL,
`FacilityWingId` mediumint(9) unsigned NOT NULL,
`FacilityRoomId` int(11) unsigned NOT NULL,
`comment` varchar(255) NOT NULL,
`creation` datetime NOT NULL,
`lastUpdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`lastUpdateBy` mediumint(9) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
Now here's the hitch - I'd like to be able to display on the calendar (in a different color) whether or not a timesheet has been received for a certain day.
My first thought was to create a separate table and list separate entries by day for each employee, T/F. But the amount of data returned from a separate query, for each employee, for the whole month would surely be huge and inefficient.
My second thought was to somehow put the information in this Shift table with delimiters, then explode it with PHP. Silly idea... but I guess that's why I'm here. Any thoughts?
Thanks for your help!
As hinted previously, and as I think you realized yourself, serializing the data into a single column or using some other form of delimited string leads to computational inefficiency in the packing and unpacking, and to serious maintenance grief in the future.
Far better is to get the data structure right, i.e. a properly normalized table. After all, MySQL is rather good at dealing with this sort of structure.
You don't need to pull back every row for every staff member. If you're pulling them out together, you could group your result set by employee and date, and even make that a potentially useful result by (say) pulling the total hours. A zero or NULL result would show that there is no timesheet, and the total hours may be helpful in some other way.
If you were pulling them out one employee and one date at a time, then your application structure probably needs looking at, but you could use the SQL LIMIT keyword to pull at most one record and then test whether any came back.
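The grouped query could look roughly like the sketch below. It is Python with an in-memory SQLite database standing in for MySQL, and the `Timesheet` table, its columns and the sample rows are hypothetical, since the question does not show a timesheet table. One query covers every employee for the whole date range, and the calendar then checks a dictionary per day.

```python
import sqlite3
from datetime import date, timedelta


def make_demo_db():
    """Hypothetical normalized timesheet table with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE Timesheet (
               id INTEGER PRIMARY KEY,
               employeeId INTEGER NOT NULL,
               workDate TEXT NOT NULL,
               hours REAL NOT NULL)"""
    )
    conn.executemany(
        "INSERT INTO Timesheet (employeeId, workDate, hours) VALUES (?, ?, ?)",
        [
            (1, "2024-03-01", 4.0),
            (1, "2024-03-01", 4.5),
            (1, "2024-03-02", 8.0),
            (2, "2024-03-01", 7.5),
        ],
    )
    conn.commit()
    return conn


def hours_by_employee_and_day(conn, first_day, last_day):
    """One grouped query: total hours per (employee, day) in the range."""
    rows = conn.execute(
        "SELECT employeeId, workDate, SUM(hours) FROM Timesheet "
        "WHERE workDate BETWEEN ? AND ? "
        "GROUP BY employeeId, workDate",
        (first_day.isoformat(), last_day.isoformat()),
    )
    return {(emp, day): total for emp, day, total in rows}


def calendar_flags(totals, employee_id, first_day, last_day):
    """Map each day to True if a timesheet was received for that employee."""
    flags = {}
    day = first_day
    while day <= last_day:
        key = (employee_id, day.isoformat())
        flags[day.isoformat()] = totals.get(key, 0) > 0
        day += timedelta(days=1)
    return flags
```

Days with no rows simply never appear in the grouped result, so a missing key is what marks "no timesheet" on the calendar.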
