I have a big table with about 20 million rows, and it grows every day. I have a form that runs a query against this table, and unfortunately the query returns hundreds of thousands of rows.
The query filters on time, and I need all the matching records so I can classify them by `clid` based on some rules and build a result table.
This is my table:
CREATE TABLE IF NOT EXISTS `cdr` (
`gid` bigint(20) NOT NULL AUTO_INCREMENT,
`prefix` varchar(20) NOT NULL DEFAULT '',
`id` bigint(20) NOT NULL,
`start` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`clid` varchar(80) NOT NULL DEFAULT '',
`duration` int(11) NOT NULL DEFAULT '0',
`service` varchar(20) NOT NULL DEFAULT '',
PRIMARY KEY (`gid`),
UNIQUE KEY `id` (`id`,`prefix`),
KEY `start` (`start`),
KEY `clid` (`clid`),
KEY `service` (`service`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And this is my query:
SELECT * FROM `cdr`
WHERE
service = 'test' AND
`start` >= '2014-02-09 00:00:00' AND
`start` < '2014-02-10 00:00:00' AND
`duration` >= 10
The date period can vary, from one hour to maybe 60 days or even more, like:
DATE(start) BETWEEN '2013-02-02 00:00:00' AND '2014-02-03 00:00:00'
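(Side note: a predicate written with DATE(start) cannot use the index on start; an index-friendly equivalent of that range keeps the bare column on the left:
`start` >= '2013-02-02 00:00:00' AND
`start` < '2014-02-04 00:00:00' -- exclusive upper bound still covers all of 2014-02-03
)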
The result set has about 150,000 rows for every day. When I try to get results for a bigger period, or sometimes even for a single day, the database crashes.
Does anybody have any idea?
I don't know how to prevent it from crashing, but one thing that I did with my large tables was partition them by date.
Here, I partition the rows by date, twice a month. As long as your query uses the partitioned column, it will only search the partitions containing the key. It will not do a full table scan.
CREATE TABLE `identity` (
`Reference` int(9) unsigned NOT NULL AUTO_INCREMENT,
...
`Reg_Date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`Reference`,`Reg_Date`), -- the partitioning column must be part of every unique key
KEY `Reg_Date` (`Reg_Date`)
) ENGINE=InnoDB AUTO_INCREMENT=28424336 DEFAULT CHARSET=latin1
PARTITION BY RANGE COLUMNS (Reg_Date) (
PARTITION p20140201 VALUES LESS THAN ('2014-02-01'),
PARTITION p20140214 VALUES LESS THAN ('2014-02-14'),
PARTITION p20140301 VALUES LESS THAN ('2014-03-01'),
PARTITION p20140315 VALUES LESS THAN ('2014-03-15'),
PARTITION p20140715 VALUES LESS THAN (MAXVALUE)
);
So basically, you just do a dump of the table, create it with partitions and then import the data into it.
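If the table already exists, you don't strictly have to dump and reload: on MySQL 5.5+ you can repartition in place with ALTER TABLE. One caveat (which is why Reg_Date is part of the primary key above): every unique key, including the primary key, must contain the partitioning column, so the cdr table from the question would first need its keys widened, which does weaken the uniqueness guarantee of the `id` key. A sketch with made-up partition boundaries:
ALTER TABLE cdr
DROP PRIMARY KEY, ADD PRIMARY KEY (`gid`, `start`),
DROP KEY `id`, ADD UNIQUE KEY `id` (`id`, `prefix`, `start`);

ALTER TABLE cdr
PARTITION BY RANGE COLUMNS (`start`) (
PARTITION p201402 VALUES LESS THAN ('2014-03-01'),
PARTITION p201403 VALUES LESS THAN ('2014-04-01'),
PARTITION pmax VALUES LESS THAN (MAXVALUE)
);

-- On MySQL 5.6 and earlier, verify pruning with:
EXPLAIN PARTITIONS SELECT * FROM cdr
WHERE `start` >= '2014-02-09' AND `start` < '2014-02-10';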
The MySQL database (5.7.25-0ubuntu0.18.04.2 (Ubuntu)) reports that I am inserting '1969-12-31 16:00:00' instead of '2019-04-10 13:54:21' into a table in the database.
However, there is proof that the request that hit the Apache 2.4 server, and then got processed by the php7-fpm workers, included the date '4/10/2019 13:54:21'.
So I am left bewildered as to why it did not go into the database as '2019-04-10 13:54:21' after this code ran:
date('Y-m-d H:i:s', strtotime('4/10/2019 13:54:21'));
This record is going into the table via a stored procedure; however, I verified that the params are going in correctly, since another date went in there fine.
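One thing I did notice: '1969-12-31 16:00:00' is exactly the Unix epoch rendered in a UTC-8 time zone, which is what PHP's date() produces when strtotime() fails and returns false. A quick sanity check in SQL (session time zone set by hand just for the demonstration):
SET time_zone = '-08:00';
SELECT FROM_UNIXTIME(0); -- returns '1969-12-31 16:00:00'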
The procedure `new_record` (sql_mode: ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION):
CREATE DEFINER=`user`@`localhost` PROCEDURE `new_record`(
IN p_account_id integer,
IN p_order_id integer,
IN p_unit_id integer,
IN p_begin_date datetime,
IN p_end_date datetime,
IN p_duration integer,
IN p_status integer
)
BEGIN
INSERT INTO schedule
(
schedule_id,
orders_id,
units_id,
schedule_begintime,
schedule_endtime,
duration,
status_id
)
SELECT 0, p_order_id, p_unit_id, p_begin_date, p_end_date, p_duration, p_status
FROM units AS u
JOIN properties AS p ON(p.properties_id = u.properties_id)
WHERE u.units_id = p_unit_id AND p.owners_id = p_account_id;
SELECT LAST_INSERT_ID() as post_id;
COMMIT;
END
The CREATE TABLE for the table in question looks like this:
CREATE TABLE `schedule` (
`schedule_id` int(11) NOT NULL AUTO_INCREMENT,
`orders_id` int(11) NOT NULL DEFAULT '0',
`units_id` int(11) NOT NULL DEFAULT '0',
`status_id` int(11) NOT NULL DEFAULT '1',
`schedule_begintime` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`schedule_endtime` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`duration` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`schedule_id`),
KEY `orders_id` (`orders_id`),
KEY `units_id` (`units_id`),
KEY `status_id` (`status_id`),
KEY `schedule_begintime` (`schedule_begintime`),
KEY `schedule_endtime` (`schedule_endtime`)
) ENGINE=InnoDB AUTO_INCREMENT=58 DEFAULT CHARSET=utf8
Curiosity has me posting here, since I cannot work out:
Why would the date not be accepted as valid by strtotime?
Why would MySQL insert a default date instead?
Is there some modification to the array that houses this date value, but not to this specific key?
The other oddity is that this only happens for one of the users.
Has anyone experienced issues like this? Could they occur due to server load?
I am wondering about the best way to generate MySQL reports on large databases, as I have a database for a POS application with more than 40 stores working on it.
The database has more than 1.5M rows in each of four tables:
two for headers and two for details.
I generate reports by joining headers with details and some other tables to get the full info for the view.
I tried archiving the data into one table that holds all the data required for reporting, but it turned out to be a huge load, and the MySQL events that fetch data into that table do not always work and may lead to data loss.
I also tried indexing the tables, but it didn't help much, as the queries are too big and take too much time, which leads to heavy load on the server and can leave the application not responding at all.
I searched across Google and found some ideas about partitioning tables, others about archiving, changing the whole engine, or even upgrading the server.
The relation between the two tables (invoice_header and invoice_detail) is one-to-many: invoice_header is the header of an invoice, holding only totals, and is linked to invoice_detail by location ID (loc_id) and invoice number (invo_no), since each location has its own serial numbering. invoice_detail contains the line items of each invoice.
Sample query (takes 15-20 seconds to fetch):
Total rows: 1,495,873
Total fetched rows: 9-12
SELECT SUM(invoice_detail.qty) AS qty, Month(invoice_header.date) AS month
FROM invoice_detail
JOIN invoice_header ON invoice_detail.invo_no = invoice_header.invo_no
AND invoice_detail.loc_id = invoice_header.loc_id
WHERE invoice_detail.item_id = {$itemId}
GROUP BY Month(invoice_header.date)
ORDER BY Month(invoice_header.date)
EXPLAIN: (output omitted)
invoice_header table structure:
CREATE TABLE `invoice_header` (
`invo_type` varchar(1) NOT NULL,
`invo_no` int(20) NOT NULL AUTO_INCREMENT,
`invo_code` varchar(50) NOT NULL,
`date` date NOT NULL,
`time` time NOT NULL,
`cust_id` int(11) NOT NULL,
`loc_id` int(3) NOT NULL,
`cash_man_id` int(11) NOT NULL,
`sales_man_id` int(11) NOT NULL,
`ref_invo_no` int(20) NOT NULL,
`total_amount` decimal(19,2) NOT NULL,
`tax` decimal(19,2) NOT NULL,
`discount_amount` decimal(19,2) NOT NULL,
`net_value` decimal(19,2) NOT NULL,
`split` decimal(19,2) NOT NULL,
`qty` int(11) NOT NULL,
`payment_type_id` varchar(20) NOT NULL,
`comments` varchar(255) NOT NULL,
PRIMARY KEY (`invo_no`,`loc_id`)
) ENGINE=InnoDB AUTO_INCREMENT=20286 DEFAULT CHARSET=utf8
invoice_detail table structure:
CREATE TABLE `invoice_detail` (
`invo_no` int(11) NOT NULL,
`loc_id` int(3) NOT NULL,
`serial` int(11) NOT NULL,
`item_id` varchar(11) NOT NULL,
`size_id` int(5) NOT NULL,
`qty` int(11) NOT NULL,
`rtp` decimal(19,2) NOT NULL,
`type` tinyint(1) NOT NULL,
PRIMARY KEY (`invo_no`,`loc_id`,`serial`),
KEY `item_id` (`item_id`),
KEY `size_id` (`size_id`),
KEY `invo_no` (`invo_no`),
KEY `serial` (`serial`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
After changing the GROUP BY to use EXTRACT (so months from different years are not merged):
EXPLAIN SELECT SUM(invoice_detail.qty) AS qty, Month(invoice_header.date) AS month
FROM invoice_detail
JOIN invoice_header ON invoice_detail.invo_no = invoice_header.invo_no
AND invoice_detail.loc_id = invoice_header.loc_id
WHERE invoice_detail.item_id = 11321
GROUP BY EXTRACT(YEAR_MONTH FROM invoice_header.date)
I am using quite a good dedicated server:
Intel Xeon Quad Core 3.3GHz (8 threads)
1 Gbps Uplink
16 GB RAM
1,000 GB RAID-1 Drives
25 TB Bandwidth
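One idea I am still evaluating (an untested sketch, not something already in place) is a composite index on invoice_detail that covers the sample query: item_id for the WHERE, loc_id and invo_no for the join, and qty for the SUM:
ALTER TABLE invoice_detail
ADD INDEX idx_item_loc_invo_qty (item_id, loc_id, invo_no, qty);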
Any suggestions?
I'm coding a website which will store some offers (e.g. job offers). In the end, it could contain more than 1M offers. Now I have problems with some inefficient SQL queries.
Scenario:
Each offer can be assigned to a category (e.g. IT jobs)
Each category has custom fields (e.g. IT jobs can have a custom field of type "price", rendered as a text box accepting a number; in our example, let's say it holds the expected salary)
Each offer stores metadata with the values of these category custom fields
The DB fields used for filtering have indexes
Table category (I'm using nested sets to store the category hierarchy):
CREATE TABLE `category` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`parent_id` int(11) DEFAULT NULL,
`lft` int(11) DEFAULT NULL,
`rgt` int(11) DEFAULT NULL,
`depth` int(11) DEFAULT NULL,
`order` int(11) NOT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `category_parent_id_index` (`parent_id`),
KEY `category_lft_index` (`lft`),
KEY `category_rgt_index` (`rgt`)
) ENGINE=InnoDB AUTO_INCREMENT=44 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Table category_field:
CREATE TABLE `category_field` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`category_id` int(10) unsigned NOT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`optional` tinyint(1) NOT NULL DEFAULT '0',
`type` enum('price','number','date','color') COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `category_field_category_id_index` (`category_id`),
CONSTRAINT `category_field_category_id_foreign` FOREIGN KEY (`category_id`) REFERENCES `category` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Table offer:
CREATE TABLE `offer` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`text` text COLLATE utf8_unicode_ci NOT NULL,
`category_id` int(10) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `offer_category_id_index` (`category_id`),
CONSTRAINT `offer_category_id_foreign` FOREIGN KEY (`category_id`) REFERENCES `category` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Table offer_meta:
CREATE TABLE `offer_meta` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`offer_id` int(10) unsigned NOT NULL,
`category_field_id` int(10) unsigned NOT NULL,
`price` double NOT NULL,
`number` int(11) NOT NULL,
`date` date NOT NULL,
`color` varchar(7) COLLATE utf8_unicode_ci NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `offer_meta_offer_id_index` (`offer_id`),
KEY `offer_meta_category_field_id_index` (`category_field_id`),
KEY `offer_meta_price_index` (`price`),
KEY `offer_meta_number_index` (`number`),
KEY `offer_meta_date_index` (`date`),
KEY `offer_meta_color_index` (`color`),
CONSTRAINT `offer_meta_category_field_id_foreign` FOREIGN KEY (`category_field_id`) REFERENCES `category_field` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `offer_meta_offer_id_foreign` FOREIGN KEY (`offer_id`) REFERENCES `offer` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=107769 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
When I set up some filters on my page (for example, for our salary custom field), I have to start with a query that returns the MIN and MAX prices across the available offer_meta records (I want to show a range slider to the user on the front-end, so I need MIN/MAX values for its bounds):
select MIN(`price`) AS min, MAX(`price`) AS max from `offer_meta` where `category_field_id` = ? limit 1
I found out that these queries are the most inefficient of all the queries I'm making (the query above takes over 500 ms when the offer_meta table has only a few thousand records).
Other inefficient queries (offer_meta has 107k records):
Obtaining MIN and MAX values for slider to filter numbers
select MIN(`number`) AS min, MAX(`number`) AS max from `offer_meta` where `category_field_id` = ? limit 1
Obtaining MIN and MAX prices for slider to filter by prices
select MIN(`price`) AS min, MAX(`price`) AS max from `offer_meta` where `category_field_id` = ? limit 1
Obtaining MIN and MAX date for date range restrictions
select MIN(`date`) AS min, MAX(`date`) AS max from `offer_meta` where `category_field_id` = ? limit 1
Obtaining colors with counts to show list of colors with numbers
select `color`, count(*) as `count` from `offer_meta` where `category_field_id` = ? group by `color`
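If it matters, my understanding (unverified) is that composite indexes like the following would let MySQL answer those MIN/MAX queries from the index alone instead of scanning all rows for a category_field_id:
ALTER TABLE `offer_meta`
ADD INDEX `offer_meta_field_price` (`category_field_id`, `price`),
ADD INDEX `offer_meta_field_number` (`category_field_id`, `number`),
ADD INDEX `offer_meta_field_date` (`category_field_id`, `date`),
ADD INDEX `offer_meta_field_color` (`category_field_id`, `color`);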
Example of a full query to get the offers count with multiple filter criteria (0.5 sec):
select count(*) as count from `offer` where id in (select
distinct offer_id
from offer_meta om
where offer_id in (select
distinct offer_id
from offer_meta om
where offer_id in (select
distinct offer_id
from offer_meta om
where offer_id in (select
distinct om.offer_id
from offer_meta om
join category_field cf on om.category_field_id = cf.id
where
cf.category_id in (2,3,4,41,43,5,6,7,8,37) and
om.category_field_id = 1 and
om.number >= 1 and
om.number <= 50) and
om.category_field_id = 2 and
om.price >= 2 and
om.price <= 4545) and
om.category_field_id = 3 and
om.date >= '0000-00-00' and
om.date <= '2015-04-09') and
category_field_id = 4 and
om.color in ('#0000ff'))
The same query without the aggregate function (COUNT) is a few times faster (just to get the IDs).
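For comparison, I understand the same filters can also be written as self-joins on offer_meta, one join per custom field, which MySQL often handles better than chained IN subqueries. A sketch I haven't benchmarked (the semantics should be double-checked against the original):
SELECT COUNT(DISTINCT o.id) AS count
FROM offer o
JOIN offer_meta m1 ON m1.offer_id = o.id
JOIN category_field cf ON cf.id = m1.category_field_id
JOIN offer_meta m2 ON m2.offer_id = o.id
JOIN offer_meta m3 ON m3.offer_id = o.id
JOIN offer_meta m4 ON m4.offer_id = o.id
WHERE cf.category_id IN (2,3,4,41,43,5,6,7,8,37)
AND m1.category_field_id = 1 AND m1.number BETWEEN 1 AND 50
AND m2.category_field_id = 2 AND m2.price BETWEEN 2 AND 4545
AND m3.category_field_id = 3 AND m3.`date` <= '2015-04-09'
AND m4.category_field_id = 4 AND m4.color = '#0000ff'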
Question:
Is it possible to tweak those queries, or do you have any suggestions on how to implement my logic (offers with categories, and custom fields dynamically added to each category in the admin) with a different table schema? I tried a few more schemas, but with no success.
Question 2:
Do you think this is a problem with my MySQL server, and that it will be okay if I buy a VPS?
To help you understand even better: I was strongly inspired by the WordPress schema for custom fields, so the logic is similar.
Last notes:
Also, I'm working with the Laravel framework and I'm using the Eloquent ORM.
Sorry for my English; I hope I made my problem clear :-)
Thank you in advance,
Patrik
It is not a MySQL problem. Your scenario involves a huge data collection, and relational databases are naturally inefficient for some kinds of queries (I faced a similar situation with Oracle).
A common practice for winning in this kind of situation is to use a graph database, though that seems hard to adopt in the situation you are facing at the moment.
I have heard that Lucene has some kind of support for indexing large databases for search purposes, but I don't know exactly how to do it:
http://en.wikipedia.org/wiki/Lucene
I have a MySQL database (managed via phpMyAdmin) with 1,000,000 records I need to search in. Every week, 500,000 more records are added.
So, this is what I need:
location_id value date time name lat lng
3 234 2011-11-18 19:50:00 Amerongen beneden 5.40453 51.97486
4 594 2011-11-18 19:50:00 Amerongen boven 5.41194 51.97507
I do this with this query:
SELECT location_id, value, date, time, locations.name, locations.lat, locations.lng FROM
(
SELECT location_id, value, date, time from `measurements`
LEFT JOIN units ON (units.id = measurements.unit_id)
WHERE units.name='Waterhoogte'
ORDER BY measurements.date DESC, measurements.time DESC
) as last_record
LEFT JOIN locations on (locations.id = location_id)
GROUP BY location_id
which takes 30 seconds. How can I improve this? This is my structure:
CREATE TABLE IF NOT EXISTS `locations` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(255) NOT NULL,
`code` varchar(255) NOT NULL,
`lat` varchar(10) NOT NULL,
`lng` varchar(10) NOT NULL,
`owner_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=244 ;
-- --------------------------------------------------------
--
-- Table structure for table `measurements`
--
CREATE TABLE IF NOT EXISTS `measurements` (
`id` int(11) NOT NULL auto_increment,
`date` date NOT NULL,
`time` time NOT NULL,
`value` varchar(255) NOT NULL,
`location_id` int(11) NOT NULL,
`unit_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=676801 ;
-- --------------------------------------------------------
--
-- Table structure for table `owner`
--
CREATE TABLE IF NOT EXISTS `owner` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;
-- --------------------------------------------------------
--
-- Table structure for table `units`
--
CREATE TABLE IF NOT EXISTS `units` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(255) NOT NULL,
`description` text NOT NULL,
`unit_short` varchar(255) NOT NULL,
`owner_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=44 ;
What is the limit of what phpMyAdmin can handle?
Creating an index on units.name specifically is a good start.
You should also really rethink the amount of data you are pulling back.
Is someone really going to sift through that many records? Change your query to limit the number of records, and think of a UI that involves a paging mechanism.
You need to put an index or a unique index on units.name.
Add the following indexes:
A composite (covering) index on units.name and units.id.
A composite index on measurements.date and measurements.time.
An index on locations.id.
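Spelled out in SQL, those might look like this (a sketch; note that locations.id is already indexed by its primary key, so the third one comes for free):
CREATE INDEX idx_units_name_id ON units (name, id);
CREATE INDEX idx_measurements_date_time ON measurements (date, time);
-- locations.id already has an index via PRIMARY KEY (`id`)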
You should try creating an index on units.name as a first step. But understand that there is a tradeoff with an index: read operations will be faster, but write operations can be slower. If you're concerned about that, or if you're affected by slow writes, then you may want to try creating the index on a smaller number of characters in units.name.
For instance, to declare an index on the first 12 characters of units.name, you'd declare the following:
CREATE INDEX first_twelve ON units (name(12));
Again, this may not be necessary if you don't notice any ill effects from just throwing an index on, but it's something to keep in mind.
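You could also flatten the derived table into a single pass with direct joins. Note that this version filters on units.id (assuming you already know the unit's numeric ID, here 4) rather than on units.name: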
SELECT measurements.location_id, measurements.value, measurements.date, measurements.time, locations.name, locations.lat, locations.lng
FROM measurements
LEFT JOIN units ON units.id = measurements.unit_id
LEFT JOIN locations ON locations.id = measurements.location_id
WHERE units.id = 4
GROUP BY measurements.location_id
ORDER BY measurements.date DESC, measurements.time DESC
I have a simple table as below.
CREATE TABLE `stats` (
`id` int(11) NOT NULL auto_increment,
`zones` varchar(100) default NULL,
`date` date default NULL,
`hits` int(100) default NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
So it is just storing a simple hits counter per zone per day.
I just want to increment the hits value for the same day.
I have tried MySQL's ON DUPLICATE KEY UPDATE, but this won't work, because I may have many zones on different dates, so I can't make the zones or the dates unique on their own.
So the only way I can think of is to first run a query to see if the date exists, and then do a simple if() for insert/update.
Is there a better way of doing such a task, as there may be many thousands of hits per day?
Hope this makes sense :-).
And thanks if you can advise.
Declare the tuple (zone, date) as unique in your CREATE statement. This will make INSERT ... ON DUPLICATE KEY UPDATE work as expected:
CREATE TABLE `stats` (
`id` int(11) NOT NULL auto_increment,
`zone` varchar(100) default NULL,
`date` date default NULL,
`hits` int(100) default NULL,
PRIMARY KEY (`id`),
UNIQUE (`zone`, `date`)
) ENGINE=MyISAM AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
INSERT INTO stats (zone, date, hits) values ('zone1', 'date1', 1) ON DUPLICATE KEY UPDATE hits = hits + 1;
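Note that the UNIQUE (zone, date) key is what ON DUPLICATE KEY UPDATE keys on here; the AUTO_INCREMENT primary key alone would never collide, which is why the statement appeared not to work before.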
// Note: the string values in the SQL must be quoted.
$result = mysql_query("SELECT id FROM stats WHERE zone='$zone' AND date='$today' LIMIT 1");
if (mysql_num_rows($result)) {
    $id = mysql_result($result, 0);
    mysql_query("UPDATE stats SET hits=hits+1 WHERE id=$id");
} else {
    mysql_query("INSERT INTO stats (zone, date, hits) VALUES ('$zone', '$today', 1)");
}
Something like that, if I've interpreted you correctly... that's completely untested. You can figure out what the variables are.