I have looked at a lot of examples, but I couldn't find an answer. What I need to do is check whether there is free space for new orders. At any one time I can have a maximum of 5 customers. Order duration is not limited; customers just select a date-time range in a picker.
For example, my DB records are:
`id` `start` `end`
`1` `2017/06/10 10:00` `2017/06/15 08:00`
`2` `2017/06/11 10:00` `2017/06/16 08:00`
`3` `2017/06/12 10:00` `2017/06/17 08:00`
`4` `2017/06/13 10:00` `2017/06/18 08:00`
`5` `2017/06/14 10:00` `2017/06/19 08:00`
A customer wants to reserve from 2017/06/11 08:00 until 2017/06/15 12:00, but I can't allow it because this period overlaps with 5 records in my DB, which is the maximum. How can I do this in a MySQL SELECT query?
Try:
SELECT COUNT(*) < 5 AS vacant
FROM orders
WHERE input_start < `end`
  AND input_end > `start`;
where input_start and input_end are your search values.
You can count the number of matching records using logic like this:
select count(*)
from t
where $start < `end` and $end > `start`;
This counts the number of records that overlap with ($start, $end). You can do the comparison to 5 in your application (or just use select count(*) < 5).
Note: You should be passing in the values as parameters.
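For example, here is a minimal sketch using MySQL's server-side prepared statements; the statement name and the literal dates are made up for illustration:
-- Hedged sketch: pass the search values as placeholders rather than
-- interpolating them into the SQL string.
PREPARE overlap_count FROM
  'SELECT COUNT(*) < 5 AS vacant
     FROM orders
    WHERE ? < `end`
      AND ? > `start`';
SET @input_start = '2017-06-11 08:00:00';
SET @input_end   = '2017-06-15 12:00:00';
EXECUTE overlap_count USING @input_start, @input_end;
DEALLOCATE PREPARE overlap_count;
In PHP you would more likely bind the values through PDO or mysqli, but the effect is the same: the values never get interpolated into the SQL string.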
Try like this:
select id from table where id not in
/* note: the query below gives the wrong result, because the BETWEEN checks miss reservations that fully enclose the requested range */
(select id from table where yourstartdate between start and end or yourenddate between start and end);
There are many, many solutions. You've tagged this as PHP, so there is scope to do this in the application logic as well as in the storage layer.
You could separate the 5 slots across attributes (which are constrained by definition) rather than across rows (which are not constrained). Then let the application decide if the transaction is viable (although you could map this to a single slot-per-record representation using views and triggers).
Alternatively, as others have suggested, you could count the number of slots currently booked to determine whether there is availability (but this would need to be done at least twice, and you need to avoid race conditions). For preference this would be implemented as a before-insert trigger.
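As a rough, hedged sketch of that trigger idea, using the table and columns from the question (the trigger name and error message are invented, SIGNAL requires MySQL 5.5 or later, and like the counting queries above this doesn't eliminate race conditions on its own):
DELIMITER //
CREATE TRIGGER orders_capacity_check
BEFORE INSERT ON orders
FOR EACH ROW
BEGIN
  -- reject the insert if the new period already overlaps 5 existing orders
  IF (SELECT COUNT(*)
        FROM orders
       WHERE NEW.start < `end`
         AND NEW.`end` > `start`) >= 5 THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'No free capacity for this period';
  END IF;
END//
DELIMITER ;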
An alternative approach, or an enhancement to this, would be to add a primary/unique key to the table containing the reservations, including an enumerated column (with 5 possible values and a NOT NULL constraint), making it impossible to insert more than 5 records for one slot.
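A hedged sketch of that constraint, with invented names, assuming reservations are first mapped onto discrete slots:
CREATE TABLE slot_booking (
  slot_start DATETIME NOT NULL,                  -- the discrete bookable period
  slot_no    ENUM('1','2','3','4','5') NOT NULL, -- only 5 possible values
  order_id   INT UNSIGNED NOT NULL,
  PRIMARY KEY (slot_start, slot_no)              -- a sixth row per slot cannot exist
) ENGINE=InnoDB;
With this key the database itself refuses a sixth booking for any slot, regardless of what the application does.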
But this is all about preventing a state which is considered inconsistent. Your first line of defence is presenting the user with options which are unlikely to cause such conflicts. The obvious case is not to present slots which already have 5 bookings.
Below is the format of the database of Autonomous System Numbers (downloaded and parsed from this site).
range_start  range_end  number  cc  provider
-----------  ---------  ------  --  -------------------------------------
16778240     16778495   56203   AU  AS56203 - BIGRED-NET-AU Big Red Group
16793600     16809983   18144       AS18144
(745465 total rows)
A normal query looks like this:
select * from table where 3232235520 BETWEEN range_start AND range_end
It works properly, but I query a huge number of IPs to check their AS information, which ends up taking too many calls and too much time.
Profiler snapshot: [Blackfire profiler snapshot]
I have two indexes:
the id column
a composite index on the range_start and range_end columns, since together they uniquely identify a row
Questions:
Is there a way to query a huge number of IPs in a single query?
Multiple conditions like where (IP between range_start and range_end) OR (IP between range_start and range_end) OR ... work, but then I can't get the IP-to-row mapping, i.e. which rows were retrieved for which IP.
Any suggestions to change the database structure to optimize the query speed and decrease the time?
Any help will be appreciated! Thanks!
It is possible to query more than one IP address. There are several approaches we could take. Assume range_start and range_end are defined as integer types.
For a reasonable number of ip addresses, we could use an inline view:
SELECT i.ip, a.*
FROM ( SELECT 3232235520 AS ip
UNION ALL SELECT 3232235521
UNION ALL SELECT 3232235522
UNION ALL SELECT 3232235523
UNION ALL SELECT 3232235524
UNION ALL SELECT 3232235525
) i
LEFT
JOIN ip_to_asn a
ON a.range_start <= i.ip
AND a.range_end >= i.ip
ORDER BY i.ip
This approach will work for a reasonable number of IP addresses. The inline view could be extended with more UNION ALL SELECT to add additional IP addresses. But that's not necessarily going to work for a "huge" number.
When we get "huge", we're going to run into limitations in MySQL... maximum size of a SQL statement limited by max_allowed_packet, there may be a limit on the number of SELECT that can appear.
The inline view could be replaced with a temporary table, built first.
DROP TEMPORARY TABLE IF EXISTS _ip_list_;
CREATE TEMPORARY TABLE _ip_list_ (ip BIGINT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO _ip_list_ (ip) VALUES (3232235520),(3232235521),(3232235522),...;
...
INSERT INTO _ip_list_ (ip) VALUES (3232237989),(3232237990);
Then reference the temporary table in place of the inline view:
SELECT i.ip, a.*
FROM _ip_list_ i
LEFT
JOIN ip_to_asn a
ON a.range_start <= i.ip
AND a.range_end >= i.ip
ORDER BY i.ip ;
And then drop the temporary table:
DROP TEMPORARY TABLE IF EXISTS _ip_list_ ;
Some other notes:
Churning database connections is going to degrade performance. There's a significant amount of overhead in establishing and tearing down a connection. That overhead gets noticeable if the application is repeatedly connecting and disconnecting, especially if it's doing that for every SQL statement issued.
And running an individual SQL statement also has overhead: the statement has to be sent to the server, parsed for syntax, evaluated for semantics, an execution plan chosen, the plan executed, a resultset prepared, and the resultset returned to the client. This is why it's more efficient to process set-wise rather than row-wise. Processing RBAR (row by agonizing row) can be very slow compared to sending one statement to the database and letting it process a whole set in one fell swoop.
But there's a tradeoff there. With ginormous sets, things can start to get slow again.
Even if you can process two IP addresses in each statement, that halves the number of statements that need to be executed. If you do 20 IP addresses in each statement, that cuts down the number of statements to 5% of the number that would be required a row at a time.
And the composite index already defined on (range_start,range_end) is appropriate for this query.
FOLLOWUP
As Rick James points out in a comment, the index I earlier said was "appropriate" is less than ideal.
We could write the query a little differently, in a way that might make more effective use of that index.
If (range_start,range_end) is UNIQUE (or PRIMARY) KEY, then this will return one row per IP address, even when there are "overlapping" ranges. (The previous query would return all of the rows that had a range_start and range_end that overlapped with the IP address.)
SELECT t.ip, a.*
FROM ( SELECT s.ip
, s.range_start
, MIN(e.range_end) AS range_end
FROM ( SELECT i.ip
, MAX(r.range_start) AS range_start
FROM _ip_list_ i
LEFT
JOIN ip_to_asn r
ON r.range_start <= i.ip
GROUP BY i.ip
) s
LEFT
JOIN ip_to_asn e
ON e.range_start = s.range_start
AND e.range_end >= s.ip
GROUP BY s.ip, s.range_start
) t
LEFT
JOIN ip_to_asn a
ON a.range_start = t.range_start
AND a.range_end = t.range_end
ORDER BY t.ip ;
With this query, for the innermost inline view query s, the optimizer might be able to make effective use of an index with a leading column of range_start, to quickly identify the "highest" value of range_start that is less than or equal to the IP address. But with that outer join, and with the GROUP BY on i.ip, I'd really need to look at the EXPLAIN output; it's only conjecture what the optimizer might do, and what is important is what the optimizer actually does.
Then, for inline view query e, MySQL might be able to make more effective use of the composite index on (range_start,range_end), because of the equality predicate on the first column and the inequality condition (feeding the MIN aggregate) on the second column.
For the outermost query, MySQL will surely be able to make effective use of the composite index, due to the equality predicates on both columns.
A query of this form might show improved performance, or performance might go to hell in a handbasket. The output of EXPLAIN should give a good indication of what's going on. We'd like to see "Using index for group-by" in the Extra column, and we only want to see a "Using filesort" for the ORDER BY on the outermost query. (If we remove the ORDER BY clause, we want to not see "Using filesort" in the Extra column.)
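To check, prefix the statement with EXPLAIN; for instance, for the simpler join from earlier (the actual plan depends on your version and data, so this only shows how to obtain it, not what it will say):
EXPLAIN
SELECT i.ip, a.*
  FROM _ip_list_ i
  LEFT
  JOIN ip_to_asn a
    ON a.range_start <= i.ip
   AND a.range_end >= i.ip
 ORDER BY i.ip ;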
Another approach is to make use of correlated subqueries in the SELECT list. The execution of correlated subqueries can get expensive when the resultset contains a large number of rows. But this approach can give satisfactory performance for some use cases.
This query depends on no overlapping ranges in the ip_to_asn table, and this query will not produce the expected results when overlapping ranges exist.
SELECT t.ip, a.*
FROM ( SELECT i.ip
, ( SELECT MAX(s.range_start)
FROM ip_to_asn s
WHERE s.range_start <= i.ip
) AS range_start
, ( SELECT MIN(e.range_end)
FROM ip_to_asn e
WHERE e.range_end >= i.ip
) AS range_end
FROM _ip_list_ i
) r
LEFT
JOIN ip_to_asn a
ON a.range_start = r.range_start
AND a.range_end = r.range_end
As a demonstration of why overlapping ranges will be a problem for this query, consider a totally goofy, made-up example:
range_start range_end
----------- ---------
.101 .160
.128 .244
Given an IP address of .140, the MAX(range_start) subquery will find .128, the MIN(range_end) subquery will find .160, and then the outer query will attempt to find a matching row range_start=.128 AND range_end=.160. And that row just doesn't exist.
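You can reproduce that failure in a few lines, using plain integers in place of the dotted notation above:
CREATE TEMPORARY TABLE demo_ranges (range_start INT, range_end INT);
INSERT INTO demo_ranges VALUES (101, 160), (128, 244);
SELECT (SELECT MAX(range_start) FROM demo_ranges WHERE range_start <= 140) AS range_start
     , (SELECT MIN(range_end)   FROM demo_ranges WHERE range_end   >= 140) AS range_end;
-- returns (128, 160), a combination that matches no row in demo_ranges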
This is a duplicate of the question here; however, I'm not voting to close it, as the accepted answer on that question is not very helpful. The answer by Quassnoi is much better (but it only links to the solution).
A linear index is not going to help resolve a database of ranges. The solution is to use geospatial indexing (available in MySQL and other DBMSs). An added complication is that MySQL's geospatial indexing only works in 2 dimensions while you have a 1-D dataset, so you need to map the ranges into 2 dimensions.
Hence:
CREATE TABLE IF NOT EXISTS `inetnum` (
`from_ip` int(11) unsigned NOT NULL,
`to_ip` int(11) unsigned NOT NULL,
`netname` varchar(40) default NULL,
`ip_txt` varchar(60) default NULL,
`descr` varchar(60) default NULL,
`country` varchar(2) default NULL,
`rir` enum('APNIC','AFRINIC','ARIN','RIPE','LACNIC') NOT NULL default 'RIPE',
`netrange` linestring NOT NULL,
PRIMARY KEY (`from_ip`,`to_ip`),
SPATIAL KEY `rangelookup` (`netrange`)
) ENGINE=MyISAM DEFAULT CHARSET=ascii;
Which might be populated with....
INSERT INTO inetnum
(from_ip, to_ip
, netname, ip_txt, descr, country
, netrange)
VALUES
(INET_ATON('127.0.0.0'), INET_ATON('127.0.0.2')
, 'localhost','127.0.0.0-127.0.0.2', 'Local Machine', '.',
GEOMFROMWKB(POLYGON(LINESTRING(
POINT(INET_ATON('127.0.0.0'), -1),
POINT(INET_ATON('127.0.0.2'), -1),
POINT(INET_ATON('127.0.0.2'), 1),
POINT(INET_ATON('127.0.0.0'), 1),
POINT(INET_ATON('127.0.0.0'), -1))))
);
Then you might want to create a function to wrap the rather verbose SQL....
DELIMITER //
DROP FUNCTION IF EXISTS `netname2`//
CREATE DEFINER=`root`@`localhost` FUNCTION `netname2`(p_ip VARCHAR(20) CHARACTER SET ascii) RETURNS varchar(80) CHARSET ascii
READS SQL DATA
DETERMINISTIC
BEGIN
DECLARE l_netname varchar(80);
SELECT CONCAT(country, '/',netname)
INTO l_netname
FROM inetnum
WHERE MBRCONTAINS(netrange, GEOMFROMTEXT(CONCAT('POINT(', INET_ATON(p_ip), ' 0)')))
ORDER BY (to_ip-from_ip)
LIMIT 0,1;
RETURN l_netname;
END//
DELIMITER ;
And therefore:
SELECT netname2('127.0.0.1');
./localhost
Which uses the index:
id  select_type  table    type   possible_keys  key          key_len  ref   rows  Extra
1   SIMPLE       inetnum  range  rangelookup    rangelookup  34       NULL  1     Using where; Using filesort
(and takes around 10msec to find a record from the combined APNIC,AFRINIC,ARIN,RIPE and LACNIC datasets on the very low spec VM I'm using here)
You can compare IP ranges using MySQL. This question might contain an answer you're looking for: MySQL check if an IP-address is in range?
SELECT * FROM TABLE_NAME WHERE (INET_ATON("193.235.19.255") BETWEEN INET_ATON(ipStart) AND INET_ATON(ipEnd));
You will likely want to index your table. This reduces the time it takes to search your database, similar to the index you'd find in the back of a textbook, but for databases:
ALTER TABLE `table` ADD INDEX `name` (`column_id`)
EDIT: Apparently wrapping an indexed column in INET_ATON() stops MySQL from using the index, so you would have to pick one of these approaches!
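One hedged way around that, if the table stores dotted-quad strings, is to materialize the integer values once and index them. The new column and index names below are invented; TABLE_NAME, ipStart and ipEnd follow the query above:
ALTER TABLE TABLE_NAME
  ADD COLUMN ip_start_num INT UNSIGNED,
  ADD COLUMN ip_end_num   INT UNSIGNED,
  ADD INDEX  idx_ip_range (ip_start_num, ip_end_num);

UPDATE TABLE_NAME
   SET ip_start_num = INET_ATON(ipStart),
       ip_end_num   = INET_ATON(ipEnd);

-- the lookup no longer wraps the indexed columns in a function:
SELECT * FROM TABLE_NAME
 WHERE INET_ATON('193.235.19.255') BETWEEN ip_start_num AND ip_end_num;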
I'm using PHP and MySQL (InnoDB engine).
As the MySQL reference manual says, a query that filters with a comparison on one column and orders by another can't use a single index for both.
I have a table named News.
This table has at least 1 million records with two important columns: time_added and number_of_views.
I need to select the most viewed records from the last n hours. What is the best index to do this? Or is it possible to run this kind of query very fast on a table with millions of records?
I've already done this for "last day", meaning I can select the most viewed records from the last day by adding a new column (date_added). But if I decide to select these records for the last week, I'm in trouble again.
First, write the query:
select n.*
from news n
where time_added >= date_sub(now(), interval <n> hour)
order by number_of_views desc
limit ??;
The best index is (time_added, number_of_views). Actually, number_of_views won't be used for the full query, but I would include it for other possible queries.
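For instance, a minimal sketch of creating that index (the table and column names follow the question; the index name is made up):
ALTER TABLE news
  ADD INDEX idx_time_views (time_added, number_of_views);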
First you must add the following lines to my.cnf (in the [mysqld] section):
query_cache_size = 32M (or more)
query_cache_limit = 32M (or more)
query_cache_size sets the size of the cache.
The other option to pay attention to is query_cache_limit: it sets the maximum size of a single query result that can be placed in the cache.
To check the status of the cache, you can run the following:
show global status like 'Qcache%';
http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). For more information, see http://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html
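A small illustration of the leftmost-prefix rule, using a throwaway table (names invented; what EXPLAIN actually reports will also depend on the data):
CREATE TABLE t3 (
  col1 INT, col2 INT, col3 INT,
  INDEX idx_123 (col1, col2, col3)
);
-- these WHERE clauses are leftmost prefixes, so idx_123 is usable:
EXPLAIN SELECT * FROM t3 WHERE col1 = 1;
EXPLAIN SELECT * FROM t3 WHERE col1 = 1 AND col2 = 2;
-- this one skips col1, so idx_123 cannot be used for the lookup:
EXPLAIN SELECT * FROM t3 WHERE col2 = 2 AND col3 = 3;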
You need a summary table. Since 'hour' is your granularity, something like this might work:
CREATE TABLE HourlyViews (
the_hour DATETIME NOT NULL,
ct SMALLINT UNSIGNED NOT NULL,
PRIMARY KEY(the_hour)
) ENGINE=InnoDB;
It might need another column (and add it to the PK) if there is some breakdown of the items you are counting. And you might want some other things SUM'd or COUNT'd in this table.
Build and maintain this table incrementally. That is, every hour, add another row to the table. (Or you could keep it updated with INSERT .. ON DUPLICATE KEY UPDATE ...)
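A hedged sketch of the ON DUPLICATE KEY variant, assuming each recorded view runs this (DATE_FORMAT truncates the timestamp to the hour):
INSERT INTO HourlyViews (the_hour, ct)
VALUES (DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00'), 1)
ON DUPLICATE KEY UPDATE ct = ct + 1;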
More on Summary Tables
Then change the query to use that table; it will be a lot faster.
I have a table that contains information for which I would like to show stats for each day in a particular range of dates. The table consists of the following:
CREATE TABLE IF NOT EXISTS `questions` (
`qid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`question_created_date` datetime NOT NULL,
`question_response_date` datetime NOT NULL,
PRIMARY KEY (`qid`)
)
I am able to use the following query to display the number of questions created on each day between $start_date and $end_date.
SELECT COUNT(*) as q_count, DATE(question_created_date) as date FROM questions where question_created_date between "$start_date" and "$end_date" GROUP BY date ORDER BY date asc
I would also like it to list the number of questions that were responded to on the same days. For example, if on 7/5/11 there were 10 new questions and 5 questions responded to, and on 7/6/11 there were 5 new questions and 11 questions responded to, I want to be able to show:
Date|New|Responded
7/5/11 10 05
7/6/11 05 11
Can this be done via MySQL?
I know I can perform the query listed above and then, for each day, perform another query to get the number of questions responded to on that day, but I was wondering if there is an easier and more efficient way to do this in one query.
Thanks in advance.
The two dates aren't related to each other, except that logically, the response date could only ever be greater than the created date. Grouping on one date will destroy any information you could extract from the other date, so in other words - you'll need two queries.
The fields returned would be the same, a count and a date field, so you could issue this as a single query call, but it'd have to be done as a union query:
SELECT 'created' as src, count(*) as cnt, DATE(question_created_date) as created ...
UNION
SELECT 'answered' as src, count(*) as cnt, etc...
The virtual 'src' column will tell you which internal query produced each row, and the union syntax lets you issue this all as a single query. But within MySQL, it'd still be treated as two completely separate queries, whose results just happen to be returned all at once.
You could do something like this using UNION. MySQL possibly has a built-in operator that is more efficient, but this will work too:
select main.date, max(qcd), max(qrd) from
(SELECT DATE(question_created_date) as date, count(*) as qcd, 0 as qrd FROM questions
where question_created_date between "$startdate" and "$enddate" GROUP BY date
union
SELECT DATE(question_response_date) as date, 0, count(*) as qrd FROM questions
where question_response_date between "$startdate" and "$enddate" GROUP BY date)
main group by main.date order by main.date asc;
I'm trying to optimize a report query run on an ecommerce site. I'm pretty sure that I'm doing something stupid, since this query shouldn't be taking nearly as long to run as it does.
The query in question is:
SELECT inventories_name, inventories_code, SUM(shop_orders_inventories_qty) AS qty,
SUM(shop_orders_inventories_price) AS tot_price, inventories_categories_name,
inventories_price_list, inventories_id
FROM shop_orders
LEFT JOIN shop_orders_inventories ON (shop_orders_id = join_shop_orders_id)
LEFT JOIN inventories ON (join_inventories_id = inventories_id)
WHERE {$date_type} BETWEEN '{$start_date}' AND '{$end_date}'
AND shop_orders_x_response_code = 1
GROUP BY join_inventories_id, join_shop_categories_id
{$order}
{$limit}
It's basically trying to get total sales per item over a period of time; the values in curly brackets are filled in via a form. It works fine for a period of a couple of days, but querying a time interval of a week or more can take 30+ seconds.
I feel like it's joining way too many rows in order to calculate the aggregate values and sucking up huge amounts of memory, but I'm not sure how to limit it.
Note - I realize that I'm selecting fields which aren't in the group by, but they correspond 1-1 with inventory ID, which is in the group by.
Any suggestions?
-- Edit --
The current indices are:
inventories:
join_categories - BTREE
inventories_name, inventories_code, inventories_description - FULLTEXT
shop_orders_inventories:
shop_orders_inventories_id - BTREE
shop_orders:
shop_orders_id - BTREE
Two sequential left joins will take quite a long time on a big table. Try using "join" instead of "left join" (unless you have records in shop_orders with no matching records in shop_orders_inventories or inventories), or split this query into a couple of smaller ones. Also, by using "sum" and "group by" you are forcing MySQL to create temp tables; you might want to increase the MySQL cache so those tables fit in memory (otherwise MySQL will dump them to disk, which will also increase SQL execution time).
The first and foremost rule to indexing is... index the columns that you will search on!
For each possible value of {$date_type}, create an index for that date column.
Once you have lots of data in the table (say 2 years or 100 weeks), a single week's data is 1% of the index, so it becomes a good starting point.
Even though MySQL allows non-aggregates in the SELECT clause, I personally would sync the two:
SELECT inventories_name, inventories_code,
SUM(shop_orders_inventories_qty) AS qty,
SUM(shop_orders_inventories_price) AS tot_price,
inventories_categories_name, inventories_price_list, inventories_id
FROM ...
GROUP BY inventories_id, join_shop_categories_id, inventories_name,
inventories_code, inventories_categories_name, inventories_price_list
...
I have three tables, each containing some common information and some information that is unique to the table.
For example: uid and date are universal among the tables, but one table can contain a column type while another contains currency.
I need to query the database and get the last 20 entries (date DESC) that have been entered in all three tables.
My options are:
Query the database once, with one large query containing three UNION ALL clauses, and pass along fake values for columns, i.e.:
FROM (
SELECT uid, date, currency, 0, 0, 0
and later on
FROM (
SELECT uid, date, 0, type, 0, 0
This would leave me with a lot of null-valued fields.
OR I can query the database three times and somehow, within PHP, sort through the information to get the combined latest 20 posts. This would leave me with an excess of information (60 posts to look through: LIMIT 20, times 3 queries) and force me to perform some kind of additional quicksort every time.
What option is better/any alternate ideas?
Thanks.
Those two options are more similar than you make it sound.
When you perform the single large query with UNIONs, MySQL will still be performing three separate queries, just as you propose doing in your alternative plan, and then combining them into a single result.
So, you can either let MySQL do the filtering (and LIMIT) for you, or you can do it yourself. Given that choice, letting MySQL do all the work sounds far preferable.
Having extra columns in the result set could theoretically hinder performance, but with so small a result set as your 20 rows, I wouldn't expect it to have any detectable impact.
It all depends on how big your tables are. If each table has a few thousand records, you can go with the first solution (UNION) and you'll be fine.
On bigger tables, I'd probably go with the second solution, mostly because it will use far fewer resources (RAM) than the UNION approach, while still being reasonably fast.
But I would advise you to think about your data model, and maybe optimize it. The fact that you have to use UNION-based queries usually means there's room for optimization, typically by merging the three tables, with an added "type" field (the name isn't good at all, but you see my point).
If you know your limits, you can limit each query so the UNION only runs over a little data. This should be better, as MySQL will return only 20 rows and will do the sorting faster than you can in PHP:
select * from (
  -- parentheses let each branch keep its own ORDER BY ... LIMIT,
  -- the placeholder columns need distinct aliases,
  -- and the derived table itself needs an alias
  (SELECT uid, date, currency, 0 as type, 0 as extra1, 0 as extra2 from table_a order by date desc limit 20)
  union
  (SELECT uid, date, 0, type, 0, 0 from table_b order by date desc limit 20)
  ...
) as latest order by date desc limit 20