I have the following table structure:
Table name: avail
id (autoincrement) | acc_id | start_date | end_date
-------------------------------------------------------
1 | 175 | 2015-05-26 | 2015-05-31 |
-------------------------------------------------------
2 | 175 | 2015-07-01 | 2015-07-07 |
-------------------------------------------------------
It's used for defining date-range availability, e.g. all dates between start_date and end_date are unavailable for the given acc_id.
Based on user input I'm closing different ranges, but I would like to throw an error if a user tries to close (submit) a range whose start_date or end_date falls somewhere inside an already existing range (for the submitted acc_id) in the DB.
In this example, a start_date of 2015-05-30 with an end_date of 2015-06-04 would be a good fail candidate.
I've found this QA:
MySQL overlapping dates, none conflicting
that pretty much explains how to do it in 2 steps, 2 queries with some PHP logic in between.
But I was wondering if it can be done in one insert statement.
I would then check the rows affected for success or failure (sub-question: is there a more convenient way to check whether it failed for some reason other than a date overlap?)
EDIT:
In response to Petr's comment I'll specify further the validation:
any kind of overlap should be avoided, even one embracing the
whole range or sitting entirely inside an existing range. Also, if
a start or end date equals an existing start or end date, it must be
considered an overlap. Sometimes a given acc_id will already have more
than one range in the table, so the validation should be done against
all entries with that acc_id.
Sadly, using just MySQL this is impossible, or at least impractical. The preferred way would be a SQL CHECK constraint; these are part of the SQL standard, but MySQL does not support them.
See: https://dev.mysql.com/doc/refman/5.7/en/create-table.html
The CHECK clause is parsed but ignored by all storage engines.
It seems PostgreSQL does support CHECK constraints on tables, but I'm not sure how viable it is for you to switch database engine or if that's even worth the trouble just to use that feature.
In MySQL a trigger could be used to solve this problem, which would check for overlapping rows before the insert/update occurs and throw an error using the SIGNAL statement. (See: https://dev.mysql.com/doc/refman/5.7/en/signal.html) However, to use this solution you'd have to use an up-to-date MySQL version.
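The trigger idea can be sketched as follows. This is only an illustration of the concept, not MySQL syntax: it uses Python's built-in sqlite3, where RAISE(ABORT, ...) plays the role MySQL's SIGNAL would play, and the trigger name is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE avail (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    acc_id INTEGER NOT NULL,
    start_date TEXT NOT NULL,
    end_date TEXT NOT NULL
);
-- Reject any new range that overlaps an existing one for the same acc_id.
-- Two ranges overlap when NEW.start <= old.end AND NEW.end >= old.start,
-- which also treats equal boundary dates as an overlap.
CREATE TRIGGER no_overlap BEFORE INSERT ON avail
WHEN EXISTS (
    SELECT 1 FROM avail
    WHERE acc_id = NEW.acc_id
      AND NEW.start_date <= end_date
      AND NEW.end_date >= start_date
)
BEGIN
    SELECT RAISE(ABORT, 'date range overlaps an existing one');
END;
""")

conn.execute("INSERT INTO avail (acc_id, start_date, end_date) "
             "VALUES (175, '2015-05-26', '2015-05-31')")
try:
    # The fail candidate from the question: starts inside the existing range.
    conn.execute("INSERT INTO avail (acc_id, start_date, end_date) "
                 "VALUES (175, '2015-05-30', '2015-06-04')")
except sqlite3.DatabaseError as e:
    print(e)  # date range overlaps an existing one
```

In real MySQL the trigger body would use `SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = ...` instead of RAISE, and you would want a matching BEFORE UPDATE trigger as well.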
Apart from pure SQL solutions, this is typically done in application logic: whichever program accesses the MySQL database checks for these kinds of constraints by requesting every row that would conflict with the new entry in a SELECT COUNT(id) ... statement. If the returned count is larger than 0, it simply doesn't do the insert/update.
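That said, the "one insert statement" the asker hoped for can be approximated with a guarded INSERT ... SELECT, checking rows affected afterwards (in MySQL you would add FROM DUAL to the SELECT). A sketch using sqlite3 for illustration; the close_range helper name is invented, and note that without a lock or serialized transaction two concurrent writers could still race past the guard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE avail (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    acc_id INTEGER NOT NULL,
    start_date TEXT NOT NULL,
    end_date TEXT NOT NULL)""")

def close_range(conn, acc_id, start, end):
    """Insert the range only if no existing range for acc_id overlaps it.

    Returns True on success, False if the insert was skipped (overlap)."""
    cur = conn.execute(
        """INSERT INTO avail (acc_id, start_date, end_date)
           SELECT :acc, :start, :end
           WHERE NOT EXISTS (
               SELECT 1 FROM avail
               WHERE acc_id = :acc
                 AND :start <= end_date   -- overlap test, boundaries included
                 AND :end >= start_date)""",
        {"acc": acc_id, "start": start, "end": end})
    return cur.rowcount == 1

print(close_range(conn, 175, '2015-05-26', '2015-05-31'))  # True
print(close_range(conn, 175, '2015-05-30', '2015-06-04'))  # False: overlaps
```

Rows affected distinguishes "accepted" from "rejected", but not *why* a zero-row insert happened, which is the asker's sub-question: any other failure (bad data, connection trouble) surfaces as an exception/error rather than an affected-rows count, so the two cases are distinguishable in practice.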
I suppose MariaDB works similarly to MySQL (MariaDB is what I'm using), and I know there is a cache system.
My problem, and what I don't understand, is that the pages I refresh take a long time to load, but the time is not constant at all. Details below.
On page A:
85% of the time, it takes ~7 seconds to execute.
10% of the time, it takes ~27 seconds.
5% of the time it takes under 1 second (when I refresh in very short intervals).
On page B:
80% of the time, it takes ~5 seconds.
Sometimes it's ~2.5 seconds.
Sometimes it's less than a second.
One time it has been >60 seconds, triggering an error.
My code is not changing; I'm just observing and refreshing with F5.
Details:
I have a MyISAM table, that gets roughly 150k new rows ("insert") per day.
I query this table every minute ("select").
The max rows it could have at a time might range between 50,000,000 and 4,750,000,000...
I'm using PHP to run the queries on the same server.
Structure I'm using currently:
CREATE TABLE `ticks` (
`primary` int(11) NOT NULL AUTO_INCREMENT,
`datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`pairs` text NOT NULL,
`price` decimal(18,8) NOT NULL,
`daily_volume` decimal(36,8) NOT NULL,
PRIMARY KEY (`primary`),
KEY `datetime` (`datetime`)
) ENGINE=MyISAM AUTO_INCREMENT=4007125 DEFAULT CHARSET=latin1
Data sample:
|primary | datetime | pairs | price | volume |
-------------------------------------------------------------------------------
|5810228 | 20/01/2018 21:34:02 | BTC_HUC | 0.00002617 | 6.08607929 |
|5810213 | 20/01/2018 21:34:02 | BTC_BELA | 0.00002733 | 8.83542600 |
|5810224 | 20/01/2018 21:34:02 | BTC_FLDC | 0.00000374 | 12.72654326 |
|5810234 | 20/01/2018 21:34:02 | BTC_NMC | 0.00037099 | 4.06446745 |
|5810219 | 20/01/2018 21:34:02 | BTC_CLAM | 0.00070798 | 13.65356478 |
|5810220 | 20/01/2018 21:34:02 | BTC_DASH | 0.07280004 | 423.88604591 |
|1706999 | 11/01/2018 17:09:01 | USDT_BTC | 13590.45341401 | 398959280.2620621|
I have created an index ("normal" index) on datetime.
The query on page A takes 7 seconds to run through PDO, but ~0.0007 s in phpMyAdmin:
SELECT DISTINCT(pairs)
FROM ticks
Every heavy computation after this first query takes ~0.5 seconds total most of the time, since I indexed datetime.
However, it sometimes takes 25 to 35 times longer to run, for unknown reasons. This is the query used (a loop runs it 100 times):
SELECT datetime, price
FROM ticks
WHERE datetime <= DATE_SUB(NOW(),INTERVAL 1 MINUTE)
AND pairs = \''.$data['pairs'].'\'
ORDER BY datetime DESC
LIMIT 1
I'm not going further into page B because it is less critical for me and I'm comfortable with its average execution time relative to the number of operations it performs. My only question is about the wide range of execution times that occurs there too.
Questions:
1. How can the execution-time differences be so wide, and how can I get my pages to run in under 1 second, as happens sometimes? My SQL queries are extremely simple and fast on the database alone. I believe the DB and the PHP server are located on the same machine.
In particular, I'm wondering why a query would run 10,000× slower through PDO than in phpMyAdmin; 7/0.0007 being 10,000, there has to be a huge problem here.
Indexing pairs does not change anything.
2. Have you seen anything incorrect in what I explained that could lead to a fix and improved performance? Do you have any specific advice for increasing performance in this case? For instance, I've been wondering whether MyISAM is efficient for my use (I believe so).
There is essentially no reason to use MyISAM any more, especially for performance.
7 seconds is terrible for a page load. How much of that is MySQL? Add some timers to the code to find out which query is the slowest, then let's improve it. (I would guess that one unnecessarily slow query is at the root of your problem.)
"~0.0007" smells like the Query Cache kicking in rather than the query really executing, so I'll ignore that figure.
With MyISAM, INSERTs block SELECTs. That could explain the troubles during the insert part of the day.
The table is confusing: you have a TIMESTAMP (resolution to the second), yet there is a daily_volume column, which suggests a resolution of a day.
I see TEXT. How long are the values? If under 255 characters, use VARCHAR, not TEXT. That would allow you to add INDEX(pairs), which lets SELECT DISTINCT(pairs) FROM ticks run a lot faster.
But, instead of that index, add INDEX(pairs, datetime) in order to make the second SELECT run much faster.
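The point of the composite index is that the per-pair "latest price" query can be answered by jumping straight to the last matching index entry instead of scanning. A sketch using sqlite3 for illustration (columns trimmed; the pairs_datetime index name is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticks (
    id INTEGER PRIMARY KEY,
    datetime TEXT NOT NULL,
    pairs VARCHAR(20) NOT NULL,   -- VARCHAR rather than TEXT so MySQL can index it
    price REAL NOT NULL
);
-- Composite index: the engine seeks to pairs = X, then reads the
-- newest datetime for that pair directly, satisfying ORDER BY ... LIMIT 1.
CREATE INDEX pairs_datetime ON ticks (pairs, datetime);
""")
conn.executemany(
    "INSERT INTO ticks (datetime, pairs, price) VALUES (?, ?, ?)",
    [('2018-01-20 21:34:02', 'BTC_HUC', 0.00002617),
     ('2018-01-20 21:35:02', 'BTC_HUC', 0.00002630),
     ('2018-01-20 21:34:02', 'BTC_DASH', 0.07280004)])

cur = conn.execute(
    """SELECT datetime, price FROM ticks
       WHERE pairs = ? AND datetime <= ?
       ORDER BY datetime DESC
       LIMIT 1""",
    ('BTC_HUC', '2018-01-20 21:40:00'))
print(cur.fetchone())
```

In MySQL the equivalent would be `ALTER TABLE ticks ADD INDEX pairs_datetime (pairs, datetime);` after converting pairs to VARCHAR.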
Shrinking the table size will help some in speed. (By some, I mean anywhere between 10% and 10x, depending on a lot of factors.)
Your decimal sizes are excessive. Find the worst (probably BRKA) and shrink the m,n of DECIMAL(m,n). Currently you are using 9 and 15 bytes for those two columns. You might consider FLOAT (4 bytes, ~7 significant digits) or DOUBLE (8 bytes, ~16 digits).
See my notes on converting to InnoDB. Be aware that the disk footprint might double or triple. (Yes, this is one advantage of MyISAM.)
Consider whether some other column (or combination of columns) is unique. If you have one, jettison the primary column and make that column(s) the PRIMARY KEY. If it happens to be (pairs, datetime), that will give a further performance boost to some queries.
"Indexing pairs is not changing anything." -- Since you can't index a TEXT column without "prefixing", and prefixing is virtually useless, I am not surprised.
Could you show me a sample of the data? I am not familiar with what a "pair" is.
An index starting with TIMESTAMP or DATETIME is rarely useful; get rid of it unless you have another query that benefits from it.
As for the Query Cache: its size should be no more than 50M. Does the data sit unchanged for 23 hours of the day and then arrive in a flurry of inserts? That would be a good case for using the QC. (Most production servers are better off turning it OFF.) Going above 50M may slow down performance.
After you have addressed most of my suggestions, some other issues may bubble to the surface. That is, I expect you to come back with another Question to finish improving the performance for your app.
How can the execution time differences be so wide, how can I have my pages running in under 1 sec, as it happens sometimes ? My sql queries are extremely simple and fast on the database alone.
It's impossible to answer this question with any degree of certainty without analyzing your platform, monitoring the performance of each component, reviewing the code and all the queries, etc. That is way beyond the scope of SO.
What can be said is:
It's unlikely that it has anything to do with PDO itself (or phpMyAdmin, for that matter).
It's typical of a concurrency problem: unless you have a server and a database dedicated to rendering "page A" only, other requests and queries happening at the same time can impact performance.
MyISAM is notoriously bad at handling a large volume of inserts because it uses table locking (in short, it locks the whole table every time you make an insert). InnoDB uses row-based locking, which would very probably be much more efficient with 150k writes a day. To quote the MySQL documentation:
Table locking enables many sessions to read from a table at the same time, but if a session wants to write to a table, it must first get exclusive access, meaning it might have to wait for other sessions to finish with the table first. During the update, all other sessions that want to access this particular table must wait until the update is done.
I'm using tokens to limit how many messages a user can send (1 message costs 1 token). At the moment I just subtract from an overall value to check whether the user has tokens remaining, and that's working fine.
I'm trying to change it so that it shows which bundle is active, so I need to check whether the user has enough tokens remaining in the active bundle and, if not, switch to the upcoming_bundle.
Example:
Stored User Data:
Table Name: Tokens
First Record
id: 1
user_id: 5
bundle_type: small
value: 10
value_remaining: 4
state: active_bundle
Second Record
id: 2
user_id: 5
bundle_type: large
value: 100
value_remaining: 100
state: upcoming_bundle
User sends 10 messages (10 tokens)
Only 4 tokens remain in the first record: use those 4, leaving 6 tokens still owed.
Then subtract the 6 tokens from the second record, which is now active, leaving it with 94 remaining tokens.
Should I hit the database every time a message is sent and subtract 1 token at a time, then, when remaining_value hits 0, change active_bundle to inactive and upcoming_bundle to active?
If this is your data model, then I would fetch all active & upcoming bundles, do the logic in PHP (subtract remaining tokens, change statuses, etc.), and then apply the updates as a single transaction.
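A minimal sketch of that fetch-then-update-in-one-transaction logic, using sqlite3 for illustration; the spend_tokens helper name is invented and the state values are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tokens (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    bundle_type TEXT NOT NULL,
    value INTEGER NOT NULL,
    value_remaining INTEGER NOT NULL,
    state TEXT NOT NULL)""")
conn.executemany(
    "INSERT INTO tokens VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 5, 'small', 10, 4, 'active_bundle'),      # first record from the question
     (2, 5, 'large', 100, 100, 'upcoming_bundle')])  # second record

def spend_tokens(conn, user_id, amount):
    """Drain the user's bundles oldest-first inside one transaction:
    emptied bundles become inactive, a touched upcoming bundle becomes active."""
    with conn:  # all updates commit together, or not at all
        bundles = conn.execute(
            """SELECT id, value_remaining FROM tokens
               WHERE user_id = ? AND state IN ('active_bundle', 'upcoming_bundle')
               ORDER BY id""", (user_id,)).fetchall()
        if sum(r for _, r in bundles) < amount:
            raise ValueError("not enough tokens")
        for bundle_id, remaining in bundles:
            take = min(remaining, amount)
            amount -= take
            left = remaining - take
            state = 'inactive' if left == 0 else 'active_bundle'
            conn.execute(
                "UPDATE tokens SET value_remaining = ?, state = ? WHERE id = ?",
                (left, state, bundle_id))
            if amount == 0:
                break

spend_tokens(conn, 5, 10)  # 4 from the small bundle, 6 from the large one
print(conn.execute(
    "SELECT id, value_remaining, state FROM tokens ORDER BY id").fetchall())
```

This reproduces the worked example: the small bundle ends at 0/inactive and the large one at 94/active. One edge it glosses over: if a spend drains the active bundle exactly, the next bundle is only flagged active on the following spend, though balances stay correct.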
If you are flexible about how the data is structured, I would rather have some kind of transaction log from which I can read each action, i.e. whether a bundle was added or a token was used, with a timestamp. For example:
id | user | change | comment | timestamp
1 | 1 | 10 | bought small bundle | 2016-09-06 09:30:00
2 | 1 | -1 | sent message | 2016-09-06 10:56:00
3 | 2 | -3 | sent multi-message | 2016-09-06 10:57:00
Where id is the transaction id, user the user id, change is the number of tokens added (by adding a bundle) or used (by sending one or many messages) and comment a message describing the action. When you want to find out how many tokens there are left you can just do a search for that user and check their SUM(change) instead of weird searches for active/upcoming bundles. Obviously this can be more or less elaborate depending on your needs.
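The log-and-SUM idea can be sketched like this, again with sqlite3 for illustration (the token_log table name is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE token_log (
    id INTEGER PRIMARY KEY,
    user INTEGER NOT NULL,
    change INTEGER NOT NULL,      -- positive: bundle bought; negative: tokens spent
    comment TEXT,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP)""")
conn.executemany(
    "INSERT INTO token_log (user, change, comment) VALUES (?, ?, ?)",
    [(1, 10, 'bought small bundle'),
     (1, -1, 'sent message'),
     (2, -3, 'sent multi-message')])

# Current balance is just the sum of all changes for the user;
# no searching for active/upcoming bundles needed.
balance = conn.execute(
    "SELECT COALESCE(SUM(change), 0) FROM token_log WHERE user = ?", (1,)
).fetchone()[0]
print(balance)  # 9
```

COALESCE covers users with no log rows at all, where SUM would otherwise return NULL.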
This does not take into account your actual domain! There are more approaches, each with its own drawbacks. For example, my approach might have problems when the transaction-log table gets large as users and activity grow, although that is very unlikely (I have seen MySQL perform well with a few million records in a similar log table). The important part: figure out what is important to your use case and build a solution around those requirements.
What I would do is subtract them one at a time; not only is this safer, it is also a lot easier.
I searched the internet for a way to update every column that matches a regex-like pattern. I didn't find one, or maybe I did but didn't understand it, because I'm new to databases. Here's the SQL I was trying to run:
UPDATE `bartosz` SET 'd%%-%%-15'=1
(I know it's bad)
I have columns like:
ID | d1-1-15 | d2-1-15 | d3-1-15 | d4-1-15 ... (for 5 years, every month, and day)
So is there a way to select all columns from 2015?
I know i can loop it in php so the sql would look like:
UPDATE `bartosz` SET 'd1-1-15'=1, 'd2-1-15'=1, 'd3-1-15'=1 [...]
But it would be really long.
Strongly consider changing your approach. It may be technically possible to have a table with 2000 columns, but you are not using MySQL in a way that gets the most out of the available features such as DATE handling. The below table structure will give better flexibility and scaling in most use cases.
Look into tables with key=>value attributes.
id employee date units
1 james 2015-01-01 2
2 bob 2015-01-01 3
3 james 2015-01-02 6
4 bob 2015-01-02 4
With the above it is possible to write queries without referencing hundreds of column names, and it will easily scale beyond 5 years without needing to ALTER the table. Use the DATE column type so you can easily query by date ranges. Also learn how to use indexes: put a UNIQUE index on the employee and date fields to prevent duplication.
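Sketched with sqlite3 for illustration (the attendance table name is invented): the UNIQUE(employee, date) constraint prevents duplicate rows, and the original "set every 2015 column to 1" update collapses into a single date-range filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE attendance (
    id INTEGER PRIMARY KEY,
    employee TEXT NOT NULL,
    date TEXT NOT NULL,           -- ISO dates compare and range-filter correctly
    units INTEGER NOT NULL,
    UNIQUE (employee, date))      -- one row per employee per day""")
conn.executemany(
    "INSERT INTO attendance (employee, date, units) VALUES (?, ?, ?)",
    [('james', '2015-01-01', 2), ('bob', '2015-01-01', 3),
     ('james', '2015-01-02', 6), ('bob', '2016-03-01', 4)])

# "Every d*-*-15 column" becomes a plain WHERE clause over one column:
conn.execute("""UPDATE attendance SET units = 1
                WHERE date BETWEEN '2015-01-01' AND '2015-12-31'""")
print(conn.execute(
    "SELECT employee, date, units FROM attendance ORDER BY id").fetchall())
```

Only the 2016 row keeps its original units; no PHP loop over 365+ column names is needed, and adding a sixth year requires no schema change.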
This question already has answers here:
Insert, on duplicate update in PostgreSQL?
I asked this last night and got information about merging (which is unavailable in PostgreSQL). I'm willing to try the suggested workaround, but I'm just trying to understand why it can't be done with conditional logic.
I've clarified the question a bit, so maybe this will be easier to understand.
I have a query that inserts data into a table, but it creates a new record every time. Is there a way to check whether the row is there first, then UPDATE if it is and INSERT if it isn't?
$user = 'username';
$timestamp = date('Y-m-d G:i:s.u');
$check_time = "start"; //can also be stop
$check_type = "start_user"; //can also be stop_user
$insert_query = "INSERT INTO production_order_process_log (
production_order_id, production_order_process_id, $check_time, $check_type)
VALUES (
'$production_order_id', '$production_order_process_id', '$timestamp', '$user')";
The idea is that the table records check-in and check-out values (production_order_process_log.start and production_order_process_log.stop). So before a record with a check-out timestamp is made, the query should check whether the $production_order_process_id already exists. If it does, the timestamp can go into stop and the $check_type can be stop_user. Otherwise, they stay start and start_user.
I am basically trying to avoid this result.
+----+---------------------+--------------------------------+--------------------+-------------------+-------------+-------------+
| id | production_order_id | production_order_process_id | start | stop | start_user | stop_user |
+----+---------------------+--------------------------------+--------------------+-------------------+-------------+-------------+
| 8 | 2343 | 1000 | 12 july 03:23:23 | NULL | tlh | NULL |
+----+---------------------+--------------------------------+--------------------+-------------------+-------------+-------------+
| 9 | 2343 | 1000 | NULL | 12 july 03:45:00 | NULL | tlh |
+----+---------------------+--------------------------------+--------------------+-------------------+-------------+-------------+
Many thanks for helping me suss out the postgresql logic to do this task.
This question and answer will be of interest to you: Insert, on duplicate update in PostgreSQL?
Basically, either use two queries (do the SELECT; if a row is found, UPDATE, otherwise INSERT), which is not the best solution (two scripts running simultaneously could produce duplicate inserts), or do as the question above suggests and write a stored procedure/function for it (probably the best and easiest option).
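For completeness, here is the shape of the single-statement upsert from the linked question. The INSERT ... ON CONFLICT ... DO UPDATE syntax is PostgreSQL 9.5+; it also happens to run on SQLite 3.24+, which this sketch uses. Column names follow the question; the check_event helper name, and treating the second event for a pair as the check-out, are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE production_order_process_log (
    id INTEGER PRIMARY KEY,
    production_order_id INTEGER NOT NULL,
    production_order_process_id INTEGER NOT NULL,
    start TEXT, stop TEXT, start_user TEXT, stop_user TEXT,
    UNIQUE (production_order_id, production_order_process_id))""")

def check_event(conn, order_id, process_id, ts, user):
    # First event for the (order, process) pair inserts the check-in row;
    # a second event hits the UNIQUE constraint and fills the check-out
    # columns on the same row instead of creating a duplicate.
    conn.execute(
        """INSERT INTO production_order_process_log
               (production_order_id, production_order_process_id, start, start_user)
           VALUES (?, ?, ?, ?)
           ON CONFLICT (production_order_id, production_order_process_id)
           DO UPDATE SET stop = excluded.start, stop_user = excluded.start_user""",
        (order_id, process_id, ts, user))

check_event(conn, 2343, 1000, '2018-07-12 03:23:23', 'tlh')  # check-in
check_event(conn, 2343, 1000, '2018-07-12 03:45:00', 'tlh')  # check-out
print(conn.execute("SELECT start, stop, start_user, stop_user "
                   "FROM production_order_process_log").fetchall())
```

This yields the single combined row the asker wanted instead of the two half-empty rows shown in the question, and it sidesteps the duplicate-insert race of the select-then-insert approach because the database enforces the uniqueness.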
Recognizing the nature of your workflow, it seems an order cannot stop before or at the same time as it starts, right? And it has to have started in order to stop, right? Please correct me if I'm wrong.
So you could just check whether it's a start operation and do an INSERT in that case, or stop operation and do an UPDATE.
I feel like concurrency doesn't really come into play here.