I'm using MariaDB, which I assume works similarly to MySQL, and I know there is a cache system.
My problem, and what I don't understand, is that the pages I refresh take a long time to load, and the time is not constant at all. Details below.
On page A:
85% of the time, it takes ~7 seconds to execute.
10% of the time, it takes ~27 seconds.
5% of the time it takes under 1 second (when I refresh in very short intervals).
On page B:
80% of the time, it takes ~5 seconds.
Sometimes it's ~2.5 seconds.
Sometimes it's less than a second.
One time it has been >60 seconds, triggering an error.
My code is not changing; I'm just observing and refreshing with F5.
Details:
I have a MyISAM table, that gets roughly 150k new rows ("insert") per day.
I am looking to query this table every minute ("select").
The max rows it could have at a time might range between 50,000,000 and 4,750,000,000...
I'm using PHP to run the queries on the same server.
Structure I'm using currently:
CREATE TABLE `ticks` (
`primary` int(11) NOT NULL AUTO_INCREMENT,
`datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`pairs` text NOT NULL,
`price` decimal(18,8) NOT NULL,
`daily_volume` decimal(36,8) NOT NULL,
PRIMARY KEY (`primary`),
KEY `datetime` (`datetime`)
) ENGINE=MyISAM AUTO_INCREMENT=4007125 DEFAULT CHARSET=latin1
Data sample:
| primary | datetime            | pairs    | price          | daily_volume      |
---------------------------------------------------------------------------------
| 5810228 | 20/01/2018 21:34:02 | BTC_HUC  | 0.00002617     | 6.08607929        |
| 5810213 | 20/01/2018 21:34:02 | BTC_BELA | 0.00002733     | 8.83542600        |
| 5810224 | 20/01/2018 21:34:02 | BTC_FLDC | 0.00000374     | 12.72654326       |
| 5810234 | 20/01/2018 21:34:02 | BTC_NMC  | 0.00037099     | 4.06446745        |
| 5810219 | 20/01/2018 21:34:02 | BTC_CLAM | 0.00070798     | 13.65356478       |
| 5810220 | 20/01/2018 21:34:02 | BTC_DASH | 0.07280004     | 423.88604591      |
| 1706999 | 11/01/2018 17:09:01 | USDT_BTC | 13590.45341401 | 398959280.2620621 |
I have created an index ("normal" index) on datetime.
The query on page A takes 7 seconds to run with PDO, but ~0.0007 seconds in phpMyAdmin:
SELECT DISTINCT(pairs)
FROM ticks
All the heavy computations after this first query take ~0.5 seconds in total most of the time, since I indexed datetime.
However, they sometimes take between 25 and 35 times longer to run for unknown reasons. This is the query that is used (a loop runs it 100 times):
SELECT datetime, price
FROM ticks
WHERE datetime <= DATE_SUB(NOW(),INTERVAL 1 MINUTE)
AND pairs = \''.$data['pairs'].'\'
ORDER BY datetime DESC
LIMIT 1
I'm not going to explain page B further, because that page is less critical for me and I'm comfortable with its average execution time given the number of operations it performs. My only question is the wide range of execution times that can occur there too.
Questions:
1-How can the execution time differences be so wide, and how can I get my pages to run in under 1 second, as sometimes happens? My SQL queries are extremely simple and fast on the database alone. I believe the database and the PHP server are located on the same machine.
In particular, I'm wondering why a query would run 10,000 times slower with PDO than with phpMyAdmin. 7/0.0007 being 10,000, there has to be a huge problem here.
Indexing pairs is not changing anything.
2-Have you seen anything incorrect in what I explained that could lead to a fix and improved performance? Do you have any particular advice for increasing performance in this case? For instance, I've been wondering whether MyISAM is efficient in my case (I believe so).
There is essentially no reason to use MyISAM any more, especially for performance.
7 seconds is terrible for a page load. How much of that is MySQL actions? Add some timers in the code. That will show which query is the slowest, and then we can improve it. (I would guess that one unnecessarily slow query is at the root of your problem.)
"~0.0007" smells like the Query Cache kicked in and the query was not really executed. I would ignore that number.
With MyISAM, INSERTs block SELECTs. That could explain the troubles during the insert part of the day.
The table is confusing -- you have a TIMESTAMP (resolution to second), yet there is a "daily_volume" which sounds like a resolution to the "day".
I see TEXT. How long are the values? If they are shorter than 255 characters, use VARCHAR, not TEXT. That would allow you to add INDEX(pairs), which would allow SELECT DISTINCT(pairs) FROM ticks to run a lot faster.
But, instead of that index, add INDEX(pairs, datetime) in order to make the second SELECT run much faster.
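As a rough sketch of the two changes above, assuming pair names such as BTC_HUC never exceed 20 characters (the length and the index name are my assumptions, not something stated in the question):

```sql
-- Assumption: no pair name is longer than 20 characters; check first.
SELECT MAX(CHAR_LENGTH(pairs)) FROM ticks;

-- Switch TEXT to VARCHAR so the column can be indexed without a prefix,
-- and add the composite index in the same table rebuild.
ALTER TABLE ticks
    MODIFY pairs VARCHAR(20) NOT NULL,
    ADD INDEX pairs_datetime (pairs, datetime);
```

With (pairs, datetime) in place, the WHERE pairs = ... ORDER BY datetime DESC LIMIT 1 query should be able to seek straight to the newest matching row, and SELECT DISTINCT(pairs) can be served from the same index.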
Shrinking the table size will help some in speed. (By some, I mean anywhere between 10% and 10x, depending on a lot of factors.)
Your decimal sizes are excessive. Find the worst case (probably BRKA) and shrink the m,n of DECIMAL(m,n). Currently you are using 9 and 17 bytes for those two columns. You might consider FLOAT (4 bytes, ~7 significant digits) or DOUBLE (8 bytes, ~16 digits).
See my notes on converting to InnoDB. Be aware that the disk footprint might double or triple. (Yes, this is an advantage of MyISAM.)
Consider whether some other column (or combination of columns) is unique. If you have such, jettison the column primary and make that column(s) the PRIMARY KEY. If it happens to be (pairs, datetime), then that will give a further performance boost to some queries.
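The conversion itself is a single statement. A minimal sketch, assuming there is enough free disk space for a full copy of the table while it is rebuilt:

```sql
-- Check the current engine and on-disk size before converting.
SHOW TABLE STATUS LIKE 'ticks';

-- Rebuilds the whole table; expect it to take a while on millions of rows.
ALTER TABLE ticks ENGINE=InnoDB;
```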
"Indexing pairs is not changing anything." -- Since you can't index a TEXT column without using "prefixing" and prefixing is virtually useless, I am not surprised.
Could you show me a sample of the data? I am not familiar with what a "pair" is.
An index starting with TIMESTAMP or DATETIME is rarely useful; get rid of it unless you have another query that benefits from it.
As for the Query Cache -- size should be no more than 50M. Does the data not change for 23 hours of the day, then there is a flurry of inserts? This would be a good case for using the QC. (Most production servers are better off turning it OFF.) Going above 50M may slow down performance.
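To see what the Query Cache is doing and to time a query without it, something like the following could be used. This is a sketch for a MariaDB or MySQL 5.x server where the Query Cache exists; 52428800 is simply 50M expressed in bytes.

```sql
-- See whether the Query Cache is enabled and how big it is.
SHOW VARIABLES LIKE 'query_cache%';

-- Cap it at 50M (the value is in bytes).
SET GLOBAL query_cache_size = 52428800;

-- Time a query without the cache answering for it.
SELECT DISTINCT SQL_NO_CACHE pairs FROM ticks;
```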
After you have addressed most of my suggestions, some other issues may bubble to the surface. That is, I expect you to come back with another Question to finish improving the performance for your app.
How can the execution time differences be so wide, how can I have my pages running in under 1 sec, as it happens sometimes ? My sql queries are extremely simple and fast on the database alone.
It's impossible to answer this question with any degree of certainty without analyzing your platform, monitoring the performance of each component, reviewing the code and all the queries, etc. This is way beyond the scope of SO.
What can be said is:
It's unlikely that it has anything to do with PDO itself (or PHPMyAdmin for that matter)
It's typical of a concurrency problem: unless you have a server and a database dedicated to rendering "page A" only, other requests and queries happening at the same time can impact performance
MyISAM is notoriously bad at handling a large volume of inserts because it uses table locking (in short, it locks the whole table every time you make an insert). InnoDB uses row-level locking, which would very probably be much more efficient with 150k writes a day. To quote the MySQL documentation:
Table locking enables many sessions to read from a table at the same time, but if a session wants to write to a table, it must first get exclusive access, meaning it might have to wait for other sessions to finish with the table first. During the update, all other sessions that want to access this particular table must wait until the update is done.
I have a PHP script which iterates through a JSON file line by line (using JsonMachine), checks each line against criteria (foreach); if the criteria are met it checks whether the row is already in a database, and then it imports/updates (MySQL 8.0.26). As an example, the last time this script ran it iterated through 65,000 rows and imported 54,000 of them in 24 seconds.
Each JSON row has a UUID as unique key, and I am importing this as a VARCHAR(36).
I read that it can be advantageous to store the UUIDs as BINARY(16) using uuid_to_bin and bin_to_uuid, so I changed the import script to store the UUID as binary, the PHP read scripts to decode back to UUID, and the database fields to BINARY(16).
This worked functionally, but the script import time went from 24 seconds to 30 minutes. The server was not CPU-bound during that time, running at 25 to 30% (normally <5%).
The script without uuid conversion runs at about 3,000 lines per second, using uuid conversion it runs at about 30 lines per second.
The question: can anyone with experience on bulk importing using uuid_to_bin comment on performance?
I've reverted to native UUID storage, but I'm interested to hear others' experience.
EDIT with extra info from comments and replies:
The UUID is the primary key
The server is a VM with 12GB and 4 x assigned cores
The table is 54,000 rows (from the import), and is 70MB in size
Innodb buffer pool size is not changed from default, 128MB: 134,217,728
Oh, bother. UUID_TO_BIN (with its swap flag set) changes the UUID values from being scattered to being roughly chronologically ordered (for type 1 UUIDs). This helps performance by clustering rows on disk better.
First, let's check the type. Please display one (any one) of the 36-char uuids or HEX(binary) using the 16-byte binary version. After that, I will continue this answer depending on whether it is type 1 or some other type.
Meanwhile, some other questions (to help me focus on the root cause):
What is the value of innodb_buffer_pool_size?
How much RAM?
How big is the table?
Were the incoming uuids in some particular order?
A tip: Use IODKU (INSERT ... ON DUPLICATE KEY UPDATE) instead of SELECT + (UPDATE or INSERT). That will double the speed.
Then batch them 100 at a time. That may give another 10x speedup.
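A batched IODKU could look roughly like this. The table, its columns and the two UUID values are hypothetical, since the question does not show the real schema. The row alias (AS new) needs MySQL 8.0.19 or later; on older versions VALUES(name) plays the same role.

```sql
-- Hypothetical table; the real column list is not shown in the question.
CREATE TABLE items (
    id   BINARY(16)   NOT NULL PRIMARY KEY,
    name VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

-- One statement inserts new rows and updates existing ones,
-- so no SELECT is needed beforehand.
-- Extend the VALUES list to about 100 rows per batch.
INSERT INTO items (id, name)
VALUES
    (UUID_TO_BIN('6ccd780c-baba-1026-9564-5b8c656024db'), 'first'),
    (UUID_TO_BIN('f47ac10b-58cc-4372-a567-0e02b2c3d479'), 'second')
AS new
ON DUPLICATE KEY UPDATE name = new.name;
```

Each batch is then one round trip and one statement to parse instead of up to 200.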
More
Your UUIDs are type 4 -- random. UUID_TO_BIN() changes from one random order to another. (Dropping from 36 bytes to 16 is still beneficial.)
innodb_buffer_pool_size -- 128M is an old, too small, default. If you have more than 4GB, set that to about 70% of RAM. This change should help performance significantly. Your VM has 12GB, so change the setting to 8G. This will eliminate most of the I/O, which is the slow part of SQL.
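On MySQL 8.0 the buffer pool can be resized without editing the config file. A sketch, assuming the account has the privileges needed for SET PERSIST:

```sql
-- Current value in bytes (134217728 is the 128M default).
SELECT @@innodb_buffer_pool_size;

-- 8G = 8 * 1024 * 1024 * 1024 bytes. SET PERSIST also writes the value
-- to mysqld-auto.cnf so that it survives a restart.
SET PERSIST innodb_buffer_pool_size = 8589934592;
```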
I have the following table structure:
Table name: avail
id (auto-increment) | acc_id | start_date | end_date
-------------------------------------------------------
1                   | 175    | 2015-05-26 | 2015-05-31
2                   | 175    | 2015-07-01 | 2015-07-07
It's used for defining date range availability, e.g. all dates between start_date and end_date are unavailable for the given acc_id.
Based on user input I'm closing different ranges, but I would like to throw an error if a user tries to close (submit) a range that has its start_date OR end_date somewhere in the range of an already existing one (for the submitted acc_id) in the DB.
In this example a start_date: 2015-05-30 end_date: 2015-06-04 would be a good fail candidate.
I've found this QA:
MySQL overlapping dates, none conflicting
that pretty much explains how to do it in 2 steps, 2 queries with some PHP logic in between.
But I was wondering if it can be done in one insert statement.
I would eventually check for rows affected for success or fail (sub question: is there a more convenient way to check if it failed for some other reason besides date overlap?)
EDIT:
In response to Petr's comment I'll specify further the validation:
any kind of overlapping should be avoided, even the one embracing the
whole range or finding itself inside the existing range. Also, if
start or end dates equal the existing start or end dates it must be
considered an overlap. Sometimes certain acc_id will already have more
than one range in the table so the validation should be done against
all entries with a given acc_id.
Sadly, using just MySQL this is impossible, or at least impractical. The preferred way would be using SQL CHECK constraints, which are in the SQL language standard. However, MySQL 5.7 does not enforce them.
See: https://dev.mysql.com/doc/refman/5.7/en/create-table.html
The CHECK clause is parsed but ignored by all storage engines.
It seems PostgreSQL does support CHECK constraints on tables, but I'm not sure how viable it is for you to switch database engine or if that's even worth the trouble just to use that feature.
In MySQL a trigger could be used to solve this problem, which would check for overlapping rows before the insert/update occurs and throw an error using the SIGNAL statement. (See: https://dev.mysql.com/doc/refman/5.7/en/signal.html) However, to use this solution you'd have to use an up-to-date MySQL version.
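A minimal sketch of such a trigger, using the avail table from the question. The trigger name and the message text are my own choices, and a matching BEFORE UPDATE trigger (which would also have to skip the row being updated) is left out.

```sql
DELIMITER //
CREATE TRIGGER avail_no_overlap_ins
BEFORE INSERT ON avail
FOR EACH ROW
BEGIN
    -- Two ranges overlap when each one starts on or before the day the other ends.
    IF EXISTS (
        SELECT 1 FROM avail
        WHERE acc_id = NEW.acc_id
          AND start_date <= NEW.end_date
          AND end_date   >= NEW.start_date
    ) THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'Date range overlaps an existing range';
    END IF;
END//
DELIMITER ;
```

The application then sees SQLSTATE 45000 for an overlap and a different error code for anything else, which also answers the sub-question about telling the two apart.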
Apart from pure SQL solutions, this is typically done in application logic: whichever program is accessing the MySQL database checks for these kinds of constraints by requesting every row that conflicts with the new entry in a SELECT COUNT(id) ... statement. If the returned count is larger than 0, it simply doesn't do the insert/update.
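For the failing example from the question (2015-05-30 to 2015-06-04 for acc_id 175), the check could be written like this. The inclusive comparisons are chosen so that equal start or end dates also count as an overlap, as the question's edit requires.

```sql
-- Counts existing ranges for this acc_id that overlap the submitted one.
SELECT COUNT(id)
FROM avail
WHERE acc_id = 175
  AND start_date <= '2015-06-04'   -- existing range starts before the new one ends
  AND end_date   >= '2015-05-30';  -- and ends after the new one starts
```

With the two sample rows this returns 1 (row 1 overlaps), so the insert would be skipped. Note that another session could insert between the check and the insert unless both run inside a transaction with suitable locking.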
I'm stress testing my database for a geolocation search system. It has a lot of optimisation built in already, such as a square-box long/lat index system to narrow searches before performing arc distance calculations. My aim is to serve 10,000,000 users from one table.
At present my query time is between 0.1 and 0.01 seconds based on other conditions such as age, gender etc. This is for 10,000,000 users evenly distributed across the UK.
I have a LIMIT condition as I need to show the user X people, where X can be between 16 and 40.
The issue is when there are no other users / few users that match, the query can take a long time as it cannot reach the LIMIT quickly and may have to scan 400,000 rows.
There may be other optimisation techniques which I can look at, but my question is:
Is there a way to get the query to give up after X seconds? If it takes more than 1 second then it is not going to return results and I'm happy for this to occur. In pseudo query code it would be something like:
SELECT data FROM table WHERE ....... LIMIT 16 GIVEUP AFTER 1 SECOND
I have thought about a cron solution to kill slow queries but that is not very elegant. The query will be called every few seconds when in production so the cron would need to be on continuously.
Any suggestions?
Version is 10.1.14-MariaDB
With MariaDB 10.1, you have two ways of limiting your query: by execution time or by the total number of rows examined.
By rows:
SELECT ... LIMIT ROWS EXAMINED rows_limit;
You can use the ROWS EXAMINED clause and set a number of rows, like the 400000 you mentioned (available since MariaDB 10.0).
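Applied to a search like the one in the question, it might look as follows. The table, columns and filter values are hypothetical placeholders for the real query.

```sql
-- Hypothetical table and columns standing in for the real search query.
SELECT id, name
FROM users
WHERE gender = 'f' AND age BETWEEN 25 AND 35
LIMIT 16 ROWS EXAMINED 400000;

-- When the row limit cuts the query short, MariaDB raises a warning
-- rather than an error and returns the rows found so far.
SHOW WARNINGS;
```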
By time:
If the max_statement_time variable is set, any query (excluding stored
procedures) taking longer than the value of max_statement_time
(specified in seconds) to execute will be aborted. This can be set
globally, by session, as well as per user and per query.
If you want it for a specific query, as I imagine, you can use this:
SET STATEMENT max_statement_time=1 FOR
SELECT field1 FROM table_name ORDER BY field1;
Remember that max_statement_time is set in seconds (unlike MySQL's max_execution_time, which is in milliseconds), so you can adjust it until you find the best fit for your case (available since MariaDB 10.1).
If you need more information, I recommend this excellent post about query timeouts.
Hope this helps you.
I am starting to think about my new project and I've found a couple of speed issues, so I hope you can help me with selecting a good and elegant way to code it.
Each user has records in the database of the "places" he has visited. Each place has "schools" - a number of schools in this particular place. Each school has classes. Each class may end its "learning year" at a different time, so its number should increment if the date is >= the end of the learning year.
So we have such a database:
"places" table:
place | user_id |
-----------------
1 | 4 |
2 | 4 |
User no 4 visited place no 1 and 2
"schools" table:
school | place |
----------------
5 | 2 |
6 | 2 |
Place 2 has two schools - with id 5 and 6.
"class" table:
class | school | end_learning | class_number
---------------------------------------------
20 | 5 | 01.01.2013 | 2
21 | 5 | 03.01.2013 | 3
22 | 5 | 05.01.2013 | 4
School 5 has 3 classes with ids 20, 21, 22. If date is greater than 01.01.2013, the class number of class 20 should be incremented to 3 and end learning date changed to 01.01.2014. And so on.
And now we get to the problem: if there are 1000 places, each with 100 schools, each with 10 classes, we get 1,000,000 records. That's a lot. Because what I have presented is just a simple example, I have to consider updating the whole database every time a user refreshes the page, so I'm afraid it might be laggy with that number of records.
I could also serialize the classes into one field in the school table:
school | place | classes
-------------------------------------------------------------------------
5 | 2 | serialized class 20, 21, 22 with end_learning field and class number
6 | 2 | other serialized classes from school 6
In that case I get 10 times fewer records, but each time I have to deserialize the data, check the dates and, if one is earlier than now, alter it, serialize and save to the database. The second problem is that I have to select all records from the db to manipulate them, not only those that need to be altered.
I am also thinking about having two databases: one with records that might need changing in the more distant future, and a second with records that might need changing in the next 24 hours (near future). Every 24 hours, all the classes which end learning in the next 24 hours are moved to the "near future" db, so every refresh of the page works on thousands of records, not hundreds of thousands or millions. The millions of records (distant future) are processed only once per day, to create the "near future" table.
What do you think about all those database schemas? Maybe you have a better idea?
I don't quite understand the business logic or data model you outline - but I will assume you have thought this through.
Firstly, RDBMS solutions like MySQL are really, really good at managing large numbers of records, as long as the data you are working with is relational. As far as I can tell, you will be searching across many records, but only updating a few (a user will only be enrolled in a limited number of classes); I don't see this as a huge problem.
Secondly, it's nearly always better to go with the "standard" relational model until you can prove it doesn't meet your performance needs than to go for "exotic" solutions at the start off (I class your serialization and partitioning solution as "exotic" for the purpose of this answer). A lot of time and energy has gone into optimizing performance of SQL; if there were a simple alternative, it would be part of the standard solution. There are, of course, points at which the standard relational model doesn't scale (Facebook-size traffic, for instance), or business domains where the relational model doesn't really fit (documents, graphs). However, all the alternatives have benefits and drawbacks just like "standard" MySQL.
Thirdly, the best way to deal with possible performance issues is, well, to deal with them. In code. Build a test rig, create a schema according to the relational model, populate it with test data (e.g. using DbMonster), throw some load at it (e.g. using JMeter) and tune your schema and queries to prove your situation doesn't fit the standard solution. Only go for something exotic if you really can prove that you can't play nice with standard, relational database stuff.
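As a starting point for such a test rig, the class table from the question could be set up in the plain relational way with an index on the date. The column types and index choices here are assumptions.

```sql
-- Table and column names follow the question; the types are assumptions.
CREATE TABLE class (
    class        INT  NOT NULL PRIMARY KEY,
    school       INT  NOT NULL,
    end_learning DATE NOT NULL,
    class_number INT  NOT NULL,
    INDEX (school),
    INDEX (end_learning)
) ENGINE=InnoDB;

-- Touches only the classes whose learning year has ended; the index on
-- end_learning lets the server find them without scanning every row.
UPDATE class
SET class_number = class_number + 1,
    end_learning = end_learning + INTERVAL 1 YEAR
WHERE end_learning <= CURDATE();
```

On most page loads the UPDATE would match no rows at all, which is the cheap case worth measuring before trying the serialized or two-database designs.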
I have a question that I'm really unsure about.
First, feel free to downvote me if you must, but I would really like to hear a more experienced developer's opinion.
I am building a site where I would like to build functionality similar to Google circles.
My logic would be this.
Every user will have circles attached to them after signup.
For example, when a user signs up
and fills in the form, the following rows will be inserted into the database:
id | circle_name  | user_id
------------------------------------
1  | circle one   | 1
2  | circle two   | 1
3  | circle three | 1
Every circle will have a primary key.
But this is what I'm unsure about: I'm a bit scared that after a while the table will break. What I mean is that once it reaches a certain number of ids, it will stop generating more.
When you specify an int in the database the default value is 11. Yes, I know I can increase it or set it to the value I want, but is giving higher values a good idea?
or is there any possibility to make a primary key auto increment to be unlimited?
Thank you for your opinions and help.
or is there any possibility to make a primary key auto increment to be unlimited?
You can use a BIGINT.
Strictly speaking it's not unlimited, but the range is so incredibly huge that you wouldn't be able to use up all the values even if you tried really hard.
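A sketch of the change, assuming the table is called circles (the name is not given in the question). Note that the 11 in int(11) is only a display width and has no effect on the range of values the column can hold.

```sql
-- Hypothetical table name; adjust to the real circles table.
ALTER TABLE circles
    MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;

-- Upper limits of the integer types:
--   INT              2147483647
--   INT UNSIGNED     4294967295
--   BIGINT           9223372036854775807
--   BIGINT UNSIGNED  18446744073709551615
```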
Just run the maths and you'll get the answer yourself. If a column can store billions of values and you don't expect to have 1 million new registrations every week, then getting to the point where it breaks would be "practically" tough, even if "theoretically" possible.
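The maths can be done in the database itself. Taking the 1 million registrations per week from above and assuming 52 weeks per year:

```sql
-- Years until the ids run out at 1,000,000 new rows per week.
SELECT
    2147483647 DIV (1000000 * 52)           AS years_int,
    4294967295 DIV (1000000 * 52)           AS years_int_unsigned,
    18446744073709551615 DIV (1000000 * 52) AS years_bigint_unsigned;
```

That gives about 41 years for a signed INT, about 82 for INT UNSIGNED, and hundreds of billions of years for BIGINT UNSIGNED.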