Wordpress & MySQL table fragmentation - php

So, I'm using WordPress with MySQL (8.0.16) and InnoDB.
The wp_options table is normally 13 MB. The problem is that it suddenly (within a span of a few days at most) grows to 27 GB and then stops growing, because there is no more space available. Those 27 GB are reported as data, not indexes.
Dumping and re-importing the table produces a table of the normal size. The number of entries is around 4k, while the AUTO_INCREMENT value is 200k+. Defragmenting the table with ALTER TABLE wp_options ENGINE = InnoDB; brings the size on disk back to normal, but MySQL still reports the old size, even after a server restart.
+------------+------------+
| Table | Size in MB |
+------------+------------+
| wp_options | 26992.56 |
+------------+------------+
1 row in set (0.00 sec)
MySQL logs don't say much:
2019-08-05T17:02:41.939945Z 1110933 [ERROR] [MY-012144] [InnoDB] posix_fallocate(): Failed to preallocate data for file ./XXX/wp_options.ibd, desired size 4194304 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/8.0/en/operating-system-error-codes.html
2019-08-05T17:02:41.941604Z 1110933 [Warning] [MY-012637] [InnoDB] 1048576 bytes should have been written. Only 774144 bytes written. Retrying for the remaining bytes.
2019-08-05T17:02:41.941639Z 1110933 [Warning] [MY-012638] [InnoDB] Retry attempts for writing partial data failed.
2019-08-05T17:02:41.941655Z 1110933 [ERROR] [MY-012639] [InnoDB] Write to file ./XXX/wp_options.ibd failed at offset 28917628928, 1048576 bytes should have been written, only 774144 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
2019-08-05T17:02:41.941673Z 1110933 [ERROR] [MY-012640] [InnoDB] Error number 28 means 'No space left on device'
My guess is that something starts adding options (something transient-related, maybe?) and never stops.
The question is: how do I debug this? Any help/hints would be appreciated.
An hourly cron job to defragment the table looks like a very bad solution.
UPD:
One day has passed and free disk space has decreased by another 7 GB. The current AUTO_INCREMENT value is 206975 (it was 202517 yesterday, when there were 27 GB free). So roughly 4.5k new entries = 7 GB, I guess?
mysql> SELECT table_name AS `Table`, round(((data_length + index_length) / 1024 / 1024), 2) `Size in MB` FROM information_schema.TABLES WHERE table_schema = 'XXX' AND table_name = 'wp_options';
+------------+------------+
| Table | Size in MB |
+------------+------------+
| wp_options | 7085.52 |
+------------+------------+
1 row in set (0.00 sec)
mysql> select ENGINE, TABLE_NAME,Round( DATA_LENGTH/1024/1024) as data_length , round(INDEX_LENGTH/1024/1024) as index_length, round(DATA_FREE/ 1024/1024) as data_free from information_schema.tables where DATA_FREE > 0 and TABLE_NAME = "wp_options" limit 0, 10;
+--------+------------+-------------+--------------+-----------+
| ENGINE | TABLE_NAME | data_length | index_length | data_free |
+--------+------------+-------------+--------------+-----------+
| InnoDB | wp_options | 7085 | 0 | 5 |
+--------+------------+-------------+--------------+-----------+
I will keep monitoring how the free space decreases; maybe that will shed some more light on the problem.
UPD (final)
I had a feeling it was something stupid, and I was right. There was a flush_rewrite_rules(); call, of all things unholy, right in functions.php. Examining the general log was helpful.
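For anyone who hits the same thing, here is a minimal sketch of the mistake and of the usual fix, assuming the call lives in a theme's functions.php (the after_switch_theme hook is the conventional place for themes; plugins would use register_activation_hook instead):

<?php
// Problematic pattern: functions.php runs on every request, so this
// regenerates the rewrite_rules option in wp_options on every single page load.
flush_rewrite_rules();

// Usual fix (sketch): only flush once, when the theme is activated and
// after any custom rewrite rules have been registered.
add_action('after_switch_theme', function () {
    flush_rewrite_rules();
});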

One possibility is that you're seeing incorrect statistics about the table size.
MySQL 8.0 tries to cache the statistics about tables, but there seem to be some bugs in the implementation. Sometimes it shows table statistics as NULL, and sometimes it shows values, but fails to update them as you modify table data.
See, for example, https://bugs.mysql.com/bug.php?id=83957, a bug report that discusses the problems with this caching behavior.
You can disable the caching. It may cause queries against the INFORMATION_SCHEMA or SHOW TABLE STATUS to be a little bit slower, but I would guess it's no worse than in versions of MySQL before 8.0.
SET GLOBAL information_schema_stats_expiry = 0;
The integer value is the number of seconds MySQL keeps statistics cached. If you query the table stats, you may see old values from the cache, until they expire and MySQL refreshes them by reading from the storage engine.
The default value for the cache expiration is 86400, or 24 hours. That seems excessive.
See https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_information_schema_stats_expiry
If you suspect WordPress is writing to the table, it may well be. You can enable the binary log or the general query log to find out, or just watch SHOW PROCESSLIST for a few minutes.
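If you have the privileges for it, one convenient variant is to route the general query log into a table and search it for statements that write to wp_options. A sketch, assuming a privileged account; the DSN and credentials are placeholders, while mysql.general_log and the SET GLOBAL variables are standard MySQL 8.0 features:

<?php
// Sketch: temporarily send the general query log to a table and look for
// writes that touch wp_options. Needs a privileged account; the DSN and
// credentials below are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=mysql', 'root', 'secret');

$pdo->exec("SET GLOBAL log_output = 'TABLE'");
$pdo->exec("SET GLOBAL general_log = 'ON'");

sleep(60); // let some traffic accumulate

$rows = $pdo->query(
    "SELECT event_time, LEFT(argument, 200) AS query_text
       FROM mysql.general_log
      WHERE argument LIKE '%wp_options%'
        AND (argument LIKE 'INSERT%' OR argument LIKE 'UPDATE%' OR argument LIKE 'DELETE%')
      ORDER BY event_time DESC
      LIMIT 20"
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($rows as $row) {
    echo $row['event_time'], '  ', $row['query_text'], PHP_EOL;
}

$pdo->exec("SET GLOBAL general_log = 'OFF'"); // don't leave it running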
You might have a WordPress plugin that is updating or inserting into a table frequently. You can look for the most recent UPDATE_TIME:
SELECT * FROM INFORMATION_SCHEMA.TABLES
ORDER BY UPDATE_TIME DESC LIMIT 3;
Watch this to find out which tables are written to most recently.
There are caveats to this UPDATE_TIME stat. It isn't always in sync with the queries that updated the table, because writes to tablespace files are asynchronous. Read about it here: https://dev.mysql.com/doc/refman/8.0/en/tables-table.html

Have you tried the slow query log? That might give you a hint about where all the queries are coming from.

Do you have a staging site, and is the problem replicated there? If so, then turn off all plugins (as long as turning off the plugin doesn't break your site) on your staging site to see if the problem stops. If it does, then turn them on again one at a time to find out which plugin is causing the problem.
If you don't have a staging site, then you can try doing this on live, but keep in mind you will be messing with your live functionality (generally not recommended). In this case, just remove the plugins you absolutely do not need, and hopefully one of them is the culprit. If this works, then add them one at a time again until you find the plugin causing the problem.
My suggestion above assumes a plugin is causing this problem. Out of the box WordPress doesn't do this (not that I've heard of). It's possible there is custom programming doing this, but you would need to let us know more details concerning any recent scripts.
The biggest problem at this point is that you don't know what the problem is. Until you do, it's hard to take corrective measures.

Some plugins fail to clean up after themselves. Chase down what plugin(s) you added about the time the problem started. Look in the rows in options to see if there are clues that further implicate specific plugins.
No, nothing in MySQL's statistics, etc., can explain a 27 GB 'error' in the calculations. No amount of OPTIMIZE TABLE, etc. will fix more than a fraction of that. You need to delete most of the rows. Show us some of the recent rows (with high AUTO_INCREMENT ids).
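As a sketch of that kind of inspection, run inside WordPress (for example via WP-CLI's wp eval-file), this lists the most recently inserted options and the largest ones; $wpdb and $wpdb->options are the standard WordPress database API, the rest is illustrative:

<?php
// Sketch: show the most recently inserted options and the largest ones.
// Assumes the global $wpdb object and the default options table via
// $wpdb->options.
global $wpdb;

$recent = $wpdb->get_results(
    "SELECT option_id, option_name, LENGTH(option_value) AS bytes
       FROM {$wpdb->options}
      ORDER BY option_id DESC
      LIMIT 20"
);

$largest = $wpdb->get_results(
    "SELECT option_name, LENGTH(option_value) AS bytes
       FROM {$wpdb->options}
      ORDER BY bytes DESC
      LIMIT 20"
);

foreach ($recent as $row) {
    printf("#%d %s (%d bytes)\n", $row->option_id, $row->option_name, $row->bytes);
}
foreach ($largest as $row) {
    printf("%s (%d bytes)\n", $row->option_name, $row->bytes);
}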

If the table has frequent deletes/updates/inserts, you can run OPTIMIZE TABLE yourTable;
It needs to be run in a maintenance window.
It frees space inside the tablespace for reuse, but the disk space used will not necessarily shrink.

Related

fastest way to do 1 billion queries on millions of rows

I'm running a PHP script that searches through a relatively large MySQL instance with a table with millions of rows to find terms like "diabetes mellitus" in a column description that has a full text index on it. However, after one day I'm only through a couple hundred queries so it seems like my approach is never going to work. The entries in the description column are on average 1000 characters long.
I'm trying to figure out my next move and I have a few questions:
My MySQL table has unnecessary columns in it that aren't being queried. Will removing those affect performance?
I assume running this locally rather than on RDS will dramatically increase performance? I have a decent MacBook, but I chose RDS since cost isn't an issue, and I tried to run it on an instance that is better than my MacBook.
Would using a compiled language like Go rather than PHP do more than the 5-10x boost people report in test examples? That is, given my task, is there any reason to think a statically compiled language would produce a 100x or greater speed improvement?
Should I put the data in a text or CSV file rather than MySQL? Is using MySQL just causing unnecessary overhead?
This is the query:
SELECT id
FROM text_table
WHERE match(description) against("+diabetes +mellitus" IN BOOLEAN MODE);
Here's the EXPLAIN output for the query (shown with column labels), confirming the optimizer is using the FULLTEXT index:
id: 1
select_type: SIMPLE
table: text_table
type: fulltext
possible_keys: idx
key: idx
key_len: 0
ref: NULL
rows: 1
Extra: Using where
The RDS instance is db.m4.10xlarge, which has 160 GB of RAM. The InnoDB buffer pool is typically about 75% of RAM on an RDS instance, which makes it roughly 120 GB.
The text_table status is:
Name: text_table
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 26000630
Avg_row_length: 2118
Data_length: 55079485440
Max_data_length: 0
Index_length: 247808
Data_free: 6291456
Auto_increment: 29328568
Create_time: 2018-01-12 00:49:44
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options:
Comment:
This indicates the table has about 26 million rows, and the size of data and indexes is 51.3GB, but this doesn't include the FT index.
For the size of the FT index, query:
SELECT stat_value * @@innodb_page_size
FROM mysql.innodb_index_stats
WHERE table_name='text_table'
AND index_name = 'FTS_DOC_ID_INDEX'
AND stat_name='size'
The size of the FT index is 480247808 bytes, or about 458 MB.
Following up on comments above about concurrent queries.
If the query is taking 30 seconds to execute, then the programming language you use for the client app won't make any difference.
I'm a bit skeptical that the query is really taking 1 to 30 seconds to execute. I've tested MySQL fulltext search, and I found a search runs in under 1 second even on my laptop. See my presentation https://www.slideshare.net/billkarwin/practical-full-text-search-with-my-sql
It's possible that it's not the query that's taking so long, but it's the code you have written that submits the queries. What else is your code doing?
How are you measuring the query performance? Are you using MySQL's query profiler? See https://dev.mysql.com/doc/refman/5.7/en/show-profile.html This will help isolate how long it takes MySQL to execute the query, so you can compare to how long it takes for the rest of your PHP code to run.
A PHP client is going to be single-threaded, so you are running one query at a time, serially. The RDS instance you are using has 40 CPU cores, so you should be able to run many concurrent queries at a time. But each query would need to be run by its own client.
So one idea would be to split your input search terms into at least 40 subsets, and run your PHP search code against each respective subset. MySQL should be able to run the concurrent queries fine. Perhaps there will be a slight overhead, but this will be more than compensated for by the parallel execution.
You can split your search terms manually into separate files, and then run your PHP script with each respective file as the input. That would be a straightforward way of solving this.
But to get really professional, learn to use a tool like GNU parallel to run the 40 concurrent processes and split your input over these processes automatically.
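A sketch of such a worker script follows; the table and column names come from the question above, the host and credentials are placeholders, and splitting the term list into 40 files is assumed to happen outside PHP:

<?php
// worker.php — sketch: run the boolean fulltext query once per search
// term listed in the file passed as the first argument. Host, database
// and credentials are placeholders; text_table/description come from
// the question above.
$terms = file($argv[1], FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$pdo  = new PDO('mysql:host=my-rds-host;dbname=mydb', 'user', 'secret');
$stmt = $pdo->prepare(
    'SELECT id FROM text_table
      WHERE MATCH(description) AGAINST(? IN BOOLEAN MODE)'
);

foreach ($terms as $term) {
    // Turn "diabetes mellitus" into "+diabetes +mellitus"
    $boolean = '+' . implode(' +', preg_split('/\s+/', trim($term)));
    $stmt->execute([$boolean]);
    $ids = $stmt->fetchAll(PDO::FETCH_COLUMN);
    fwrite(STDOUT, $term . "\t" . implode(',', $ids) . "\n");
}

Split the term list into 40 files (GNU split can do this without breaking lines), then launch one worker per file with something like parallel php worker.php ::: part_*; each process holds its own MySQL connection, so the 40 queries genuinely run concurrently.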

MySQL insert/update/delete/select goes slow sometimes

I am using InnoDB as the database engine, and queries sometimes go slow; quite often they take 20 seconds or more.
I know one solution: changing the value of innodb_flush_log_at_trx_commit to 2 in my.cnf could solve my problem, and I would like to do that, but I am on shared hosting and the host won't allow me to change it.
MySQL version: 5.6.32-78.1
I also tried with MySQL query
mysql> SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
+--------------------------------+-------+
| Variable_name | Value |
+--------------------------------+-------+
| innodb_flush_log_at_trx_commit | 1 |
+--------------------------------+-------+
1 row in set
And I tried this query:
mysql> SET GLOBAL innodb_flush_log_at_trx_commit=2;
but that is not allowed either, because I don't have the SUPER privilege needed to perform this action.
I have a database with 25 tables; 4 of them have 4000+ records and the rest have fewer than 100 records each.
So is there any other way to speed up query performance? Any help will be appreciated.
Use profiling to check the time spent in each step:
Enable it: SET profiling = 1;
Run your query.
List recent queries and their IDs: SHOW PROFILES;
List the time cost per step: SHOW PROFILE BLOCK IO, CPU FOR QUERY N;
Find the step with the highest Duration.
Possible problems: missing index, ORDER BY, filesort, use of a temporary table, and so on.

PHP: Filtering and export large amount of data from MySQL database

I have a very large database table (more than 700k records) that I need to export to a .csv file. Before exporting it, I need to check some options (provided by the user via a GUI) and filter the records. Unfortunately this filtering cannot be done in SQL (for example, one column contains serialized data, so I need to unserialize it and then check whether the record "passes" the filtering rules).
Doing all records at once leads to memory limit issues, so I decided to break the process into chunks of 50k records. So instead of loading 700k records at once, I load 50k records, apply the filters, save to the .csv file, then load the next 50k records and so on (until it reaches 700k records). This way I avoid the memory issue, but it takes around 3 minutes (and this time will increase as the number of records increases).
Is there any other way of doing this process (better in terms of time) without changing the database structure?
Thanks in advance!
The best thing you can do is get PHP out of the mix as much as possible. That is always the case for loading CSV, or exporting it.
In the below, I have a 26 Million row student table. I will export 200K rows of it. Granted, the column count is small in the student table. Mostly for testing other things I do with campus info for students. But you will get the idea I hope. The issue will be how long it takes for your:
... and then check if the record "passes" the filtering rules.
which naturally could occur via the db engine in theory without PHP. Without PHP should be the mantra. But that is yet to be determined. The point is, get PHP processing out of the equation. PHP is many things. An adequate partner in DB processing it is not.
select count(*) from students;
-- 26.2 million
select * from students limit 1;
+----+-------+-------+
| id | thing | camId |
+----+-------+-------+
| 1 | 1 | 14 |
+----+-------+-------+
drop table if exists xOnesToExport;
create table xOnesToExport
( id int not null
);
insert xOnesToExport (id) select id from students where id>1000000 limit 200000;
-- 200K rows, 5.1 seconds
alter table xOnesToExport ADD PRIMARY KEY(id);
-- 4.2 seconds
SELECT s.id,s.thing,s.camId INTO OUTFILE 'outStudents_20160720_0100.txt'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
FROM students s
join xOnesToExport x
on x.id=s.id;
-- 1.1 seconds
The above 1AM timestamped file with 200K rows was exported as a CSV via the join. It took 1 second.
LOAD DATA INFILE and SELECT INTO OUTFILE are companion features that, for one thing, cannot be beaten for speed short of raw table moves. Secondly, people rarely seem to use the latter. They are flexible, too, if one looks into all they can do with use cases and tricks.
For Linux, use LINES TERMINATED BY '\n' ... I am on a Windows machine at the moment with the code blocks above. The only differences tend to be with paths to the file, and the line terminator.
Unless you tell it to do otherwise, PHP slurps your entire result set into RAM at once. It's called a buffered query. That doesn't work once your result set is too large to fit in memory, as you have discovered.
PHP's designers made it use buffered queries to make life simpler for web site developers who need to read a few rows of data and display them.
You need an unbuffered query to do what you're doing. Your php program will read and process one row at a time. But be careful to make your program read all the rows of that unbuffered result set; you can really foul things up if you leave a partial result set dangling in limbo between MySQL and your php program.
You didn't say whether you're using mysqli or PDO. Both of them offer mode settings to make your queries unbuffered. If you're using the old-skool mysql_ interface, you're probably out of luck.
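For reference, a minimal sketch of both APIs; the connection details, table, and column names are placeholders, and the filtering check is just a stand-in for your real rules:

<?php
// Sketch: unbuffered reads with PDO and with mysqli, streaming rows one
// at a time instead of materializing the whole result set in memory.
// Credentials, table and column names are placeholders.

// PDO: disable buffered queries for this connection.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'secret', [
    PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false,
]);
$stmt = $pdo->query('SELECT id, payload FROM big_table');
$out  = fopen('export.csv', 'w');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $data = unserialize($row['payload']);
    if (is_array($data) && !empty($data['active'])) {   // stand-in for your filtering rules
        fputcsv($out, [$row['id'], $data['active']]);
    }
}
fclose($out);
// Important: consume (or closeCursor()) the whole result set before
// running another query on this connection.

// mysqli equivalent: MYSQLI_USE_RESULT streams the rows.
$mysqli = new mysqli('localhost', 'user', 'secret', 'mydb');
$result = $mysqli->query('SELECT id, payload FROM big_table', MYSQLI_USE_RESULT);
while ($row = $result->fetch_assoc()) {
    // ... same per-row filtering as above ...
}
$result->close();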

mysql table rows limit?

I'm new to MySQL and I have coded my first PHP-MySQL application.
I have a list of 30 devices, from which I want each user to add their preferred devices to their account.
So I created MySQL table "A", where device IDs are stored against a particular user ID, like this:
UserID | Device Id
1 | 33
1 | 21
1 | 52
2 | 12
2 | 45
3 | 22
3 | 08
1 | 5
more.....
Say I have 5000 user IDs and 30 device IDs,
and each user has 15 device IDs (on average) under his account records.
Then there will be 5000 x 15 = 75000 records in table "A".
So, my question: is there any limit on how many records we can store in a MySQL table?
Is my approach for storing records, as described above, correct? Will it affect query performance as more users are added?
Or is there a better way to do this?
It's very unlikely that you will approach the limitations of a MySQL table with two columns that are only integers.
If you're really concerned about query performance, you can just go ahead and throw an index on both columns. It's likely that the cost of inserting / updating your table will be negligible even with an index on the device ID. If your database gets huge, it can speed up queries such as "which users prefer this device". Your queries that ask "what devices do this user prefer" will also be fast with an index on user.
I would just say to make this table a simple two column table with a two-part composite key (indexed). This way, it will be as atomic as possible and won't require any of the tom-foolery that some may suggest to "increase performance."
Keep it atomic and normal -- your performance will be fine and you won't exceed any limitations of your DBMS
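A sketch of that table definition (table and column names are illustrative), with the composite primary key serving "devices for a user" lookups and a secondary index serving "users for a device" lookups:

<?php
// Sketch: two-column mapping table with a composite primary key.
// Table/column names, DSN and credentials are illustrative.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'secret');

$pdo->exec("
    CREATE TABLE IF NOT EXISTS user_device (
        user_id   INT UNSIGNED NOT NULL,
        device_id INT UNSIGNED NOT NULL,
        PRIMARY KEY (user_id, device_id),   -- 'devices for this user'
        KEY idx_device (device_id)          -- 'users who prefer this device'
    ) ENGINE=InnoDB
");

// Typical usage: add a preferred device, ignoring duplicates.
$stmt = $pdo->prepare(
    'INSERT IGNORE INTO user_device (user_id, device_id) VALUES (?, ?)'
);
$stmt->execute([1, 33]);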
I don't see anything wrong with it. If you are hoping to save some server space, then don't worry about it; let your database do the underlying job. Index your database properly, with an ID column such as INT(10) PRIMARY KEY AUTO_INCREMENT. Think about scalability when it is actually needed. Your first target should be to complete the application you are making, then test it. If you find that it causes any lag or problem, then start worrying about how to solve it. Don't bother yourself with things that you probably won't even face.
But considering the scale of your application (75k to 100k records), it shouldn't be much of a task. Alternatively, you can have a schema like this for your users:
(device_table)
device_id
23
45
56
user_id | device_id
1 | 23,45,67,45,23
2 | 45,67,23,45
That is, store the device_ids as a comma-separated string and then get the device IDs for a particular user as
$device_for_user = explode(',', $device_id);
where $device_id is of course retrieved from the MySQL database.
so you'll have
$device_for_user[0]=23
$device_for_user[1]=45
and so on.
But this method isn't a very good design or approach. Just for your information, it is one way of doing it.

How to manage databases with limited amounts of data

Hi, I am building a social network in Dreamweaver, using PHP and SQL as my server-side languages to interact with my databases. I am going to host the website on godaddy.com, and they say they will give me unlimited MySQL databases, but each can only be 1 GB.
I would like to have one database designated for just user information, like name and email, contained in one huge table. Then, in database 2, I would like to give each user their own table containing all of their comments; every comment would just add a row of data. Pretty soon I would run out of space in database 2 and would have to create a database 3 full of comments, and I would continue creating a new database every time I ran out of space in the old one.
The problem is that people on database 2 would still be making comments and creating more data for me to store. I don't want to put a limit on how many comments people can make; I want them to be able to create as many comments as they want without deleting the old ones. Any suggestions on what to do or where to go from here? How can I solve this problem? Also, is there a way to find out how much storage a database has left through code?
You can run the following sql statement to determine the database size in MB.
SELECT table_schema "Data Base Name", SUM( data_length + index_length) / 1024 / 1024
"Data Base Size in MB" FROM information_schema.TABLES
where table_schema='apdb'
GROUP BY table_schema ;
+----------------+----------------------+
| Data Base Name | Data Base Size in MB |
+----------------+----------------------+
| apdb | 15.02329159 |
+----------------+----------------------+
1 row in set (0.00 sec)
In the above example, apdb is the name of the database.
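Since the question also asks how to check this "through code", here is a sketch that runs the same query from PHP; the DSN, credentials, and the apdb schema name are placeholders:

<?php
// Sketch: fetch the current size of one database in MB from PHP.
// DSN, credentials, and the schema name 'apdb' are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=apdb', 'user', 'secret');

$stmt = $pdo->prepare(
    'SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS size_mb
       FROM information_schema.TABLES
      WHERE table_schema = ?'
);
$stmt->execute(['apdb']);
$sizeMb = (float) $stmt->fetchColumn();

$limitMb = 1024;  // the 1 GB cap mentioned by the host
printf("Used: %.2f MB, remaining: %.2f MB\n", $sizeMb, $limitMb - $sizeMb);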
I think that 1Gb of data should be more than enough to start with for your social network. And if your network grows really really really big you can always move your application elsewhere.
Let's make the calculation:
say: 10,000 users to start with (this seems low compared to Facebook, but it will take you a long time to get 10,000 users to sign up).
10,000 x 500(?) bytes of information = 5 MB of data
each user makes 100 comments. The average size of a comment is 100 bytes. This also presumes an active community.
10,000 x 100 x 100 = 100 MB of data
You're still well within your 1Gb database limit.
As soon as you hit the 1Gb: change hosting provider, or start paying...
