How to manage databases with limited amounts of data - php

Hi, I am building a social network in Dreamweaver using PHP and SQL as my server-side languages to interact with my databases. I am going to use godaddy.com to host my website, and they say they will give me unlimited MySQL databases, but each one can only be 1 GB. I would like to have one database designated for just user information, like name and email, contained in one huge table. Then, in database 2, I would like to give each user their own table that contains all of their comments; every comment would just add a row of data. Pretty soon I would run out of space on database 2 and have to create a database 3 full of comments, and I would continue this process of creating a new database every time I ran out of space on the old one. The problem is that people on database 2 are still making comments and still creating more data for me to store. I don't want to put a limit on how many comments people can store; I want them to be able to create as many comments as they want without deleting the old ones.
Any suggestions on what to do or where to go from here? How can I solve this problem? Also, is there a way to find out how much storage a database has left through code?

You can run the following SQL statement to determine the database size in MB:
SELECT table_schema "Data Base Name",
       SUM(data_length + index_length) / 1024 / 1024 "Data Base Size in MB"
FROM information_schema.TABLES
WHERE table_schema = 'apdb'
GROUP BY table_schema;
+----------------+----------------------+
| Data Base Name | Data Base Size in MB |
+----------------+----------------------+
| apdb           |          15.02329159 |
+----------------+----------------------+
1 row in set (0.00 sec)
In the above example, apdb is the name of the database.
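To answer the "through code" part of the question: here is a minimal PHP sketch of the same check using PDO. The DSN, credentials, database name and the 1 GB cap are placeholders for whatever your host actually gives you.
<?php
// Minimal sketch: read the current database size from information_schema
// and compare it against an assumed 1 GB (1024 MB) per-database cap.
$pdo = new PDO('mysql:host=localhost;dbname=apdb', 'user', 'password');

$stmt = $pdo->prepare(
    "SELECT SUM(data_length + index_length) / 1024 / 1024
     FROM information_schema.TABLES
     WHERE table_schema = :db"
);
$stmt->execute([':db' => 'apdb']);

$usedMb  = (float) $stmt->fetchColumn();
$limitMb = 1024;                 // assumed 1 GB cap per database
$freeMb  = $limitMb - $usedMb;

printf("Used: %.2f MB, roughly %.2f MB left of the 1 GB limit\n", $usedMb, $freeMb);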

I think that 1 GB of data should be more than enough to start with for your social network, and if your network grows really, really big you can always move your application elsewhere.
Let's do the calculation:
Say 10,000 users to start with (this seems low compared to Facebook, but it will take you a long time to get 10,000 people to sign up).
10,000 x 500(?) bytes of user information = 5 MB of data.
Say each user makes 100 comments, and the average size of a comment is 100 bytes; that already presumes an active community.
10,000 x 100 x 100 bytes = 100 MB of data.
You're still well within your 1 GB database limit.
As soon as you hit the 1 GB: change hosting provider, or start paying...
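For what it's worth, a throwaway PHP sketch of that back-of-the-envelope estimate (the per-user and per-comment byte counts are the rough guesses from above, not measurements):
<?php
// Rough storage estimate using the assumptions above (1 MB = 1,000,000 bytes here).
$users           = 10000;
$bytesPerUser    = 500;
$commentsPerUser = 100;
$bytesPerComment = 100;

$totalBytes = ($users * $bytesPerUser) + ($users * $commentsPerUser * $bytesPerComment);
printf("Estimated storage: %.1f MB of the roughly 1000 MB (1 GB) limit\n", $totalBytes / 1e6);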

Related

Wordpress & MySQL table fragmentation

So, I'm using Wordpress, MySQL (8.0.16), InnoDB.
The wp_options table is normally 13 MB. The problem is, it suddenly (at least within a span of a few days) grows to 27 GB and then stops growing, because there's no more disk space available. Those 27 GB are counted as data, not indexes.
Dumping and re-importing the table gives you a table of the normal size. The number of entries is around 4k, while the auto-increment value is 200k+. Defragmenting the table with ALTER TABLE wp_options ENGINE = InnoDB; brings the table size on disk back to normal, but MySQL still reports the huge size, even after a server restart.
+------------+------------+
| Table      | Size in MB |
+------------+------------+
| wp_options |   26992.56 |
+------------+------------+
1 row in set (0.00 sec)
MySQL logs don't say much:
2019-08-05T17:02:41.939945Z 1110933 [ERROR] [MY-012144] [InnoDB] posix_fallocate(): Failed to preallocate data for file ./XXX/wp_options.ibd, desired size 4194304 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/8.0/en/operating-system-error-codes.html
2019-08-05T17:02:41.941604Z 1110933 [Warning] [MY-012637] [InnoDB] 1048576 bytes should have been written. Only 774144 bytes written. Retrying for the remaining bytes.
2019-08-05T17:02:41.941639Z 1110933 [Warning] [MY-012638] [InnoDB] Retry attempts for writing partial data failed.
2019-08-05T17:02:41.941655Z 1110933 [ERROR] [MY-012639] [InnoDB] Write to file ./XXX/wp_options.ibd failed at offset 28917628928, 1048576 bytes should have been written, only 774144 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
2019-08-05T17:02:41.941673Z 1110933 [ERROR] [MY-012640] [InnoDB] Error number 28 means 'No space left on device'
My guess is that something starts adding options (something transient-related, maybe?) and never stops.
The question is, how to debug it? Any help/hints would be appreciated.
An hourly cron job to defragment the table looks like a very bad solution.
UPD:
One day has passed and free disk space has decreased by 7 GB. The current auto-increment value is 206975 (it was 202517 yesterday, when there were 27 GB free). So 4.5K entries = 7 GB, I guess?
mysql> SELECT table_name AS `Table`, round(((data_length + index_length) / 1024 / 1024), 2) `Size in MB` FROM information_schema.TABLES WHERE table_schema = 'XXX' AND table_name = 'wp_options';
+------------+------------+
| Table      | Size in MB |
+------------+------------+
| wp_options |    7085.52 |
+------------+------------+
1 row in set (0.00 sec)
mysql> select ENGINE, TABLE_NAME,Round( DATA_LENGTH/1024/1024) as data_length , round(INDEX_LENGTH/1024/1024) as index_length, round(DATA_FREE/ 1024/1024) as data_free from information_schema.tables where DATA_FREE > 0 and TABLE_NAME = "wp_options" limit 0, 10;
+--------+------------+-------------+--------------+-----------+
| ENGINE | TABLE_NAME | data_length | index_length | data_free |
+--------+------------+-------------+--------------+-----------+
| InnoDB | wp_options |        7085 |            0 |         5 |
+--------+------------+-------------+--------------+-----------+
I will monitor the dynamics of how free space decreases, maybe that would shed some more light on the problem.
UPD (final)
I had a feeling it was something stupid, and I was right. There was a flush_rewrite_rules(); of all things unholy right in the functions.php. Examining the general log was helpful.
One possibility is that you're seeing incorrect statistics about the table size.
MySQL 8.0 tries to cache the statistics about tables, but there seem to be some bugs in the implementation. Sometimes it shows table statistics as NULL, and sometimes it shows values, but fails to update them as you modify table data.
See https://bugs.mysql.com/bug.php?id=83957 for example, a bug that discusses the problems with this caching behavior.
You can disable the caching. It may cause queries against the INFORMATION_SCHEMA or SHOW TABLE STATUS to be a little bit slower, but I would guess it's no worse than in versions of MySQL before 8.0.
SET GLOBAL information_schema_stats_expiry = 0;
The integer value is the number of seconds MySQL keeps statistics cached. If you query the table stats, you may see old values from the cache, until they expire and MySQL refreshes them by reading from the storage engine.
The default value for the cache expiration is 86400, or 24 hours. That seems excessive.
See https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_information_schema_stats_expiry
If you think Wordpress is writing to the table, then it might be. You can enable the binary log or the query log to find out. Or just observe SHOW PROCESSLIST for a few minutes.
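For example, here is a rough sketch of turning the general query log on, routing it to a table, and listing recent statements that touch wp_options. It assumes an account with privileges to change global variables, and the connection details are placeholders; the log grows quickly, so switch it off again when you're done.
<?php
// Hedged sketch: route the general query log to a table, wait a bit, then
// list the most recent statements that mention wp_options.
$pdo = new PDO('mysql:host=localhost;dbname=XXX', 'root', 'password');

$pdo->exec("SET GLOBAL log_output = 'TABLE'");
$pdo->exec("SET GLOBAL general_log = 'ON'");

sleep(60); // let some traffic accumulate

$stmt = $pdo->query(
    "SELECT event_time, LEFT(argument, 120) AS query_start
     FROM mysql.general_log
     WHERE argument LIKE '%wp_options%'
     ORDER BY event_time DESC
     LIMIT 20"
);
foreach ($stmt as $row) {
    echo $row['event_time'], '  ', $row['query_start'], "\n";
}

$pdo->exec("SET GLOBAL general_log = 'OFF'");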
You might have a wordpress plugin that is updating or inserting into a table frequently. You can look for the latest update_time:
SELECT * FROM INFORMATION_SCHEMA.TABLES
ORDER BY UPDATE_TIME DESC LIMIT 3;
Watch this to find out which tables are written to most recently.
There are caveats to this UPDATE_TIME stat. It isn't always in sync with the queries that updated the table, because writes to tablespace files are asynchronous. Read about it here: https://dev.mysql.com/doc/refman/8.0/en/tables-table.html
Have you tried the slow query log? It might give you a hint as to where all the queries come from.
Do you have a staging site, and is the problem replicated there? If so, then turn off all plugins (as long as turning off the plugin doesn't break your site) on your staging site to see if the problem stops. If it does, then turn them on again one at a time to find out which plugin is causing the problem.
If you don't have a staging site, then you can try doing this on live, but keep in mind you will be messing with your live functionality (generally not recommended). In this case, just remove the plugins you absolutely do not need, and hopefully one of them is the culprit. If this works, then add them one at a time again until you find the plugin causing the problem.
My suggestion above assumes a plugin is causing this problem. Out of the box WordPress doesn't do this (not that I've heard of). It's possible there is custom programming doing this, but you would need to let us know more details concerning any recent scripts.
The biggest problem at this point is that you don't know what the problem is. Until you do, it's hard to take corrective measures.
Some plugins fail to clean up after themselves. Chase down what plugin(s) you added about the time the problem started. Look in the rows in options to see if there are clues that further implicate specific plugins.
No, nothing in MySQL's statistics, etc., can explain a 27 GB 'error' in the calculations. No amount of OPTIMIZE TABLE, etc. will fix more than a fraction of that. You need to delete most of the rows. Show us some of the recent rows (the ones with high AUTO_INCREMENT ids).
If the table has frequent DELETE/UPDATE/INSERT traffic, you can run OPTIMIZE TABLE yourTable;
It needs to be run in a maintenance window.
It will free up space inside the table for reuse, but the disk space used will not decrease.

Locking a MySQL table or data to prevent getting the same data

I have a database with 1 million entries and 3 programs that process the data. The 3 programs get the data over an API call, for example 100 entries per request. What is the best way to prevent the programs from getting the same 100 entries?
I tried updating an id per program in the database, but that doesn't solve the problem: if the 3 programs request data while the update from one is still running, another program can get the same data.
I have tried LOCK TABLES, but it is the main table in my database, so all the other processes from PHP and so on also slow down extremely, because the table is completely locked every few minutes.
How about an app_id column.
App 1 does this...
UPDATE table SET app_id=1 WHERE app_id IS NULL LIMIT 100
SELECT * FROM table WHERE app_id=1 LIMIT 100
----process and when done----
UPDATE table SET app_id=NULL WHERE app_id=1 LIMIT 100
App 2 does this...
UPDATE table SET app_id=2 WHERE app_id IS NULL LIMIT 100
SELECT * FROM table WHERE app_id=2 LIMIT 100
----process and when done----
UPDATE table SET app_id=NULL WHERE app_id=2 LIMIT 100
You can have unlimited apps and they should only get their own records. This kinda hits your DB harder. Maybe you could use a combo of memcache/stored procedures to limit db load depending on your architecture.
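Here is a minimal PDO sketch of that claim-and-release pattern, assuming a table named items (the table name, DSN and credentials are placeholders):
<?php
// Each worker/program runs with its own $appId.
$appId = 1;
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Claim up to 100 unclaimed rows in a single atomic statement.
$claim = $pdo->prepare('UPDATE items SET app_id = ? WHERE app_id IS NULL LIMIT 100');
$claim->execute([$appId]);

// Fetch only the rows this program just claimed.
$rows = $pdo->prepare('SELECT * FROM items WHERE app_id = ?');
$rows->execute([$appId]);

foreach ($rows->fetchAll(PDO::FETCH_ASSOC) as $row) {
    // ... process $row ...
}

// Release the rows (or mark them done) once processing is finished.
$release = $pdo->prepare('UPDATE items SET app_id = NULL WHERE app_id = ?');
$release->execute([$appId]);
Because the claiming UPDATE is a single statement, two programs cannot end up owning the same row even if they fire at the same moment.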
Another option might be to handle this in the API. You could create a system-wide global variable and store which app has which records. Any time the API is called, it would look at that variable to know whether it can hand out data, or which record to start from. That might be easier on the DB.

How to select different records for ten sites in a large MySQL database

I have a large database of three million articles in a specific category, and I am going to launch a few sites based on this database, but my budget is low. So the best option for me is a shared host; the problem is that the hardware power a shared host gives each user is weak, because it is shared. So when I need to fetch a new post for a site, one that has not already been posted there, I'm in trouble. I used the method below to get new content from the database, but now, with the number of records constantly growing, it takes more than a shared host can manage to return the information in time.
My previous method :
I have a table for content
And a statistics table that records which entries have already been posted to each site.
My query is included below:
SELECT * FROM postlink WHERE `source`='$mysource' AND NOT EXISTS (SELECT sign FROM `state` WHERE postlink.sign = state.sign AND `cite`='$mycite') ORDER BY `postlink`.`id` ASC LIMIT 5
I use MySQL.
I've tested different queries but did not get a good result; even showing a few posts is very time-consuming.
Now I'd like you to help me and offer a solution so that, with this number of posts and an ordinary shared host, I can show some new content to a site requesting new posts in the shortest possible time.
The problem appears when the posted-stats table gets too large, and if I empty that table I will be in trouble with sending duplicate content, so I have no choice but to keep the statistics table.
The statistics table currently has 500 thousand records for 10 sites.
Thanks all in advance.
Are you seriously calling 3 million articles a large database? PostgreSQL would not even start making TOASTs at this point.
Consider migrating to a more serious database where you can use partial indexes, table partitioning, materialized views, etc.

PHP: Filtering and export large amount of data from MySQL database

I have a very large database table (more than 700k records) that I need to export to a .csv file. Before exporting it, I need to check some options (provided by the user via a GUI) and filter the records. Unfortunately this filtering cannot be done in SQL (for example, a column contains serialized data, so I need to unserialize it and then check whether the record "passes" the filtering rules).
Doing all the records at once leads to memory limit issues, so I decided to break the process into chunks of 50k records: instead of loading 700k records at once, I load 50k records, apply the filters, append to the .csv file, then load the next 50k records, and so on until I reach the 700k. This way I avoid the memory issue, but it takes around 3 minutes (and the time will increase as the number of records grows).
Is there any other way of doing this (better in terms of time) without changing the database structure?
Thanks in advance!
The best thing one can do is to get PHP out of the mix as much as possible. That is always the case for loading CSV data, or exporting it.
In the example below, I have a 26 million row student table and I export 200K rows of it. Granted, the column count in the student table is small; I use it mostly for testing other things with campus info for students. But you will get the idea, I hope. The open issue is how long your step of
... and then check if the record "passes" the filtering rules.
takes, which in theory could happen inside the db engine without PHP. Without PHP should be the mantra. But that is yet to be determined. The point is, get PHP processing out of the equation. PHP is many things; an adequate partner in DB processing it is not.
select count(*) from students;
-- 26.2 million
select * from students limit 1;
+----+-------+-------+
| id | thing | camId |
+----+-------+-------+
|  1 |     1 |    14 |
+----+-------+-------+
drop table if exists xOnesToExport;
create table xOnesToExport
( id int not null
);
insert xOnesToExport (id) select id from students where id>1000000 limit 200000;
-- 200K rows, 5.1 seconds
alter table xOnesToExport ADD PRIMARY KEY(id);
-- 4.2 seconds
SELECT s.id,s.thing,s.camId INTO OUTFILE 'outStudents_20160720_0100.txt'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
FROM students s
join xOnesToExport x
on x.id=s.id;
-- 1.1 seconds
The above 1 AM timestamped file with 200K rows was exported as a CSV via the join. It took 1 second.
LOAD DATA INFILE and SELECT ... INTO OUTFILE are companion statements that, for one thing, cannot be beaten for speed short of raw table moves. Secondly, people rarely seem to use the latter. They are flexible too, if one looks into all they can do with use cases and tricks.
For Linux, use LINES TERMINATED BY '\n'. I am on a Windows machine at the moment with the code blocks above. The only differences tend to be the paths to the file and the line terminator.
Unless you tell it to do otherwise, PHP slurps your entire result set into RAM at once. That is called a buffered query, and it does not work well once a result set grows to hundreds of thousands of rows, as you have discovered.
PHP's designers made buffered queries the default to make life simpler for web site developers who need to read a few rows of data and display them.
You need an unbuffered query to do what you're doing. Your PHP program will then read and process one row at a time. But be careful to make your program read all the rows of that unbuffered result set; you can really foul things up if you leave a partial result set dangling in limbo between MySQL and your PHP program.
You didn't say whether you're using mysqli or PDO. Both of them offer mode settings to make your queries unbuffered. If you're using the old-school mysql_ interface, you're probably out of luck.
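A minimal sketch of what those unbuffered modes look like in each extension (the connection details and table name are placeholders):
<?php
// mysqli: pass MYSQLI_USE_RESULT so the result set is streamed, not buffered in PHP.
$mysqli = new mysqli('localhost', 'user', 'password', 'mydb');
$result = $mysqli->query('SELECT * FROM big_table', MYSQLI_USE_RESULT);
while ($row = $result->fetch_assoc()) {
    // unserialize, filter and write the row to the CSV here
}
$result->close(); // read/free the whole result before issuing another query

// PDO: turn off buffering on the connection, then iterate the statement.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'password');
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
foreach ($pdo->query('SELECT * FROM big_table') as $row) {
    // same idea: process one row at a time
}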

Import big file into mysql, on a Heroku app

I need some help.
I have a PHP app on Heroku. In this app there's a form that uploads a CSV file to be imported into MySQL (ClearDB).
The problem is that the file is large (and will always be large), and the import function takes too long to finish (about 90 seconds). The timeout on Heroku is 30 seconds, and there's no way to change that.
I tried to use the Heroku Scheduler (like cron), but its minimum frequency is 10 minutes, so a script that needs 90 seconds of work, cut off at Heroku's 30-second timeout on each run, ends up taking around 30 minutes to finish.
Well, what can I do? Is there an alternative scheduler?
Example of the import:
CSV
name,productName,points,categoryName,coordName,date
MYSQL
[users]
userID
userName
categoryID
coordID
[products]
productID
productName
[coords]
coordID
coordName
[categories]
categoryID
categoryName
[points]
pointID
productID
userID
value
In all the tables I need to do a SELECT to see if the category, coord, etc. already exists. If it exists, return its id; if not, insert a new row.
I don't think there's a way to decrease the execution time itself, so I'm trying to find a way to decrease the schedule to 2 minutes, 3 minutes, etc., so that in about 10 minutes all the lines are imported.
Thanks!
This is what I would start with (because it's relatively simple/quick to implement and should give you a reference point and some wiggle room for further tests in a short period of time):
Import all the data as-is into a temporary table (if the server's RAM allow you can also try the memory engine).
Then, after the data has been imported, create the indices needed for the following queries (and check via EXPLAIN or any other tool that shows you if and how the indices are used):
query all the categories that are in the temporary table but not in your live data tables
create those categories in the live tables.
query all coords that are in the temporary table but not in your live data tables.
create those coords in the live tables.
You get the idea... repeat for all the necessary data.
Then just import the data from the temp table into the live tables via INSERT ... SELECT queries (a rough sketch of that step is below). Think about what kind of transaction/locking you will need for this. It might be that the order of queries will make a difference. But if you're only adding data, I assume that a rather low isolation level should do... not sure though. But maybe that's not your concern right now?
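A hedged PHP/PDO sketch of those "create what's missing, then insert" steps, assuming the raw CSV rows already sit in a temporary table named csv_import (all names here are placeholders based on the schema in the question):
<?php
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Create any categories present in the import but missing from the live table.
$pdo->exec(
    "INSERT INTO categories (categoryName)
     SELECT DISTINCT i.categoryName
     FROM csv_import i
     LEFT JOIN categories c ON c.categoryName = i.categoryName
     WHERE c.categoryID IS NULL"
);

// Same idea for coords; repeat for users and products as needed.
$pdo->exec(
    "INSERT INTO coords (coordName)
     SELECT DISTINCT i.coordName
     FROM csv_import i
     LEFT JOIN coords co ON co.coordName = i.coordName
     WHERE co.coordID IS NULL"
);

// Finally insert the points, joining the import rows to the ids that now exist.
// (Add your own duplicate handling if the same CSV can be uploaded twice.)
$pdo->exec(
    "INSERT INTO points (productID, userID, value)
     SELECT p.productID, u.userID, i.points
     FROM csv_import i
     JOIN products p ON p.productName = i.productName
     JOIN users u ON u.userName = i.name"
);
Each statement is set-based, so the per-row SELECT-then-INSERT round trips from PHP disappear, which is usually where much of the time goes.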
