I have a table which is 60 GB and has about 330M rows.
I must display this in a front-end web app. The web app has a search function which searches for a string pattern in every row of the database table.
The problem is that this search takes up to 10 minutes and makes the MySQL process freeze. I have looked for solutions but haven't found a suitable one; these are the options I've considered so far:
In-memory database: the database is too big (it will grow to about 200 GB; the 60 GB is only its current size)
Split the table into one table per month and put these on 6 SSDs (I need the data from half a year); then it would be possible to search the 6 SSDs in parallel
Reduce the amount of data (?)
Image is here: http://i.stack.imgur.com/Q2TyD.png
If you are using an implicit cursor to search through the db, you could consider closing it after, say, every 50 rows and then reopening it at the row where you stopped.
You need to make use of database sharding here. It basically splits up your big database into several small databases.
Here's a quick link for you: http://codefutures.com/database-sharding/
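Sharding proper splits the data across several database servers at the application level, which is what the link above describes. As a rough, hedged illustration of the same idea within a single MySQL server (and of the per-month split you already considered), native range partitioning would look something like the sketch below; the table and column names are made up for the example:
CREATE TABLE log_entries (
    id         BIGINT NOT NULL AUTO_INCREMENT,
    created_at DATE   NOT NULL,
    payload    TEXT,
    PRIMARY KEY (id, created_at)   -- the partitioning column must appear in every unique key
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p2016_01 VALUES LESS THAN (TO_DAYS('2016-02-01')),
    PARTITION p2016_02 VALUES LESS THAN (TO_DAYS('2016-03-01')),
    PARTITION p2016_03 VALUES LESS THAN (TO_DAYS('2016-04-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);
-- A query restricted to a date range only touches the matching partitions,
-- so a half-year search scans six partitions instead of the whole table.
SELECT *
FROM log_entries
WHERE created_at >= '2016-01-01' AND created_at < '2016-04-01'
  AND payload LIKE '%pattern%';
Note that a LIKE '%pattern%' predicate still scans every row inside the touched partitions; partitioning only bounds how much is scanned, it does not replace a proper (e.g. full-text) index.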
I'm running a PHP script that searches a table with millions of rows in a relatively large MySQL instance, looking for terms like "diabetes mellitus" in a description column that has a full-text index on it. However, after one day I'm only through a couple hundred queries, so it seems like my approach is never going to work. The entries in the description column are on average 1,000 characters long.
I'm trying to figure out my next move and I have a few questions:
My MySQL table has unnecessary columns in it that aren't being queried. Will removing those affect performance?
I assume running this locally rather than on RDS will dramatically increase performance? I have a decent MacBook, but I chose RDS since cost isn't an issue, and I tried to run it on an instance that was better than my MacBook.
Would using a compiled language like Go rather than PHP do more than the 5-10x boost people report in test examples? That is, given my task, is there any reason to think a static language would produce a 100x or greater speed improvement?
Should I put the data in a text or CSV file rather than MySQL? Is using MySQL just causing unnecessary overhead?
This is the query:
SELECT id
FROM text_table
WHERE match(description) against("+diabetes +mellitus" IN BOOLEAN MODE);
Here's the line of output of EXPLAIN for the query, showing the optimizer is utilizing the FULLTEXT index:
id: 1  select_type: SIMPLE  table: text_table  type: fulltext  possible_keys: idx  key: idx  key_len: 0  ref: NULL  rows: 1  Extra: Using where
The RDS instance is db.m4.10xlarge, which has 160 GB of RAM. The InnoDB buffer pool is typically about 75% of RAM on an RDS instance, which makes it about 120 GB.
The text_table status is:
Name: text_table
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 26000630
Avg_row_length: 2118
Data_length: 55079485440
Max_data_length: 0
Index_length: 247808
Data_free: 6291456
Auto_increment: 29328568
Create_time: 2018-01-12 00:49:44
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options:
Comment:
This indicates the table has about 26 million rows, and the size of data and indexes is 51.3GB, but this doesn't include the FT index.
For the size of the FT index, query:
SELECT stat_value * @@innodb_page_size
FROM mysql.innodb_index_stats
WHERE table_name='text_table'
AND index_name = 'FTS_DOC_ID_INDEX'
AND stat_name='size'
The size of the FT index is 480247808 bytes (roughly 458 MiB).
Following up on comments above about concurrent queries.
If the query is taking 30 seconds to execute, then the programming language you use for the client app won't make any difference.
I'm a bit skeptical that the query is really taking 1 to 30 seconds to execute. I've tested MySQL fulltext search, and I found a search runs in under 1 second even on my laptop. See my presentation https://www.slideshare.net/billkarwin/practical-full-text-search-with-my-sql
It's possible that it's not the query that's taking so long, but it's the code you have written that submits the queries. What else is your code doing?
How are you measuring the query performance? Are you using MySQL's query profiler? See https://dev.mysql.com/doc/refman/5.7/en/show-profile.html This will help isolate how long it takes MySQL to execute the query, so you can compare to how long it takes for the rest of your PHP code to run.
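For reference, here is a minimal sketch of what that looks like inside a MySQL session, using the query from the question (SHOW PROFILE is deprecated but still available in 5.7):
SET profiling = 1;
SELECT id
FROM text_table
WHERE MATCH(description) AGAINST('+diabetes +mellitus' IN BOOLEAN MODE);
SHOW PROFILES;             -- lists the profiled statements with their total durations
SHOW PROFILE FOR QUERY 1;  -- per-stage timing for the first profiled statement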
Using PHP is going to be single-threaded, so you are running one query at a time, serially. The RDS instance you are using has 40 CPU cores, so you should be able to run many concurrent queries at a time. But each query would need to be run by its own client.
So one idea would be to split your input search terms into at least 40 subsets, and run your PHP search code against each respective subset. MySQL should be able to run the concurrent queries fine. Perhaps there will be a slight overhead, but this will be more than compensated for by the parallel execution.
You can split your search terms manually into separate files, and then run your PHP script with each respective file as the input. That would be a straightforward way of solving this.
But to get really professional, learn to use a tool like GNU parallel to run the 40 concurrent processes and split your input over these processes automatically.
I have a very large database table (more than 700k records) that I need to export to a .csv file. Before exporting it, I need to check some options (provided by the user via the GUI) and filter the records. Unfortunately this filtering cannot be achieved via SQL code (for example, a column contains serialized data, so I need to unserialize it and then check whether the record "passes" the filtering rules).
Doing all the records at once leads to memory limit issues, so I decided to process them in chunks of 50k records. So instead of loading 700k records at once, I load 50k records, apply the filters, save to the .csv file, then load the next 50k records, and so on (until all 700k records are processed). This way I avoid the memory issue, but it takes around 3 minutes (and this time will increase as the number of records increases).
Is there any other way of doing this process (better in terms of time) without changing the database structure?
Thanks in advance!
The best thing one can do is to get PHP out of the mix as much as possible. That is always the case for loading CSV, or exporting it.
In the example below, I have a 26 million row student table, and I will export 200K rows of it. Granted, the column count in the student table is small; it is mostly for testing other things I do with campus info for students. But you will get the idea, I hope. The issue will be how long it takes for your:
... and then check if the record "passes" the filtering rules.
which naturally could occur via the db engine in theory without PHP. Without PHP should be the mantra. But that is yet to be determined. The point is, get PHP processing out of the equation. PHP is many things. An adequate partner in DB processing it is not.
select count(*) from students;
-- 26.2 million
select * from students limit 1;
+----+-------+-------+
| id | thing | camId |
+----+-------+-------+
|  1 |     1 |    14 |
+----+-------+-------+
drop table if exists xOnesToExport;
create table xOnesToExport
( id int not null
);
insert xOnesToExport (id) select id from students where id>1000000 limit 200000;
-- 200K rows, 5.1 seconds
alter table xOnesToExport ADD PRIMARY KEY(id);
-- 4.2 seconds
SELECT s.id,s.thing,s.camId INTO OUTFILE 'outStudents_20160720_0100.txt'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
FROM students s
join xOnesToExport x
on x.id=s.id;
-- 1.1 seconds
The above 1AM timestamped file with 200K rows was exported as a CSV via the join. It took 1 second.
LOAD DATA INFILE and SELECT ... INTO OUTFILE are companion features that, for one thing, cannot be beaten for speed short of raw table moves. Secondly, people rarely seem to use the latter. They are flexible, too, if one looks into all they can do with use cases and tricks.
For Linux, use LINES TERMINATED BY '\n' ... I am on a Windows machine at the moment with the code blocks above. The only differences tend to be the paths to the file and the line terminator.
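For completeness, here is a rough sketch of the companion direction with LOAD DATA INFILE; the target table name (students_copy) is hypothetical and assumed to have the same three columns:
LOAD DATA INFILE 'outStudents_20160720_0100.txt'
INTO TABLE students_copy
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
(id, thing, camId);
-- reloads the 200K-row CSV produced above; use '\n' on Linux, as noted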
Unless you tell it to do otherwise, PHP slurps your entire result set at once into RAM. It's called a buffered query. It doesn't work when your result set is as large as yours, as you have discovered.
php's designers made it use buffered queries to make life simpler for web site developers who need to read a few rows of data and display them.
You need an unbuffered query to do what you're doing. Your php program will read and process one row at a time. But be careful to make your program read all the rows of that unbuffered result set; you can really foul things up if you leave a partial result set dangling in limbo between MySQL and your php program.
You didn't say whether you're using mysqli or PDO. Both of them offer mode settings to make your queries unbuffered. If you're using the old-skool mysql_ interface, you're probably out of luck.
I have a MySQL table with 2,000,000 rows. My website has 40,000 to 50,000 visits per day, PHP is running 150 queries per second in total, and the MySQL CPU usage is around 90%. The website is extremely slow.
Dedicated Server: AMD Opteron 8 cores, 16 GB DDR3.
Here are the MySQL query details:
Search Example: Guns And Roses
Table Storage Engine: MyISAM
Query example:
SELECT SQL_CACHE mp3list.*, likes.* FROM mp3list
LEFT JOIN likes ON mp3list.mp3id = likes.mp3id
WHERE mp3list.active=1 AND mp3list.songname LIKE '%guns%'
AND mp3list.songname LIKE '%and%' AND mp3list.songname LIKE '%roses%'
ORDER BY likes.likes DESC LIMIT 0, 15
Column "songname" is VARCHAR(255).
I want to know what I have to do to implement a lighter MySQL search. If someone could help me I'd be very grateful; I've been looking for a solution for weeks.
Thank you in advance.
Well, one solution would be to stop using a performance killer like LIKE '%something%'.
One way we've done this in the past is to maintain our own lookup tables. By that I mean, put together insert, update and delete triggers which apply any changes to a table like:
word varchar(20)
id int references mp3list(id)
primary key (word,id)
Whenever you make a change to mp3list, it gets reflected to that table, which should be a lot faster to search than your current solution.
This moves the cost of figuring out which MP3s contain which words to update time, rather than paying it on every select, amortising the cost. Since the vast majority of databases are read far more often than they are written, this can give substantial improvements. Some DBMSs provide this functionality with a full-text search index (MySQL is one of these).
And you can even put some smarts in the triggers (and queries) to totally ignore noise words like a, an and the, saving both space and time, giving you more fine-grained control over what you want to store.
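To make that concrete, here is a hedged sketch of the lookup table, an insert trigger, and the rewritten search. The lookup-table name (mp3_words), the trigger name, and the space-only word splitting are assumptions for illustration; a real version would also need update/delete triggers and the noise-word filtering mentioned above.
CREATE TABLE mp3_words (
    word VARCHAR(20) NOT NULL,
    id   INT         NOT NULL,   -- the mp3list.mp3id of the song containing this word
    PRIMARY KEY (word, id)
);
DELIMITER $$
CREATE TRIGGER mp3list_after_insert AFTER INSERT ON mp3list
FOR EACH ROW
BEGIN
    -- naive split of songname on single spaces; real code would also strip punctuation
    DECLARE remaining VARCHAR(255);
    DECLARE w VARCHAR(255);
    SET remaining = NEW.songname;
    WHILE CHAR_LENGTH(remaining) > 0 DO
        SET w = SUBSTRING_INDEX(remaining, ' ', 1);
        IF CHAR_LENGTH(w) > 0 THEN
            INSERT IGNORE INTO mp3_words (word, id) VALUES (LOWER(LEFT(w, 20)), NEW.mp3id);
        END IF;
        IF LOCATE(' ', remaining) = 0 THEN
            SET remaining = '';
        ELSE
            SET remaining = SUBSTRING(remaining, LOCATE(' ', remaining) + 1);
        END IF;
    END WHILE;
END$$
DELIMITER ;
-- The search then becomes indexed equality lookups instead of LIKE '%...%'
-- ('and' is skipped as a noise word):
SELECT m.*, l.*
FROM mp3list m
JOIN mp3_words w1 ON w1.id = m.mp3id AND w1.word = 'guns'
JOIN mp3_words w2 ON w2.id = m.mp3id AND w2.word = 'roses'
LEFT JOIN likes l ON l.mp3id = m.mp3id
WHERE m.active = 1
ORDER BY l.likes DESC
LIMIT 0, 15;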
I have an online iPhone turn-based game, with lots of games running at the same time. I'm in the process of optimizing the code, since both my app and the server crashed today.
This is the setup:
Right now I have one table, "matches" (70 fields of data for each row), that keeps track of all the active matches. Every 7 seconds, the iPhone will connect, download all the matches in the "matches" table that the player is active in, and update the UI on the iPhone.
This worked great until about 1,000 people downloaded the game and played. The server crashed.
So to optimize, I figure I can create a new table called "matches_needs_update". This table has 2 columns: name and id. The "id" is the same as the match's id in the "matches" table. When a match is updated, it's put in this table.
Now, instead of searching through the whole "matches" table, the query just checks whether the player has any matches that need to be updated, and then gets those matches from the "matches" table.
My question is twofold:
Is this the optimal solution?
If a player is active in, say 10 matches, is there a good way to get those 10 matches from the "matches" table at the same time, or do I need a for loop doing 10 queries, one for each match:
"SELECT * FROM matches WHERE id = ?"
Thanks in advance
You need to get out of the database. Look to memcache or redis.
I suggest APC, as you're on PHP and I assume you're doing this from a single MySQL database. It's easy to install, and will be included by default from PHP 6 onwards. Keep this one table in memory and it will fly.
Your database looks really small. A query against a table like this should return within milliseconds, and even hundreds of queries per second should work without any problems.
A couple of traditional pointers
Make sure you pool your connections. You should never have to open a new connection when a customer needs the data.
Make sure there is an index on "user is in match" so that the result will be fetched from the index.
I'm sure you have enough memory to hold the entire structure in the cache and with these small tables no additional config should be needed.
Make sure your schema is normalized: one table for users, one for matches, and one for users in a match (a sketch follows below).
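As a hedged illustration of the last two pointers, with hypothetical names, the user-in-match junction table and its indexes could look like this:
CREATE TABLE match_players (
    user_id  INT NOT NULL,
    match_id INT NOT NULL,
    PRIMARY KEY (user_id, match_id),   -- serves "which matches is this user in"
    KEY idx_match (match_id)           -- serves "which users are in this match"
);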
It's time to start caching things, e.g. with memcached and APC.
As for looping through the matches... that is the wrong way to go about it.
How is a user connected to a match: by an xref table, or does the match table have something like player1, player2?
Looping through queries is not the way to go; properly indexing your tables and doing a join to pull all the active matches by userId would be more efficient. Given the number of users, you may also want to (if you haven't already) split the tables up into active and inactive games.
If there are 6,000 active games and 3,000,000 inactive ones, it's extremely beneficial to partition these tables.
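To answer the second part of the question concretely: assuming an xref table such as match_players(user_id, match_id) (a hypothetical name), all of a player's matches come back in one query instead of a loop:
SELECT m.*
FROM matches m
JOIN match_players mp ON mp.match_id = m.id
WHERE mp.user_id = ?;
-- or, if the application already holds the ids of the 10 matches:
SELECT * FROM matches WHERE id IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?);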
I have a large table of about 14 million rows. Each row contains a block of text. I also have another table with about 6,000 rows, and each row has a word and six numerical values for that word. I need to take each block of text from the first table, find the number of times each word from the second table appears in it, then calculate the mean of the six values for each block of text and store it.
I have a Debian machine with an i7 and 8 GB of memory, which should be able to handle it. At the moment I am using the PHP substr_count() function. However, PHP just doesn't feel like it's the right solution for this problem. Other than working around time-out and memory limit problems, does anyone have a better way of doing this? Is it possible to use just SQL? If not, what would be the best way to execute my PHP without overloading the server?
Do each record from the 'big' table one at a time. Load that single 'block' of text into your program (PHP or whatever), do the searching and calculation, then save the appropriate values wherever you need them.
Do each record as its own transaction, in isolation from the rest. If you are interrupted, use the saved values to determine where to start again.
Once you are done with the existing records, you only need to do this in the future when you enter or update a record, so it's much easier. You just need to take your big bite right now to get the existing data updated.
What are you trying to do exactly? If you are trying to create something like a search engine with a weighting function, you should maybe drop that and instead use the MySQL full-text search functions and indexes that are already there. If you still need this specific solution, you can of course do it completely in SQL. You can do it in one query, or with a trigger that runs each time after a row is inserted or updated. You won't be able to get this done properly in PHP without jumping through a lot of hoops.
To give you a specific answer, we indeed would need more information about the queries, data structures and what you are trying to do.
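Still, to sketch what "completely in SQL" could look like, assume hypothetical names: texts(id, body) for the 14-million-row table and words(word, v1 ... v6) for the 6,000-row table. The classic LENGTH/REPLACE trick counts occurrences, and the outer query takes an occurrence-weighted mean of each value column. This is only a sketch: it ignores word boundaries, and on 14 million rows it would still need to be run in batches (e.g. by id range).
SELECT tw.text_id,
       SUM(tw.cnt * tw.v1) / NULLIF(SUM(tw.cnt), 0) AS mean_v1,
       SUM(tw.cnt * tw.v2) / NULLIF(SUM(tw.cnt), 0) AS mean_v2
       -- ... repeat the same expression for v3 .. v6 ...
FROM (
    SELECT t.id AS text_id,
           w.v1, w.v2, w.v3, w.v4, w.v5, w.v6,
           (CHAR_LENGTH(t.body) - CHAR_LENGTH(REPLACE(t.body, w.word, '')))
               / CHAR_LENGTH(w.word) AS cnt    -- occurrences of w.word in t.body
    FROM texts t
    JOIN words w ON t.body LIKE CONCAT('%', w.word, '%')
) AS tw
GROUP BY tw.text_id;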
Redesign it.
If size on disk is not important, just join the tables into one.
Put the 6,000-row table into memory (a MEMORY table) and make a backup of it every hour:
INSERT IGNORE into back.table SELECT * FROM my.table;
Create your "own" index in the big table, e.g. add a "name index" column to the big table containing the id of the row.
--
We need more info about the query to find a solution.
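A minimal sketch of the MEMORY-table part of this suggestion, with made-up names (words for the 6,000-row table, words_backup for the on-disk copy), assuming the table has no TEXT/BLOB columns, which the MEMORY engine does not support:
CREATE TABLE words_mem LIKE words;
ALTER TABLE words_mem ENGINE = MEMORY;   -- keep the small table entirely in RAM
INSERT INTO words_mem SELECT * FROM words;
-- hourly backup to an on-disk copy, since MEMORY tables are emptied on restart
INSERT IGNORE INTO words_backup SELECT * FROM words_mem;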