I'm super aware of all the questions with pretty much this exact same name on here, but none of their solutions seemed to be the answer to my problems.
The query I'm using isn't very big, and I definitely have my packet size settings all configured correctly (never had a problem like this and some of my queries are properly large, much larger than the query in question).
I'm using prepared statements to pass some data to a fulltext search, and only when I use it this way do I get this error. If I take the text out and paste it into the query literally instead of binding it as a parameter, it works fine.
Also, in the MySQL log I get a huge error report that starts like this:
21:31:08 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
The query is this:
insert into `unpairedbillsuggestions` (`UnpairedBillSuggestionID`,
  `ShippingBillID`, `InvoiceID`, `Score`, `DateTimeAdded`)
select `buuid`(), ?, `InvoiceID`, `Score`, now()
from (
  select
    if(`invoices`.`InvoiceNumber` in (?, ?, ?), 5, 0) `InvoiceNumberScore`,
    if(`states`.`APO` = 0,
       match(`shippingoptions`.`Company`, `shippingoptions`.`FullName`,
             `shippingoptions`.`AddressLine1`, `shippingoptions`.`AddressLine2`)
         against(?),
       match(`invoices`.`_APOCustomerAddress`) against(?)) / 20 `MatchScore`,
    if(`invoices`.`_ShippingAccountNumber` = ?, 0.3, 0) `ShippingAccountNumberScore`,
    -abs(datediff(date(`invoices`.`DateTimeShipped`),
         ifnull(?, date(`invoices`.`DateTimeShipped`) - interval 14 day))) / 7 `DateScore`,
    if(`invoices`.`_InvoiceTrackingNumberCount` = 0, 2, 0) `InvoiceTrackingNumberCountScore`,
    `invoices`.`InvoiceID`, `invoices`.`InvoiceNumber`, `invoices`.`DateTimeShipped`,
    `shippingoptions`.`FullName`, `shippingoptions`.`Company`,
    `shippingoptions`.`AddressLine1`, `shippingoptions`.`AddressLine2`,
    `shippingoptions`.`City`, `states`.`StateCode`, `countries`.`CountryCode3`,
    `shippingoptions`.`Zip`, `invoices`.`_Total`, `invoices`.`_ShippingAccountNumber`,
    `companies`.`Name` `CompanyName`, `factories`.`Name` `Factory`, `networks`.`Icon`,
    (select `InvoiceNumberScore` + `InvoiceTrackingNumberCountScore` +
            `ShippingAccountNumberScore` + `MatchScore` + `DateScore`) `Score`
  from `invoices`
  left join `invoiceshippingoptions` using (`InvoiceID`)
  left join `shippingoptions`
    on `shippingoptions`.`ShippingOptionID` = `invoiceshippingoptions`.`ShippingOptionID`
  left join `countries` on `countries`.`CountryID` = `shippingoptions`.`CountryID`
  left join `states` on `states`.`StateID` = `shippingoptions`.`StateID`
  join `companies` on `companies`.`CompanyID` = `invoices`.`CompanyID`
  join `networks` on `networks`.`NetworkID` = `companies`.`NetworkID`
  join `factories` on `factories`.`FactoryID` = `invoices`.`FactoryID`
  where `invoices`.`InvoiceStatusID` <> 'c9be156b-ffca-11e4-888d-3417ebdfde80'
  having `Score` > 0
  order by `Score` desc
  limit 5
) `a`;
With the parameters being (all passed as strings):
'0a1c6452-4ec2-11e6-b570-12c139c58877'
'123456'
'789456123456'
''
'SOME COMPANY - SOME DUDE 117 W MASTER CHIEF LN, ORLANDO, FL 32816 USA'
'SOME COMPANY - SOME DUDE 117 W MASTER CHIEF LN, ORLANDO, FL 32816 USA'
'456789123'
'2016-04-27'
I'm running a PHP script that searches a table with millions of rows in a relatively large MySQL instance for terms like "diabetes mellitus" in a description column that has a fulltext index on it. However, after one day I'm only through a couple hundred queries, so it seems like my approach is never going to work. The entries in the description column are on average 1000 characters long.
I'm trying to figure out my next move and I have a few questions:
My MySQL table has unnecessary columns in it that aren't being queried. Will removing those affect performance?
I assume running this locally rather than on RDS will dramatically increase performance? I have a decent MacBook, but I chose RDS since cost isn't an issue, and I picked an instance that was more powerful than my MacBook.
Would using a compiled language like Go rather than PHP do more than the 5-10x boost people report in test examples? That is, given my task, is there any reason to think a static language would produce a 100x or greater speed improvement?
Should I put the data in a text or CSV file rather than MySQL? Is using MySQL just causing unnecessary overhead?
This is the query:
SELECT id
FROM text_table
WHERE match(description) against("+diabetes +mellitus" IN BOOLEAN MODE);
Here's the EXPLAIN output for the query, showing the optimizer is using the FULLTEXT index:
id  select_type  table       type      possible_keys  key  key_len  ref   rows  Extra
1   SIMPLE       text_table  fulltext  idx            idx  0        NULL  1     Using where
The RDS instance is db.m4.10xlarge, which has 160GB of RAM. The InnoDB buffer pool is typically about 75% of RAM on an RDS instance, which makes it 120GB.
The text_table status is:
Name: text_table
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 26000630
Avg_row_length: 2118
Data_length: 55079485440
Max_data_length: 0
Index_length: 247808
Data_free: 6291456
Auto_increment: 29328568
Create_time: 2018-01-12 00:49:44
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options:
Comment:
This indicates the table has about 26 million rows, and the size of data and indexes is 51.3GB, but this doesn't include the FT index.
For the size of the FT index, query:
SELECT stat_value * @@innodb_page_size
FROM mysql.innodb_index_stats
WHERE table_name='text_table'
AND index_name = 'FTS_DOC_ID_INDEX'
AND stat_name='size';
The size of the FT index is 480247808 bytes (458MB).
Following up on comments above about concurrent queries.
If the query is taking 30 seconds to execute, then the programming language you use for the client app won't make any difference.
I'm a bit skeptical that the query is really taking 1 to 30 seconds to execute. I've tested MySQL fulltext search, and I found a search runs in under 1 second even on my laptop. See my presentation https://www.slideshare.net/billkarwin/practical-full-text-search-with-my-sql
It's possible that it's not the query that's taking so long, but it's the code you have written that submits the queries. What else is your code doing?
How are you measuring the query performance? Are you using MySQL's query profiler? See https://dev.mysql.com/doc/refman/5.7/en/show-profile.html This will help isolate how long it takes MySQL to execute the query, so you can compare to how long it takes for the rest of your PHP code to run.
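To separate the two, you can time the query call and the rest of the loop body independently. A minimal sketch (Python rather than PHP for brevity; `run_query` and `process_rows` are stubs standing in for the real MySQL call and the rest of the loop body — the same idea works with microtime() in PHP):

```python
import time

def run_query(term):
    # Stub standing in for the real fulltext query against MySQL.
    time.sleep(0.01)                      # pretend the server takes 10 ms
    return [("row-for-" + term,)]

def process_rows(rows):
    # Stub standing in for the rest of the loop body (parsing, writing...).
    return len(rows)

query_time = other_time = 0.0
for term in ["diabetes mellitus", "hypertension"]:
    t0 = time.perf_counter()
    rows = run_query(term)                # time spent waiting on MySQL
    t1 = time.perf_counter()
    process_rows(rows)                    # time spent in client code
    t2 = time.perf_counter()
    query_time += t1 - t0
    other_time += t2 - t1
```

Comparing the two totals tells you whether the bottleneck is the server or your own code.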
A PHP script is single-threaded, so you are running one query at a time, serially. The RDS instance you are using has 40 CPU cores, so you should be able to run many concurrent queries at a time. But each query would need to be run by its own client.
So one idea would be to split your input search terms into at least 40 subsets, and run your PHP search code against each respective subset. MySQL should be able to run the concurrent queries fine. Perhaps there will be a slight overhead, but this will be more than compensated for by the parallel execution.
You can split your search terms manually into separate files, and then run your PHP script with each respective file as the input. That would be a straightforward way of solving this.
But to get really professional, learn to use a tool like GNU parallel to run the 40 concurrent processes and split your input over these processes automatically.
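The splitting idea above can be sketched as follows (Python shown for brevity; `search_subset` is a placeholder for a real worker that opens its own MySQL connection, and threads suffice here because the work is I/O-bound on the server):

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(terms, n):
    # Round-robin split of the search terms into n roughly equal subsets.
    return [terms[i::n] for i in range(n)]

def search_subset(subset):
    # Placeholder worker: in a real run, each worker opens its own MySQL
    # connection and issues one fulltext query per term in its subset.
    return [(term, "matched") for term in subset]

terms = ["diabetes mellitus", "hypertension", "asthma", "influenza", "sepsis"]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = pool.map(search_subset, chunk(terms, 2))
matches = [row for part in results for row in part]
```

Scaling `max_workers` up toward the number of cores on the RDS instance is the point of the exercise; MySQL handles the concurrency.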
I'm running Ubuntu 13.04 with an nginx webserver installed. I'm writing a mini-social network for the users on my website, but for some reason the scripts I use to load things like profiles and "walls" are sometimes slow. Not all of them are slow, but especially the newsfeed script where it shows recent posts by friends.
I've added a bunch of microtime() checks throughout the script and it seems the query to get the recent posts is taking the most time. I tried to optimize it as much as possible but it still seems to be slow. I'm using MySQLi. Here is my query:
SELECT `id`,`posterName`, `posterUUID`, `message`, `postDate`, `likes`, `whoLiked`
FROM `wallposts`
WHERE (
`wallUUID` IN (' . implode(',', $friendStr) . ')
AND posterUUID = wallUUID
)
OR wallUUID="GLOBAL"
AND isDeleted=0
ORDER BY `postDate` DESC
LIMIT 25
Would it be faster to just use SELECT * since I'm pretty much selecting most of the columns anyway? I'm not sure what else to try, so that's why I came here.
Any help please as to what I could do/not do to keep it from taking 5+ seconds just for this query?
Several things:
using * instead of a list of columns is usually a bad idea: the risk is that someone later adds a column that you do not need, and that column could contain large amounts of binary data, which would make your query slower. So it's certainly not something to change when you have a speed problem.
you may have an operator precedence problem with your AND/OR logic
Your query is:
WHERE (A)
OR B
AND C
And I'm pretty sure you mean:
WHERE (
(A)
OR B
)
AND C
But AND takes precedence, so what you have is:
WHERE (A)
OR (
B
AND C
)
When in doubt, use parentheses (I'm in doubt here, so I would use parentheses).
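The precedence rules above are easy to check outside SQL, since `and` binds tighter than `or` in most languages. A quick illustration in Python, with booleans A, B, C standing in for the three conditions in the query:

```python
# AND binds tighter than OR, in SQL as in Python. Enumerate every
# truth assignment and compare the two possible groupings.
for A in (False, True):
    for B in (False, True):
        for C in (False, True):
            as_written = A or (B and C)   # what "WHERE A OR B AND C" means
            intended = (A or B) and C     # what the poster likely meant
            # They disagree exactly when A is true and C is false --
            # i.e. rows failing the AND condition still match.
            assert (as_written != intended) == (A and not C)
```

So with C being `isDeleted=0`, deleted global posts would still slip through the query as written.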
Your first WHERE condition is quite strange:
WHERE (
wallUUID IN (42,43,44,45,46)
AND posterUUID = wallUUID
)
That means a filter on the friends' identifiers for the wall posts, I guess, and then a filter which says that each row must have the same id for the poster uid and for the wall id.
I'm pretty sure that's not what you wanted. Maybe you need a join query here. Or maybe not; without the structure of your tables it's hard to guess.
You will need a pretty decent index to get an optimized result on friends' posts: an index which starts with the current user id, contains the right sort by date, the deletion flag, and certainly the friends' identifiers.
User-friend relationships are hard to manage, especially when volumes get bigger; building a social website usually involves pub/sub systems (publication/subscription channel systems). You should study some pub/sub database schemas.
SELECT t1.id, t1.tx_id, t1.tx_date, t1.bx_date, t1.method, t1.theater_id,
       t1.showtime_id, t1.category_id, t1.amount, t1.fname, t1.status, t1.mobile,
       (CASE WHEN (t4.type = '1')
             THEN ((t1.full_tickets * 2) + (t1.half_tickets))
             ELSE (t1.full_tickets + t1.half_tickets) END) AS no_seats,
       u.username
FROM `reservation` AS t1
LEFT JOIN `theatercategories` AS t4 ON t1.category_id = t4.id
JOIN `users` AS u ON u.id = t1.user_id
WHERE t1.bx_date >= '2012-08-01' AND t1.bx_date <= '2012-08-31'
ORDER BY t1.id DESC
The above query returns a "The connection was reset" error.
It loads 75,000 records (75,195 total; the query took 15.2673 sec). I use MySQL with Joomla. What seems to be the issue?
Please guide me. Thanks.
There are a number of possible solutions ... depends on the "why" ... so it ends up being a bit of trial and error. On a fresh install, that's tricky to determine. But, if you made a recent "major" change that's a place to start looking - like modifying virtual hosts or adding/enabling XDebug.
Here's a list of things I've used/done/tried in the past:
check for infinite loops ... in particular looping through a SQL fetch result, which works 99% of the time except the 1% it doesn't. In one case, I was using the results of two previous queries as the upper and lower bounds of a for loop ... and occasionally got an upper bound of a UINT max ... har har har (vomit)
copying the ./php/libmysql.dll to the windows/system32 directory (particularly if you see Parent: child process exited with status 3221225477 -- Restarting in your log files ... check out: http://www.java-samples.com/showtutorial.php?tutorialid=1050)
if you modify PHP's error_reporting at runtime ... in certain circumstances this can cause PHP to degenerate into an unstable state
if, say, in your PHP code you modify the superglobals or fiddle around with other deep and personal background system variables (nah, who would ever do such evil hackery? ahem)
if you convert your MySQL tables to something other than MyISAM or switch to mysqli
There is a known bug with MySQL related to MyISAM, the UTF8 character set and indexes (http://bugs.mysql.com/bug.php?id=4541). The workaround is to use the InnoDB engine (e.g. SET GLOBAL storage_engine='InnoDB';). Doing that changes how new tables are created, which might slightly alter the way results are returned to a fetch statement, leading to an infinite loop, a malformed dataset, etc. (although this change should not hang the database itself).
Other helpful items are to ramp up the debug reporting for PHP and Apache in their config files and restart the servers. The log files sometimes give a clue as to at least where the problem might reside. If it happens after your page content was finished, it's more likely in the PHP settings. If it's during page construction, check your PHP code. Etc. etc.
Hope the above laundry list helps ...
Just refresh your DB link; it might have disconnected for some reason.
My PHP application sends a SELECT statement to MySQL with HTTPClient.
It takes about 20 seconds or more.
I thought MySQL couldn't produce the result immediately, because MySQL Administrator showed the state as "sending data" or "copying to tmp table" while I was waiting for the result.
But when I send the same SELECT statement from another application like phpMyAdmin or JMeter, it takes 2 seconds or less: 10 times faster!
Does anyone know why MySQL performs so differently?
Like #symcbean already said, PHP's MySQL driver caches query results. This is also why you can do another mysql_query() while in a while($row = mysql_fetch_array()) loop.
The reason MySQL Administrator or phpMyAdmin shows results so fast is that they append a LIMIT 10 to your query behind your back.
If you want to get your query results fast, I can offer some tips. They involve selecting only what you need, when you need it:
Select only the columns you need; don't throw SELECT * everywhere. This might bite you later when you want another column but forget to add it to the select statement, so do this where it matters (like tables with 100 columns or a million rows).
Don't throw a 20-by-1000 table in front of your user. She can't find what she's looking for in a giant table anyway. Offer sorting and filtering. As a bonus, find out what she generally looks for and offer a way to show those records with a single click.
With very big tables, select only the primary keys of the records you need, then retrieve the additional details in the while() loop. This might look illogical because you make more queries, but when you deal with queries involving around ~10 tables, hundreds of concurrent users, locks and query caches, things don't always make sense at first :)
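That last tip, keys first and details second, can be sketched like this (Python, with an in-memory SQLite table standing in for the MySQL wallposts table; the column names are illustrative):

```python
import sqlite3

# In-memory SQLite stands in for MySQL purely to illustrate the pattern.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE wallposts (id INTEGER PRIMARY KEY, "
           "message TEXT, big_payload TEXT)")
db.executemany(
    "INSERT INTO wallposts (message, big_payload) VALUES (?, ?)",
    [("post %d" % i, "x" * 1000) for i in range(100)])

# Step 1: fetch only the primary keys of the page you need -- cheap,
# since the index alone can answer this.
ids = [row[0] for row in db.execute(
    "SELECT id FROM wallposts ORDER BY id DESC LIMIT 25")]

# Step 2: retrieve the heavier details only for those 25 keys.
placeholders = ",".join("?" * len(ids))
details = db.execute(
    "SELECT id, message FROM wallposts WHERE id IN (%s)" % placeholders,
    ids).fetchall()
```

The second query touches only the rows that will actually be shown, instead of dragging every wide row through the sort.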
These are some tips I learned from my boss and from my own experience. As always, YMMV.
Does anyone know why MySQL performs so differently?
Because MySQL caches query results, and the operating system caches disk I/O (see this link for a description of the process in Linux)
I see, from time to time, that people say that a SQL query sent to a server from a client application should not contain any extra line breaks or spaces. One of the reasons I've heard is "why waste network traffic?".
Is there a real reason to make code harder to read and edit in favor of removing all spaces?
With spaces:
$q = 'SELECT
`po`.*,
`u`.`nickname`,
`u`.`login`
FROM
`postponed_operations` AS `po`
LEFT JOIN `users` AS `u` ON `u`.`id` = `po`.`user_id`
ORDER BY `will_be_deleted_after`';
return mysql_query($q);
Without spaces:
$q = 'SELECT '.
'`po`.*,'.
'`u`.`nickname`,'.
'`u`.`login`'.
'FROM '.
'`postponed_operations` AS `po` '.
'LEFT JOIN `users` AS `u` ON `u`.`id`=`po`.`user_id` '.
'ORDER BY `will_be_deleted_after`';
return mysql_query($q);
It is true that it will cost network traffic and server time, but the cost will be negligible in all except the most extreme cases.
Now, if you are editing the code of Facebook (or Google, or similar), and you optimize the 10 most common queries this way, then there is a point, since they will be run billions of times per day.
But in all the other cases I think it is a waste of time to consider removing spaces.
This is subjective, but readability beats the few extra spaces and line breaks any time, in my opinion. And if coding standards dictated breaking out of the string every time, I'd probably go insane.
If you absolutely must optimize spaces and such away, do not do it in your source code. Instead put it through an automated intermediate tool.
If we were talking about the web, I'd say that the extra effort might potentially be worth it for static content (script files that rarely change and such), but I would be skeptical about doing it for dynamic content.
In all cases:
If you change the source, it will be a maintenance nightmare.
If you put it through a compression/decompression tool, you'll save significantly more (on average) than simply removing spaces but at a cost of latency and CPU time.
Unless you have some really pathological structure, the whitespace constitutes a tiny fraction of the total cost, even if we only consider the size of the TCP packets and the query data returned.
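The actual numbers are easy to measure. A quick sketch (Python, using the query from the question; the gzip comparison illustrates the compression point from the previous item):

```python
import gzip

spaced = (
    "SELECT\n"
    "    `po`.*,\n"
    "    `u`.`nickname`,\n"
    "    `u`.`login`\n"
    "FROM\n"
    "    `postponed_operations` AS `po`\n"
    "LEFT JOIN `users` AS `u` ON `u`.`id` = `po`.`user_id`\n"
    "ORDER BY `will_be_deleted_after`"
)
# Collapse every run of whitespace to a single space, which is what
# the hand-minified "without spaces" version effectively does.
minified = " ".join(spaced.split())

saved_by_minify = len(spaced) - len(minified)    # a handful of bytes
gzipped = len(gzip.compress(spaced.encode()))
# On a query this short the gzip header eats most of the gain; generic
# compression only pays off on larger payloads.
```

A few dozen bytes saved per query, against packet and protocol overhead measured in hundreds, is why the savings are negligible.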
Perhaps not relevant in your case, but I'll mention it anyway: a completely different approach would be a tightly packed message format that transfers a query ID instead of the query text every time.
Absolutely. I do that all the times. I also:
remove backquotes. Who needs them?
Therefore,
`po` ----- becomes -----> po
use as small as possible names for databases, tables, fields, indices, etc.
Therefore,
postponed_operations --- becomes ---> po --- p is already taken for posts
will_be_deleted_after --- becomes ---> wi --- w is already taken for words
Completely drop unnecessary keywords like AS. All table names are short anyway (rule 2 !)
Therefore,
LEFT JOIN `users` AS `u` --- becomes ---> LEFT JOIN u
As a result, I would have written the above query as:
$q='SELECT po.*,ni,lo FROM po LEFT JOIN u ON i=ui ORDER BY wi'
Tags:
joke
Although it is true that removing unnecessary spaces and line breaks reduces the amount of data you send to the database server, you should not bother with that.
Rather, you should care about the readability and maintainability of the code. These are two very important things to keep in mind while writing software!
If reducing network traffic was the only good thing, then we could argue that you should make a Stored Procedure for every query that you write.
For example, you could change the following query
SELECT
`po`.*,
`u`.`nickname`,
`u`.`login`
FROM
`postponed_operations` AS `po`
LEFT JOIN `users` AS `u` ON `u`.`id` = `po`.`user_id`
ORDER BY `will_be_deleted_after`;
to
CALL GetLoginData();
Now that would be ~80-95% reduction. But is it worth it?
Definitely No.
Doing things like this would just make developers' lives miserable without adding any significant value!
That being said, use a minified version of the code only in places where nobody will be changing it, e.g. CSS and JS libraries that you won't ever change!
I hope you got the point, and you will continue to write Readable and Maintainable code!