On my html page, the user has the option to either enter a text string, check mark options, or do both. This data is then placed inside a mysql query which displays the data.
The fact that the user is allowed to enter a string means that I am using the LIKE function in the mysql query.
Correct me if I am wrong, but I believe the LIKE function can slow the query down a lot.
In relation to the above statement, I would like to know whether an empty string in the LIKE function would make a difference, so for example:
select * from hello;
select * from hello where name like "%%";
If it does make a significant difference (I believe this database will be growing larger), what are your ideas on how to deal with this?
My first idea was that I will have 2 queries:
One with the like functionality
and one without the like functionality. Depending on what the user enters, the correct query will be called.
So, for example, if the user leaves the search box empty, the LIKE function is not needed; the page will therefore send a null character, and an if statement will select the other query (without the LIKE functionality) when it sees the null character.
Is there a better way of doing this?
In general, the LIKE function will be slow unless it begins with a fixed string and the column has an index. If you do LIKE 'foo%', it can use the index to find all rows that begin with foo, because MySQL indexes use B-trees. But LIKE '%foo' cannot make use of an index, because B-trees only optimize looking for prefixes; this has to do a sequential scan of the entire table.
And even when you use the version with a prefix, the performance improvement depends on how much that prefix reduces the number of rows that have to be searched. If you do LIKE 'foo%bar', and 90% of your rows begin with foo, this will still have to scan 90% of the table to test whether they end with bar.
Since LIKE '%%' doesn't have a fixed prefix, it will perform a full scan of the table, even though there isn't actually anything to search for. It would be best if your PHP script tested whether the user provided a search string, and omit the LIKE test if there's nothing to search for.
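A minimal sketch of that check, written in Python rather than PHP purely for illustration. The function name build_search_query and the %s placeholder style (as used by common Python MySQL drivers) are assumptions, not part of the original question; the table and column names come from the example queries above.

```python
def build_search_query(search_text, table="hello", column="name"):
    """Return (sql, params), adding the LIKE test only when there is text to search for.

    `table` and `column` must come from trusted code, never from user input.
    """
    sql = f"SELECT * FROM {table}"
    params = []
    text = (search_text or "").strip()
    if text:
        # Escape LIKE wildcards so the user's text is matched literally
        # (backslash is MySQL's default LIKE escape character).
        escaped = (
            text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        )
        sql += f" WHERE {column} LIKE %s"
        params.append(f"%{escaped}%")
    return sql, params


print(build_search_query(""))     # no WHERE clause at all
print(build_search_query("foo"))  # LIKE test added, pattern passed as a parameter
```

Passing the pattern as a bound parameter, instead of interpolating it into the SQL string, also keeps the user's text out of the query itself.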
I believe the LIKE function can slow the query down a lot
I would expect that not to be the case. How hard would it be to test it?
Regardless which version of the query you run, the DBMS still has to examine every row in the table. That will require some extra work by the CPU, but for large tables, disk I/O will be the limiting factor. LIKE '%%' will discard rows with null values - hence potentially reducing the amount of data the DBMS needs to retain in the result set / transfer to the client which may be significant saving.
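The NULL behaviour is easy to check. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained); the rule it shows, that NULL LIKE '%%' evaluates to NULL and the row is dropped, is the same in MySQL. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hello (name TEXT)")
# Three made-up rows: a normal value, an empty string, and a NULL.
conn.executemany("INSERT INTO hello VALUES (?)", [("alice",), ("",), (None,)])

all_rows = conn.execute("SELECT COUNT(*) FROM hello").fetchone()[0]
like_rows = conn.execute(
    "SELECT COUNT(*) FROM hello WHERE name LIKE '%%'"
).fetchone()[0]

# The empty string still matches '%%'; only the NULL row is filtered out.
print(all_rows, like_rows)
```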
As Barbar says, providing an expression without a leading wildcard will allow the DBMS to use an index (if one is available) which will have a big impact on performance.
It's hard to tell from your question (you didn't provide much in the way of example queries/data nor any detail of what the application does), but the solution to your problem might be full-text indexing.
Using the World database sample from the mysql software distribution, I first did a simple explain on queries with and without where clauses without filtering effects:
mysql> explain select * from City;
mysql> explain select * from City where true;
mysql> explain select * from City where Name = Name;
In these first three cases, the result is as follows:
+----+--------------+-------+------+----------------+-----+---------+-----+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------+------+----------------+-----+---------+-----+------+-------+
| 1 | SIMPLE | City | ALL | NULL | NULL | NULL | NULL | 4080 | |
+----+--------------+-------+------+----------------+-----+---------+-----+------+-------+
While for the last query, I got the following:
mysql> explain select * from City where Name like "%%";
+----+--------------+-------+------+----------------+-----+---------+-----+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------+------+----------------+-----+---------+-----+------+-------+
| 1 | SIMPLE | City | ALL | NULL | NULL | NULL | NULL | 4080 | Using where |
+----+--------------+-------+------+----------------+-----+---------+-----+------+-------+
You can see that for this particular query, the where condition was not optimized away.
I also took a couple of measurements to check whether there would indeed be a noticeable difference, with two caveats:
the table having only 4080 rows, I used a self cross join to render longer computation times
I used having clauses to cut down on display overhead (1).
Measurement results:
mysql> select c1.Name, c2.Name from City c1, City c2 where concat(c1.Name,c2.Name) = concat(c1.Name,c2.Name) having c1.Name = "";
Empty set (5.22 sec)
The above query, as well as one with true or c1.Name = c1.Name, performed about the same, within a 0.1 sec margin.
mysql> reset query cache;
mysql> select c1.Name, c2.Name from City c1, City c2 where concat(c1.Name,c2.Name) like "%%" having c1.Name = "";
Empty set (13.80 sec)
This one also took around the same amount of time when run several times (in between query cache resets) (2).
Clearly the query optimizer doesn't see an opportunity for the latter case. The conclusion is that you should avoid that clause as much as possible, even if it doesn't change the result set.
(1): having clause filtering happening after data consolidation from the query, I assumed it shouldn't change the actual query computation load ratio.
(2): interestingly, I initially tried a simple where c1.Name like "%%", and got timings of around 5.0 sec, which led me to try a more elaborate clause. I don't think that result changes the overall conclusion; it could be that in that very specific case, the filtering actually has a beneficial effect. Hopefully a mysql guru will explain that result.
I am making a PHP backend API which executes a query on MySQL database. This is the query:
SELECT * FROM $TABLE_GAMES WHERE
($GAME_RECEIVERID = '$userId' OR $GAME_OTHERID = '$userId')
ORDER BY $GAME_ID LIMIT 1
Essentially, I'm passing $userId as a parameter and getting the row with the smallest $GAME_ID value. It would return a result in less than 100 ms for users that have around 30 000 matching rows in the table. However, I have since added new users that have fewer than 100 matching rows, and the query is painfully slow for them, taking around 20-30 seconds every time.
I'm puzzled as to why the query is so much slower when few rows match and extremely fast when a huge number of rows match, especially since I have an ORDER BY.
I have read about parameter sniffing, but as far as I know, that's the SQL Server thing, and I'm using MySQL.
EDIT
Here is the SHOW CREATE statement:
CREATE TABLE games (
  ID int(11) NOT NULL AUTO_INCREMENT,
  SenderID int(11) NOT NULL,
  ReceiverID int(11) NOT NULL,
  OtherID int(11) NOT NULL,
  Timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (ID)
) ENGINE=MyISAM AUTO_INCREMENT=17275279 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Here is the output of EXPLAIN
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | games | NULL       | index | NULL          | PRIMARY | 4       | NULL |    1 |    19.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
I tried prepared statement, but still getting the same result.
Sorry for poor formatting, I'm still noob at this.
You need to use EXPLAIN to analyse the performance of the query.
i.e.
EXPLAIN SELECT * FROM $TABLE_GAMES WHERE
($GAME_RECEIVERID = '$userId' OR $GAME_OTHERID = '$userId')
ORDER BY $GAME_ID LIMIT 1
EXPLAIN shows the execution plan MySQL chooses for the SELECT query.
It is a great tool for identifying why a query is slow. Based on that information, you can create indexes for the columns used in the WHERE clause.
CREATE INDEX index_name ON table_name (column_list)
This should improve the performance of the query, provided the optimizer actually uses the new index.
Your query is slow because it cannot find a matching record fast enough. With users where a lot of rows match, chances of finding a record to return are much higher, all other things being equal.
That behavior appears when $GAME_RECEIVERID and $GAME_OTHERID aren't part of an index, prompting MySQL to use the index on $GAME_ID because of the ordering. However, since newer players have not played the early games, there are literally millions of rows that won't match, but have to be checked nonetheless.
Unfortunately, this is bound to get worse even for old users, as your database grows. Ideally, you will add indexes on $GAME_RECEIVERID and $GAME_OTHERID - something like:
ALTER TABLE games
ADD INDEX receiver (ReceiverID),
ADD INDEX other (OtherID)
PS: Altering a 17 million rows table is going to take a while, so make sure to do it during a maintenance window or similar if this is used in production.
Is this the query after the interpolation? That is, is this what MySQL will see?
SELECT * FROM GAMES
WHERE RECEIVERID = '123'
OR OTHERID = '123'
ORDER BY ID LIMIT 1
Then this will run fast, regardless:
SELECT *
FROM GAMES
WHERE ID = LEAST(
( SELECT MIN(ID) FROM GAMES WHERE RECEIVERID = '123' ),
( SELECT MIN(ID) FROM GAMES WHERE OTHERID = '123' )
);
But, you will need both of these:
INDEX(RECEIVERID, ID),
INDEX(OTHERID, ID)
Your version of the query is scanning the table until it finds a matching row. My version will
make two indexed lookups;
fetch the other columns for the one row.
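It will run at the same fast speed regardless of how many rows there are for USERID. One caveat: MySQL's LEAST() returns NULL if any argument is NULL, so if the user appears in only one of the two columns the query returns no row; wrap each subquery in COALESCE with a suitably large sentinel value if that can happen.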
(Recommend switching to InnoDB.)
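The "two indexed lookups, then fetch one row" idea can also be sketched in application code. This example uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs on its own; the function name first_game_for_user, the index names, and the sample rows are all made up for illustration. It takes the smaller of the two MIN(ID) values on the client side and simply skips a lookup that finds nothing.

```python
import sqlite3


def first_game_for_user(conn, user_id):
    """Return the lowest-ID game where the user is receiver or other, or None."""
    candidates = []
    # One MIN(ID) lookup per column; each can be served by its own (column, ID) index.
    for column in ("ReceiverID", "OtherID"):
        (min_id,) = conn.execute(
            f"SELECT MIN(ID) FROM games WHERE {column} = ?", (user_id,)
        ).fetchone()
        if min_id is not None:  # MIN() yields NULL when no row matches
            candidates.append(min_id)
    if not candidates:
        return None
    # Fetch the remaining columns for the single winning row.
    return conn.execute(
        "SELECT ID, SenderID, ReceiverID, OtherID FROM games WHERE ID = ?",
        (min(candidates),),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (ID INTEGER PRIMARY KEY, SenderID INT, ReceiverID INT, OtherID INT)"
)
conn.execute("CREATE INDEX receiver_id ON games (ReceiverID, ID)")
conn.execute("CREATE INDEX other_id ON games (OtherID, ID)")
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?, ?)",
    [(1, 9, 7, 8), (2, 9, 123, 8), (3, 9, 5, 123), (4, 9, 123, 123)],
)

print(first_game_for_user(conn, 123))  # user present in both columns
print(first_game_for_user(conn, 5))    # user present only as receiver
```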
I have a small problem with a php mysql query, I am looking for help.
I have a family tree table, where I am storing for each person his/her ancestors id separated by a comma. like so
id ancestors
10 1,3,4,5
So the person of id 10 is fathered by id 5 who is fathered by id 4 who is fathered by 3 etc...
Now I wish to select all the people who have id x in their ancestors, so the query will be something like:
select * from people where ancestors like '%x%'
Now this would work fine, except that if id x is, let's say, 2 and a record has an ancestor id 32, this LIKE query will retrieve the record because 32 contains 2. And if I use '%,x,%' (including the commas), the query will miss the records where ancestor x is at either edge (left or right) of the column. It will also miss the records where x is the only ancestor, since no commas are present.
So in short, I need a like query that looks up an expression that either is surrounded by commas or not surrounded by anything. Or a query that gets the regular expression provided that no numbers are around. And I need it as efficient as possible (I suck at writing regular expressions)
Thank you.
Edit: Okay guys, help me come up with a better schema.
You are not storing your data in a proper way. Anyway, if you still want to use this schema you should use FIND_IN_SET instead of LIKE to avoid undesired results.
SELECT *
FROM mytable
WHERE FIND_IN_SET(2, ancestors) <> 0
You should consider redesigning your database structure. Add new table "ancestors" to database with columns:
id id_person ancestor
1 10 1
2 10 3
3 10 4
After that, use a JOIN (or a WHERE ... IN subquery) on this table to select the right rows.
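A minimal sketch of such a lookup table and the join against it. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own; the people table, the index name, the helper name people_with_ancestor, and the second person (id 11, ancestor 32) are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY);
CREATE TABLE ancestors (
    id        INTEGER PRIMARY KEY,
    id_person INTEGER NOT NULL,
    ancestor  INTEGER NOT NULL
);
CREATE INDEX idx_ancestor ON ancestors (ancestor);
""")
conn.executemany("INSERT INTO people (id) VALUES (?)", [(10,), (11,)])
# One row per (person, ancestor) pair instead of a comma-separated string.
conn.executemany(
    "INSERT INTO ancestors (id_person, ancestor) VALUES (?, ?)",
    [(10, 1), (10, 3), (10, 4), (10, 5), (11, 32)],
)


def people_with_ancestor(conn, ancestor_id):
    """Return the ids of all people who have ancestor_id among their ancestors."""
    rows = conn.execute(
        "SELECT p.id FROM people p "
        "JOIN ancestors a ON a.id_person = p.id "
        "WHERE a.ancestor = ? ORDER BY p.id",
        (ancestor_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(people_with_ancestor(conn, 3))  # person 10 has ancestor 3
print(people_with_ancestor(conn, 2))  # nobody; 32 is never confused with 2
```

Because the comparison is an integer equality on an indexed column, there is no substring matching to go wrong.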
You're having this issue because of the database design. First, relational databases aren't really meant for this kind of data; graph databases are more likely to fit this kind of problem.
If it contains only a small amount of data you could use MySQL, but the design is still wrong. If you only care about each person's father, just add a column to the person table (or whatever you call it): NULL means no father or unknown, otherwise it contains the id (int) of the parent.
In case you need more than just the 'father' relationship, you could use a pivot table to hold the relationship between two persons, but that's not a simple task.
There are a few established ways of storing hierarchical data in RDBMS. I've found this slideshow to be very helpful in the past:
Models for Hierarchical Design
Since the data deals with ancestry - and therefore you wouldn't expect it to change that often - a closure table could fit the bill.
Whatever model you choose, be sure to look around and see if someone else has already implemented it.
You could store your values as a JSON array of strings:
id | ancestors
10 | ["1","3","4","5"]
and then query as follows:
$query = 'select * from people where ancestors like \'%"x"%\'';
Better is of course using a mapping table for your many-to-many relation
You can do this with regexp:
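SELECT * FROM mytable WHERE ancestors REGEXP '(^|,)x(,|$)'
where x is your searched value. The (^|,) and (,|$) groups require x to be a whole list element, bounded by a comma or by the start or end of the string; with optional commas (',?x,?') a search for 2 would still match 32.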
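To see why the boundaries matter, here is a small check using Python's re module (an illustration only; the helper name has_ancestor is made up, and MySQL's REGEXP is a different engine, although these simple patterns mean the same thing in both). A pattern whose commas are optional matches 2 inside 32, while a pattern anchored on a comma or on the start or end of the string does not.

```python
import re


def has_ancestor(ancestors, x):
    """True if x is a whole element of the comma-separated ancestors string."""
    pattern = rf"(^|,){re.escape(str(x))}(,|$)"
    return re.search(pattern, ancestors) is not None


# Optional commas: the 2 inside 32 still matches.
loose_match = re.search(r",?(2),?", "1,32,4") is not None

print(loose_match)                # the false positive
print(has_ancestor("1,32,4", 2))  # anchored pattern rejects it
print(has_ancestor("2", 2))       # x as the only ancestor
print(has_ancestor("1,3,2", 2))   # x at the right edge
```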
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,ancestors VARCHAR(250) NOT NULL
);
INSERT INTO my_table VALUES(10,',1,3,4,5');
SELECT *
FROM my_table
WHERE CONCAT(ancestors,',') LIKE '%,5,%';
+----+-----------+
| id | ancestors |
+----+-----------+
| 10 | ,1,3,4,5 |
+----+-----------+
SELECT *
FROM my_table
WHERE CONCAT(ancestors,',') LIKE '%,4,%';
+----+-----------+
| id | ancestors |
+----+-----------+
| 10 | ,1,3,4,5 |
+----+-----------+
I am trying to write a single statement that selects all columns (*) and also sums one column from the same table in the same database, depending on conditions.
I wrote this statement (based on "Multiple select statements in Single query"):
SELECT ( SELECT SUM(Amount) FROM 2_1_journal), ( SELECT * FROM 2_1_journal WHERE TransactionPartnerName = ? )
I understand that SELECT SUM(Amount) FROM 2_1_journal will sum all values in column Amount (not based on a condition).
But first I want to understand what the correct statement is.
With the above statement I get the error SQLSTATE[21000]: Cardinality violation: 1241 Operand should contain 1 column(s)
I cannot understand the error message. From the advice in "MySQL - Operand should contain 1 column(s)", I understand that the subquery SELECT * FROM 2_1_journal WHERE TransactionPartnerName = ? must select only one column?
I tried changing the statement to SELECT ( SELECT * FROM 2_1_journal WHERE TransactionPartnerName = ? ), ( SELECT SUM(Amount) FROM 2_1_journal), but I get the same error...
What would be correct statement?
SELECT *, (SELECT SUM(Amount) FROM 2_1_journal)
FROM 2_1_journal
WHERE TransactionPartnerName = ?
This sums Amount over the entire table and appends that total to every row where TransactionPartnerName matches the parameter you bind in the client code.
If you want to limit the sum to the same criteria as the rows you select, just include it:
SELECT *, (SELECT SUM(Amount) FROM 2_1_journal WHERE TransactionPartnerName = ?)
FROM 2_1_journal
WHERE TransactionPartnerName = ?
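A quick way to see the difference between the two statements. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL (so the table name has to be quoted here); the partner names and amounts are made up. Note that the second statement has two placeholders, so the same value is bound twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "2_1_journal" (TransactionPartnerName TEXT, Amount INTEGER)'
)
conn.executemany(
    'INSERT INTO "2_1_journal" VALUES (?, ?)',
    [("Acme", 10), ("Acme", 20), ("Other", 5)],
)

# Sum over the whole table, appended to each selected row.
total_all = conn.execute(
    'SELECT *, (SELECT SUM(Amount) FROM "2_1_journal") AS total '
    'FROM "2_1_journal" WHERE TransactionPartnerName = ? ORDER BY Amount',
    ("Acme",),
).fetchall()

# Sum limited to the same partner as the selected rows.
total_partner = conn.execute(
    'SELECT *, (SELECT SUM(Amount) FROM "2_1_journal" '
    "WHERE TransactionPartnerName = ?) AS total "
    'FROM "2_1_journal" WHERE TransactionPartnerName = ? ORDER BY Amount',
    ("Acme", "Acme"),
).fetchall()

print(total_all)
print(total_partner)
```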
A whole different thing: table names like 2_1_journal are strong indicators of a broken database design. If you can redo it, you should look into how to normalize the database properly. It will most likely pay back many times over.
With regard to normalization (added later):
Since the current design uses keys in table names (such as the 2 and 1 in 2_1_journal), I'll quickly illustrate how I think you can vastly improve that design. Let's say that the table 2_1_journal has the following data (I'm just guessing here because the tables haven't been described anywhere yet):
title | posted | content
------+------------+-----------------
Yes! | 2013-01-01 | This is just a test
2nd | 2013-01-02 | Another test
This stuff belongs to user 2 in company 1. But hey! If you look at the rows, the fact that this data belongs to user 2 in company 1 is nowhere to be found.
The problem is that this design violates one of the most basic principles of database design: don't use keys in object (here: table) names. A clear indication that something is very wrong is if you have to create new tables if something new is added. In this case, adding a new user or a new company requires adding new tables.
This issue is easily fixed. Create one table named journal. Next, use the same columns, but add another two:
company | user | title | posted | content
--------+------+-------+------------+-----------------
1 | 2 | Yes! | 2013-01-01 | This is just a test
1 | 2 | 2nd | 2013-01-02 | Another test
Doing it like this means:
You never add or modify tables unless the application changes.
Joins across companies or users (and anything else that used to be part of the table naming scheme) are now possible with a single, fairly simple select statement.
Enforcing integrity is easy: if you upgrade the application and want to change the tables, the changes don't have to be repeated for each company and user. More importantly, this lowers the risk of the application getting out of sync with the tables in the database (such as adding the field comments to all x_y_journal tables but forgetting 5313_4324_journal, causing the application to break only when user 5313 logs in). This is the kind of problem you don't want to deal with.
I am not writing this because it is a matter of personal taste. Databases are just designed to handle tables that are laid out as I describe above. The design where you use object keys as part of table names has a host of other problems associated with it that are very hard to deal with.
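A small sketch of what the single journal table makes possible. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the first two rows are the guessed sample data from above, while the other two rows and the helper name titles_for are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journal ("
    " company INTEGER NOT NULL, user INTEGER NOT NULL,"
    " title TEXT, posted TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO journal VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, "Yes!", "2013-01-01", "This is just a test"),
        (1, 2, "2nd", "2013-01-02", "Another test"),
        (1, 3, "Hi", "2013-01-03", "Made-up row for another user"),
        (2, 7, "Hello", "2013-01-04", "Made-up row for another company"),
    ],
)


def titles_for(conn, company, user):
    """What used to be 'SELECT title FROM <company>_<user>_journal'."""
    rows = conn.execute(
        "SELECT title FROM journal WHERE company = ? AND user = ? ORDER BY posted",
        (company, user),
    ).fetchall()
    return [row[0] for row in rows]


# A cross-company report becomes a single GROUP BY instead of one query per table.
entries_per_company = conn.execute(
    "SELECT company, COUNT(*) FROM journal GROUP BY company ORDER BY company"
).fetchall()

print(titles_for(conn, 1, 2))
print(entries_per_company)
```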
In situations like this which method or mix of methods performs the quickest?
$year = db_get_fields("select distinct year from car_cache order by year desc");
Or
$year = db_get_fields("select year from car_cache");
$year = array_unique($year);
sort($year);
I've heard that DISTINCT in MySQL is a real big performance hit for large queries, and this table can have a million rows or more. I wondered which storage engine, InnoDB or MyISAM, would work best too. I know many optimizations are very query dependent. Year is an unsigned number, but other fields are varchar of different lengths, and I know that may make a difference too. Such as:
$line = db_get_fields("select distinct line from car_cache where year='$postyear' and make='$postmake' order by line desc");
I read that using the new innodb multiple keys method can make queries like this one very very quick. But the distinct and order by clauses are red flags to me.
Have MySQL do as much work as possible. If it isn't being efficient at what it's doing, then things likely aren't set up correctly (whether it is proper indexing for the query you are trying to run, or settings with sort buffers).
If you have an index on the year column, then using DISTINCT should be efficient. If you do not, then a full table scan is necessary in order to fetch the distinct rows. If you try to sort out the distinct rows in PHP rather than MySQL, then you transmit (potentially) much more data from MySQL to PHP, and PHP consumes much more memory to store all that data before eliminating the duplicates.
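The difference in transferred data is easy to illustrate. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL and PHP; the 3000 generated rows and the index name are made up. Both approaches end with the same three years, but the client-side one has to pull every row first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_cache (year INTEGER, make TEXT, line TEXT)")
conn.execute("CREATE INDEX idx_year ON car_cache (year)")
# 3000 made-up rows spread over only three distinct years.
conn.executemany(
    "INSERT INTO car_cache VALUES (?, ?, ?)",
    [(2000 + i % 3, "make", "line") for i in range(3000)],
)

# Database side: only the distinct values cross the connection.
db_side = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT year FROM car_cache ORDER BY year DESC"
    )
]

# Client side: every row is fetched, then deduplicated and sorted in memory.
all_years = [row[0] for row in conn.execute("SELECT year FROM car_cache")]
client_side = sorted(set(all_years), reverse=True)

print(len(db_side), len(all_years))
print(db_side == client_side)
```

The client-side version here sorts in descending order to match the SQL; a plain ascending sort after deduplicating, as in the question's second snippet, would return the years in the opposite order from the first snippet.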
Here is some sample output from a dev database I have. Also note that this database is on a different server on the network from where the queries are being executed.
SELECT COUNT(SerialNumber) FROM `readings`;
> 97698592
SELECT SQL_NO_CACHE DISTINCT `SerialNumber`
FROM `readings`
ORDER BY `SerialNumber` DESC
LIMIT 10000;
> Fetched 10000 records. Duration: 0.801 sec, fetched in: 0.082 sec
> EXPLAIN *above_query*
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------------------------------------+
| 1 | SIMPLE | readings | range | NULL | PRIMARY | 18 | NULL | 19 | Using index for group-by; Using temporary; Using filesort |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------------------------------------+
If I attempt the same query, except replace the SerialNumber column with one that is non-indexed, then it takes forever to run because MySQL has to examine all 97 million rows.
Some of the efficiency has to do with how much data you expect to get back. If I slightly modify the above queries to operate on the time column (the timestamp of the reading), then it takes 1 min 40 seconds to get a distinct list of 273,505 times, most of the overhead there is in transferring all the records over the network. So keep in mind the limits on how much data you are getting back, you want to keep that as low as possible for the data you are trying to fetch.
As for your final query:
select distinct line from car_cache
where year='$postyear' and make='$postmake'
order by line desc
There should be no problem with that either, just make sure you have a compound index on year and make and possibly an index on line.
On a final note, the engine I am using for the readings table is InnoDB, and my server is: 5.5.23-55-log Percona Server (GPL), Release 25.3 which is a version of MySQL by Percona Inc.
Hope that helps.
I see an ever increasing number of users signing up on my site to just send duplicate SPAM messages to other users. I've added some server side code to detect duplicate messages with the following mysql query:
SELECT count(content) as msgs_sent
FROM messages
WHERE sender_id = '.$sender_id.'
GROUP BY content having count(content) > 10
The query works well, but now they're getting around this by changing a few characters in their messages. Is there a way to detect this with MySQL, or do I need to look at each grouping returned from MySQL and then use PHP to determine the percentage of similarity?
Any thoughts or suggestions?
Fulltext Match
You could look at implementing something similar to the MATCH example here:
mysql> SELECT id, body, MATCH (title,body) AGAINST
-> ('Security implications of running MySQL as root') AS score
-> FROM articles WHERE MATCH (title,body) AGAINST
-> ('Security implications of running MySQL as root');
+----+-------------------------------------+-----------------+
| id | body | score |
+----+-------------------------------------+-----------------+
| 4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
| 6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)
So for your example, perhaps:
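SELECT id, MATCH (content) AGAINST ('your string') AS score
FROM messages
WHERE MATCH (content) AGAINST ('your string')
HAVING score > 1;
Note that to use these functions, your content column needs a FULLTEXT index. The score threshold goes in HAVING because MySQL does not allow a column alias such as score to be referenced in the WHERE clause.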
What is score in this example?
It is a relevance value. It is computed through the process described below:
Every correct word in the collection and in the query is weighted
according to its significance in the collection or query.
Consequently, a word that is present in many documents has a lower
weight (and may even have a zero weight), because it has lower
semantic value in this particular collection. Conversely, if the word
is rare, it receives a higher weight. The weights of the words are
combined to compute the relevance of the row.
From the documentation page.
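If you end up comparing messages in application code instead, as the question suggests, the idea could look roughly like the sketch below. It is written in Python with the standard difflib module rather than PHP, and everything in it is an assumption made for illustration: the function name cluster_similar, the 0.9 threshold, and the sample messages. It compares every message with the first member of each existing group, so it is only practical for the handful of messages sent by a single sender.

```python
from difflib import SequenceMatcher


def cluster_similar(messages, threshold=0.9):
    """Group messages whose similarity to a group's first message is >= threshold.

    Returns a list of [representative_message, count] pairs.
    """
    clusters = []
    for msg in messages:
        for cluster in clusters:
            if SequenceMatcher(None, cluster[0], msg).ratio() >= threshold:
                cluster[1] += 1
                break
        else:
            # No existing group is similar enough: start a new one.
            clusters.append([msg, 1])
    return clusters


messages = [
    "Buy cheap watches now!!",
    "Buy cheap watches now!1",
    "Buy cheep watches now!!",
    "Hello, how are you?",
]
clusters = cluster_similar(messages)
print([count for _, count in clusters])
```

A sender whose largest group exceeds your limit (10 in the original query) could then be flagged, even though no two of their messages are byte-for-byte identical.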