MySQL Occasionally Returns Wrong Value - php

This is a general question, one that I've been scratching my head on for a while now. My company's database handles about 2k rows a day. 99.9% of the time, we have no problem with the values that are returned in the different SELECT statements that are set up. However, on a very rare occasion, our database will "glitch" and return the value for a completely different row than what was requested.
This is a very basic example:
+--------+-------------+
| row_id | columnvalue |
+--------+-------------+
|      1 |          10 |
|      2 |          20 |
|      3 |          30 |
|      4 |          40 |
+--------+-------------+
SELECT columnvalue FROM table_name WHERE row_id = 1 LIMIT 1
Returns: 10
But on the very rare occasion, it may return: 20, or 30, etc.
I am completely baffled as to why it does this sometimes and would appreciate some insight into what appears to be a programming phenomenon.
More specific information:
SELECT
USERID, CONCAT( LAST, ', ', FIRST ) AS NAME, COMPANYID
FROM users, companies
WHERE users.COMPANYCODE = companies.COMPANYCODE
AND USERID = 9739 LIMIT 1
mysql> DESCRIBE users;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| USERID      | int(10)      | NO   | PRI | NULL    | auto_increment |
| COMPANYCODE | varchar(255) | NO   | MUL |         |                |
| FIRST       | varchar(255) | NO   | MUL |         |                |
| LAST        | varchar(255) | NO   | MUL |         |                |
+-------------+--------------+------+-----+---------+----------------+
mysql> DESCRIBE companies;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| COMPANYID   | int(10)      | NO   | PRI | NULL    | auto_increment |
| COMPANYCODE | varchar(255) | NO   | MUL |         |                |
| COMPANYNAME | varchar(255) | NO   |     |         |                |
+-------------+--------------+------+-----+---------+----------------+
What the results were supposed to be: 9739, "L----, E----", 2197
What the results were instead: 9739, "L----, E----", 3288
Basically, it returned the wrong company id based off the join with companycode. Given the nature of our company, I can't share any more information than that.
I have run this query 5k times and have made every modification to the code imaginable in order to generate the second set of results, and I have not been able to duplicate it. I'm not quick to blame MySQL -- this has been happening (though rarely) for over 8 years, and I have exhausted all other possible causes. I suspected the results were manually changed after the query was run, but the timestamps state otherwise.
I'm just scratching my head as to why this can run perfectly 499k out of 500k times.

Now that we have a more realistic query, I notice right away that you are joining the tables, not on the primary key, but on the company code. Are we certain that the company code is being enforced as a unique index on companies? The LIMIT 1 would hide a second row if such a row were found.
From a design perspective, I would make the join on the primary key to avoid even the possibility of duplicate keys and put company code in as a unique indexed field for display and lookup only.
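A quick way to check whether that is what is happening (a sketch, using the table and column names from the question) is to look for company codes that appear more than once:
-- any row returned here means the join can match more than one company
SELECT COMPANYCODE, COUNT(*) AS dupes
FROM companies
GROUP BY COMPANYCODE
HAVING COUNT(*) > 1;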

This behavior is either due to an incredibly unlikely SERIOUS bug in MySQL, -or- MySQL is returning a result that is valid at the time the statement is run, and there is some other software that is garbling the displayed result.
One possibility to consider is that the row had been modified (by some other statement) at the time your SQL statement executed, and then the row was changed again later. (That's the most likely explanation we'd have for MySQL returning an unexpected result.)
The use of the LIMIT 1 clause is curious, because if the predicate uniquely identifies a row, there should be no need for the LIMIT 1, since the query is guaranteed to return no more than one row.
This leads me to suspect that row_id is not unique, and that the query actually returns more than one row. With the LIMIT clause, there is no guarantee as to which of the rows will get returned (absent an ORDER BY clause.)
Otherwise, the most likely culprit is outdated cache contents, or other problems in the code.
UPDATE
The previous answer was based on the example query given; I purposely omitted the possibility that the table was a view doing a JOIN, since the question originally said it was a table, and the example query showed just the one table.
Based on the new information in the question, I suggest that you OMIT the LIMIT 1 clause from the query. That will reveal whether the query is returning more than one row.
From the table definitions, we see that the database isn't enforcing a UNIQUE constraint on the COMPANYCODE column in the companies table.
We also see that no foreign key is defined on the join column.
Normally, the foreign key would be defined referencing the PRIMARY KEY of the target table: we'd expect the users table to have a COMPANYID column that references the COMPANYID (primary key) column in the companies table.
Instead, the join matches on the varchar COMPANYCODE column, so a single duplicated company code in companies is enough to produce two joined rows, and the LIMIT 1 will silently return whichever one the execution plan happens to produce first.
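If company codes are meant to be unique, enforcing that at the database level would either prevent the duplicates or fail immediately and tell you they exist. A sketch, using the table and column names shown above (the foreign key part assumes InnoDB and a new COMPANYID column on users, backfilled before the constraint is added):
-- fails if duplicate codes already exist, which would explain the wrong COMPANYID
ALTER TABLE companies ADD UNIQUE INDEX uq_companies_companycode (COMPANYCODE);
-- longer term: carry the primary key on users and join on it instead of the code
ALTER TABLE users ADD COLUMN COMPANYID int(10) NOT NULL;
-- (backfill users.COMPANYID from companies here, then add the constraint)
ALTER TABLE users ADD CONSTRAINT fk_users_company
  FOREIGN KEY (COMPANYID) REFERENCES companies (COMPANYID);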

There are several reasons this could happen. I suggest you look at the assumptions you're making. For example:
If you're using GROUP BY and one of the columns isn't an aggregate or the grouping expression, you're going to get an unpredictable value in that column. Make sure you use an appropriate aggregation (such as MAX or MIN) to get a predictable result on each column.
If you're assuming a row order without making it explicit, and using LIMIT to get only the first row, the actual order of the returned rows depends on the execution plan, which can differ on large result sets based on the statistics available to the optimiser. Make sure you use ORDER BY in such situations, as sketched below.
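Two quick illustrations against the example table from the top of the question (assuming row_id may not be unique):
-- without ORDER BY, "the first row" is whichever row the plan produces first
SELECT columnvalue FROM table_name WHERE row_id = 1 ORDER BY columnvalue LIMIT 1;
-- with GROUP BY, aggregate every non-grouped column to get a predictable value
SELECT row_id, MAX(columnvalue) FROM table_name GROUP BY row_id;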

Related

Optimize a JOIN ORDER BY RAND() mysql query in a large database

I am working on a project which has a large Question Bank; for Tests added to the System, 20 questions are fetched dynamically at run time based on the following query:
SELECT Question.* from Question JOIN Test
ON Question.Subject_ID = Test.Subject_ID
AND Question.Question_Level = Test.Test_Level
ORDER BY RAND()
LIMIT 20;
However, as ORDER BY RAND() is known to kill your MySQL server, I have been looking for better solutions.
Result of EXPLAIN [above query]:
+----+-------------+----------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | Test | ALL | NULL | NULL | NULL | NULL | 5 | Using temporary; Using filesort |
| 1 | SIMPLE | Question | ALL | NULL | NULL | NULL | NULL | 7 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+----------+------+---------------+------+---------+------+------+----------------------------------------------------+
Result of EXPLAIN Question:
+-------------------+------------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+------------------------------------------+------+-----+---------+----------------+
| Question_ID | int(11) | NO | PRI | NULL | auto_increment |
| Questions | varchar(100) | NO | | NULL | |
| Available_Options | varchar(200) | NO | | NULL | |
| Correct_Answer | varchar(50) | NO | | NULL | |
| Subject_ID | int(11) | NO | | NULL | |
| Question_Level | enum('Beginner','Intermediate','Expert') | NO | | NULL | |
| Created_By | int(11) | NO | | NULL | |
+-------------------+------------------------------------------+------+-----+---------+----------------+
Result of EXPLAIN Test:
+----------------+------------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------------------------------------+------+-----+---------+----------------+
| Test_ID | int(11) | NO | PRI | NULL | auto_increment |
| Test_Name | varchar(50) | NO | | NULL | |
| Test_Level | enum('Beginner','Intermediate','Expert') | NO | | NULL | |
| Subject_ID | int(11) | NO | | NULL | |
| Question_Count | int(11) | NO | | NULL | |
| Created_By | int(11) | NO | | NULL | |
+----------------+------------------------------------------+------+-----+---------+----------------+
Any help would be appreciated to optimize the query to reduce server load and execution time.
P.S. The system has the capability of Deletion too, so the AUTO_INCREMENT PRIMARY KEYs of the QUESTION and TEST tables can have large gaps.
I like this question. It's a very good optimization puzzle, and let's assume for the moment that performance is very important for this query, and that you cannot use any dynamically inserted values (e.g. from PHP).
One high performance solution would be to add a column with random values (say, called "Rand"), order the table by this value, and periodically regenerate and re-order the table. You could then use a query like this one:
SELECT Question.* from Question
JOIN Test
ON Question.Subject_ID = Test.Subject_ID
AND Question.Question_Level = Test.Test_Level
WHERE Question.Rand > RAND()
LIMIT 20
This would perform at O(n), requiring only one scan of the table, but it would come with the risk of returning fewer than 20 results if a value very close to 1 was generated. If this was an acceptable risk (e.g. you could programmatically check for an inadequate result and re-query), you would end up with nice runtime performance.
The periodic re-generating and re-ordering of the numbers is necessary because rows early in the table with high Rand values would be favored and show up disproportionately frequently in the results. (Imagine if the first row was lucky enough to receive a Rand value of .95)
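The column itself and the periodic refresh can be as simple as the following sketch (the column and index names are just placeholders; the UPDATE would be rerun from a cron job or similar):
ALTER TABLE Question ADD COLUMN Rand DOUBLE NOT NULL DEFAULT 0, ADD INDEX idx_rand (Rand);
-- re-shuffle the precomputed random values periodically
UPDATE Question SET Rand = RAND();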
Even better would be to create a column with contiguous integers, index on this column, and then randomly choose an insertion point to grab 20 results. Such a query might look like this:
SELECT Question.* from Question
JOIN Test
ON Question.Subject_ID = Test.Subject_ID
AND Question.Question_Level = Test.Test_Level
CROSS JOIN (SELECT MAX(Rand_Id) AS max_id FROM Question) AS m -- every derived table needs an alias
WHERE Question.Rand_Id > ROUND(RAND() * max_id)
LIMIT 20
But what if you can't alter your table in any way? If it doesn't matter how messy your SQL is, and there is a relatively low proportion of missing ids (say roughly 1/10th), you could achieve your 20 random questions with a good degree of probability with the following SQL:
SELECT Question.* from Question JOIN Test
ON Question.Subject_ID = Test.Subject_ID
AND Question.Question_Level = Test.Test_Level
WHERE Question.Question_ID IN (
SELECT DISTINCT(ROUND(rand * max_id)) AS rand_id
FROM ( -- generate 30 random numbers to make sure we get 20 results
SELECT RAND() AS rand UNION ALL
SELECT RAND() AS rand UNION ALL
SELECT RAND() AS rand UNION ALL
SELECT RAND() AS rand UNION ALL
...
SELECT RAND() AS rand UNION ALL
SELECT RAND() AS rand UNION ALL
SELECT RAND() AS rand
) a
CROSS JOIN ( -- get the max possible id from the Question table
SELECT MAX(Question_ID) AS max_id FROM Question
) b
)
LIMIT 20 -- finally pare our results down to 20 in case we got too many
However, this will cause problems in your use case, because you effectively can't know how many results (and their IDs) will be in the result set after the join. After joining on subject and difficulty, the proportion of missing IDs might be very high and you might end up with far fewer than 20 results, even with several hundred random guesses of what IDs might be in a table.
If you're able to use logic from PHP (sounds like you are), a lot of high performance solutions open up. You could, for example, create in PHP an object whose job it was to store arrays of all the IDs of Questions with a particular subject and difficulty level. You could then pick 20 random array indexes and get back 20 valid IDs, allowing you to run a very simple query.
SELECT Question.* from Question WHERE Question_ID IN ($dynamically_inserted_ids)
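A minimal sketch of that last step, assuming a PDO connection in $pdo and that $ids already holds the 20 randomly chosen Question_IDs (both names are hypothetical):
// build one placeholder per ID so the list can be bound safely
$placeholders = implode(',', array_fill(0, count($ids), '?'));
$stmt = $pdo->prepare("SELECT * FROM Question WHERE Question_ID IN ($placeholders)");
$stmt->execute(array_values($ids));
$questions = $stmt->fetchAll(PDO::FETCH_ASSOC);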
Anyway, I hope this gets your imagination going with some possibilities.
Why don't you generate the random numbers in PHP and then select the questions by id?
Here's the logic of my point:
$MIN = 1;
$MAX = 50000; // You may want to get the MAX from your database
$questions = '';
for($i = 0; $i < 20; $i++)
$questions .= mt_rand($MIN, $MAX) . ',';
// Removes last comma
$questions = rtrim($questions, ',');
$query = "SELECT * FROM Question WHERE Question.id IN ($questions)";
Edit 1:
I was thinking about the problem, and it occurred to me that you can select all the ID's from your db and then pick 20 items using the array_rand() function.
$values = array(1, 5, 10000, 102021, 1000000); // Your database ID's
$keys = array_rand($values, 20); // array_rand() returns random keys, not the values
$questions = array();
foreach ($keys as $key) {
    $questions[] = $values[$key]; // map each key back to the actual ID
}
Create the following indexes:
CREATE INDEX Question_Subject_ID_idx ON Question (Subject_ID);
CREATE INDEX Test_Subject_ID_idx ON Test (Subject_ID);
CREATE INDEX Question_Question_Level_idx ON Question (Question_Level);
CREATE INDEX Test_Test_Level_idx ON Test (Test_Level);
I investigated on the same issue a while ago and my first approach was to load all IDs first, pick random ones in PHP (see: Efficiently pick n random elements from PHP array (without shuffle)) then query for these IDs directly in MySQL.
This was an improvement but memory-consuming for large data sets. On further investigation I found a better way: Pick random IDs in one query without any other fields or JOINs, then do your real query by these IDs:
SELECT Question.* from Question JOIN Test
ON Question.Subject_ID = Test.Subject_ID
AND Question.Question_Level = Test.Test_Level
WHERE Question_ID IN (
SELECT Question_ID from Question
ORDER BY RAND()
LIMIT 20
);
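Note: depending on your MySQL version, the IN subquery above may be rejected with "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'". If you hit that, the usual workaround is to wrap the limited subquery in a derived table, roughly:
SELECT Question.* from Question JOIN Test
ON Question.Subject_ID = Test.Subject_ID
AND Question.Question_Level = Test.Test_Level
WHERE Question_ID IN (
    SELECT Question_ID FROM (
        SELECT Question_ID FROM Question ORDER BY RAND() LIMIT 20
    ) AS random_ids -- extra derived table sidesteps the LIMIT-in-IN restriction
);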
Here's a blog post with benchmarks for my concrete case: Show random products in Magento.
Relevant parts:
Besides the memory issues, could it be that ORDER BY RAND() by itself
is not the problem, but using it together with all the table joins of
Magento? What if I preselect the random IDs with ORDER BY RAND()?
[...]
It was slightly slower than the PHP preselect approach, but still clearly in favor of the pure order by rand and without the increased memory usage in PHP.
[...]
The problem of the pure MySQL approach with ORDER BY RAND() became even more evident. While monitoring MySQL with mytop I noticed that besides for sorting, lots of time is spent for copying. The problem here seems to be, that sorting without an index, as with ORDER BY RAND() copies the data to a temporary table and orders that. With the flat index, all product attributes are fetched from a single table, which increases the amount of data copied to and from the temporary table for sorting. I might be missing something else here, but the performance dropped from bad to horrible, and it even caused my Vagrantbox to crash at first try because its disk got full (40 GB). So while PHP uses less memory with this approach, MySQL is all the more resource hungry.
I don't know how big your questions table is; at some point this approach is still flawed:
Second, as stated above, for big catalogs you should look for something different. The problem with ORDER BY RAND() is that even though we minimized the data to be copied, it still copies all rows to a temporary table and generates a random number for each. The sorting itself is optimized to not sort all rows (See LIMIT Optimization), but copying takes its time.
There is another famous blog post on selecting random rows in MySQL written by Jan Kneschke. He suggests using an index table with all ids, that has its own primary key without gaps. This index table would be updated automatically with triggers, and random rows can be selected by the index table, using random keys between min(key) and max(key).
If you don't use any additional conditions and query random entries from all questions this should work for you.

Change existing row ID based on AUTO_INCREMENT (unique key)

I have a table that records tickets that are separated by a column that denotes the "database". I have a unique key on the database and cid columns so that it increments each database uniquely (cid has the AUTO_INCREMENT attribute to accomplish this). I increment id manually since I cannot make two AUTO_INCREMENT columns (and I'd rather the AUTO_INCREMENT take care of the more complicated task of the uniqueness).
This makes my data look like this basically:
-----------------------
| id | cid | database |
-----------------------
|  1 |   1 |        1 |
|  2 |   1 |        2 |
|  3 |   2 |        2 |
-----------------------
This works perfectly well.
I am trying to make a feature that will allow a ticket to be "moved" to another database; frequently a user may enter the ticket in the wrong database. Instead of having to close the ticket and completely create a new one (copy/pasting all the data over), I'd like to make it easier for the user of course.
I want to be able to change the database and cid fields uniquely without having to tamper with the id field. I want to do an UPDATE (or the like) since there are foreign key constraints on other tables that link to the id field; this is why I don't simply do a REPLACE or DELETE then INSERT, as I don't want it to delete all of the other table data and then have to recreate it (log entries, transactions, appointments, etc.).
How can I get the next unique AUTO_INCREMENT value (based on the new database value), then use that to update the desired row?
For example, in the above dataset, I want to change the first record to go to "database #2". Whatever query I make needs to make the data change to this:
-----------------------
| id | cid | database |
-----------------------
|  1 |   3 |        2 |
|  2 |   1 |        2 |
|  3 |   2 |        2 |
-----------------------
I'm not sure if the AUTO_INCREMENT needs to be incremented, as my understanding is that the unique key makes it just calculate the next appropriate value on the fly.
I actually ended up making it work once I re-read an excerpt on using AUTO_INCREMENT on multiple columns.
For MyISAM and BDB tables you can specify AUTO_INCREMENT on a
secondary column in a multiple-column index. In this case, the
generated value for the AUTO_INCREMENT column is calculated as
MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is
useful when you want to put data into ordered groups.
This was the clue I needed. I simply mimic'd the query MySQL runs internally according to that quote, and joined it into my UPDATE query as such. Assume $new_database is the database to move to, and $id is the current ticket id.
UPDATE `tickets` AS t1,
(
SELECT MAX(cid) + 1 AS new_cid
FROM `tickets`
WHERE `database` = {$new_database} -- backticks: DATABASE is a reserved word
) AS t2
SET t1.cid = t2.new_cid,
t1.database = {$new_database}
WHERE t1.id = {$id}
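One caveat with this: if the target database has no tickets yet, MAX(cid) is NULL and the moved ticket would get a NULL cid. Wrapping the expression in COALESCE covers that case, e.g.:
SELECT COALESCE(MAX(cid), 0) + 1 AS new_cid
FROM `tickets`
WHERE `database` = {$new_database}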

MySQL DATETIME functions beauty vs perfomance (speed)

How much faster (in %) will SQL be if I avoid using the built-in MySQL date and time functions?
What do I mean? For example: SELECT id FROM table WHERE WEEKOFYEAR(inserted)=WEEKOFYEAR(CURDATE())
MySQL has a lot of built-in functions to work with date and time, and they are convenient. But what about performance?
The SQL above can be rewritten without built-in functions, like: SELECT id FROM table WHERE inserted BETWEEN 'date for 1st day of particular week 00:00:00' AND 'last day of particular week 23:59:59' -- the server-side code becomes worse :( but on the db side we could use indexes
I see two problems with using built-in functions:
1. indexes
I did a small test
mysql> explain extended select id from table where inserted between '2013-07-01 00:00:00' and '2013-07-01 23:59:59';
+----+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | table | range | ins | ins | 4 | NULL | 7 | 100.00 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
mysql> explain extended select id from table where date(inserted)=curdate();
+----+-------------+-------+-------+---------------+------+---------+------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+--------+----------+--------------------------+
| 1 | SIMPLE | table | index | NULL | ins | 4 | NULL | 284108 | 100.00 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+--------+----------+--------------------------+
The first one took 0.00 sec; the second one ran right after the first and took 0.15 sec. Everything was done with a small amount of data.
And the second problem is
2. the time to call those functions
If the table has 1 billion records, it means that WEEKOFYEAR, DATE, whatever... would be called as many times as there are records, right?
So the question is: will it bring a real benefit if I stop using the MySQL built-in date and time functions?
Using a function of a column in a WHERE clause or in a JOIN condition will prevent the use of indexes on the column(s), if such indexes exist. This is because the raw value of the column is indexed, as opposed to the computed value.
Notice the above does not apply for a query like this:
SELECT id FROM atable WHERE inserted = CURDATE(); -- the raw value of "inserted" is used in the comparison
And yes, on top of that, the function will be executed for each and every row scanned.
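Applied to the WEEKOFYEAR example from the question, an index-friendly rewrite might look like the sketch below (assuming inserted is a DATETIME and that weeks run Monday through Sunday; adjust if your week numbering differs):
-- the date arithmetic is applied to CURDATE(), a constant, not to the column,
-- so a range scan on the index over `inserted` is still possible
SELECT id
FROM `table`
WHERE inserted >= CURDATE() - INTERVAL WEEKDAY(CURDATE()) DAY
  AND inserted <  CURDATE() - INTERVAL WEEKDAY(CURDATE()) DAY + INTERVAL 7 DAY;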
The second query runs the date function on every row in the table, while the first query can just use the index to find the rows it needs. That's where the biggest slowdown would be. Look at the rows column in the EXPLAIN output.

PHP/MYSQL: Storing a list or massive table

I am still new to PHP and I was wondering which alternative would be better or maybe someone could suggest a better way.
I have a set of users and I have to track all of their interactions with posts. If a user taps on a button, it will add the post to a list, and if they tap it again, it will remove the post. So would it be better to:
Have a column of a JSON array of postIDs stored in the table for each user (probably thousands).
-or-
Have a separate table with every save (combination of postID and userID) (probably millions) and return all results where the userID's match?
For the purposes of this question, there are two tables: Table A is users and Table B is posts. How should I store all of the user's saved posts?
EDIT: Sorry, but I didn't mention that posts will have multiple user interactions and users will have multiple post interactions (Many to Many relationship). I think that would affect Bob's answer.
This is an interesting question!
The solution really depends on your expected use case. If each user has a list of posts they've tagged, and that is all the information you need, it will be expedient to list these as a field in the user's table (or in their blob if you're using a nosql backend - a viable option if this is your use case!). There will be no impact on transmission time since the list will be the same size either way, but in this solution you will probably save on lookup time, since you're only using one table and dbs will optimize to keep this information close together.
On the other hand, if you have to be able to query a given post for all the users that have tagged it, then option two will be much better. In the former method, you'd have to query all users and see if each one had the post. In this option, you simply have to find all the relations and work from there. Presumably you'd have a user table, a post table and a user_post table with foreign keys to the first two tables. There are other ways to do this, but it necessitates maintaining multiple lists and cross checking each time, which is an expensive set of operations and error-prone.
Note that the latter option shouldn't choke on 'millions' of connections, since the db should be optimized for this sort of quick read. (pro tip: index the proper columns!) Do be careful about any data massage, though. One unnecessary for-loop will kill your performance.
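A sketch of what that junction table could look like (names are hypothetical; the users and posts tables are assumed to have integer id primary keys):
CREATE TABLE user_post (
    user_id INT UNSIGNED NOT NULL,
    post_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, post_id), -- one row per save; "un-save" is a simple DELETE
    KEY idx_post (post_id),         -- supports "which users saved this post?"
    FOREIGN KEY (user_id) REFERENCES users (id),
    FOREIGN KEY (post_id) REFERENCES posts (id)
) ENGINE=InnoDB;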
For the purposes of this question, there are two tables: Table A is users and Table B is posts. How should I store all of the user's saved posts?
If each user has a unique ID of some sort (primary key), then add a field to each post that refers to the unique ID of the user.
mysql> describe users;
+----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| email | varchar(200) | YES | | NULL | |
| username | varchar(20) | YES | | NULL | |
+----------+------------------+------+-----+---------+----------------+
mysql> describe posts;
+---------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| user | int(11) unsigned | NO | | NULL | |
| text | text | YES | | NULL | |
+---------+------------------+------+-----+---------+----------------+
Then to get posts for a user, for example:
SELECT text
FROM posts
WHERE user=5;
Or to get all the posts from a particular organization:
SELECT posts.text,users.username
FROM posts,users
WHERE posts.user=users.id
AND users.email LIKE '%@example.com';
I think it would make sense to keep a third table that would be all the post status data.
If your user interface shows, say, 50 posts per page, then the UI only needs to keep track of 50 posts at a time. They'll all have unique IDs in your database, so that shouldn't be a problem.

Searching MySQL data

I am trying to search a MySQL database with a search key entered by the user. My data contains upper case and lower case. My question is how to make my search function not case sensitive. For example, the data in MySQL is BOOK, but if the user enters book in the search input, the result is not found... Thanks.
My search code
$searchKey=$_POST['searchKey'];
$searchKey=mysql_real_escape_string($searchKey);
$result=mysql_query("SELECT *
FROM product
WHERE product_name like '%$searchKey%' ORDER BY product_id
",$connection);
Just uppercase the search string and compare it to the uppercase field.
$searchKey= strtoupper($_POST['searchKey']);
$searchKey=mysql_real_escape_string($searchKey);
$result=mysql_query("SELECT * FROM product
WHERE UPPER(product_name) like '%$searchKey%' ORDER BY product_id
",$connection);
If possible, you should avoid using UPPER as a solution to this problem, as it incurs both the overhead of converting the value in each row to upper case, and the overhead of MySQL being unable to use any index that might be on that column.
If your data does not need to be stored in case-sensitive columns, then you should select the appropriate collation for the table or column. See my answer to how i can ignore the difference upper and lower case in search with mysql for an example of how collation affects case sensitivity.
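For example, if product_name currently has a binary or case-sensitive collation, switching it to a case-insensitive (_ci) one removes the need for UPPER() in the WHERE clause. A sketch (the character set and column length here are placeholders; keep whatever your table actually uses):
ALTER TABLE product MODIFY product_name VARCHAR(255)
    CHARACTER SET utf8 COLLATE utf8_general_ci;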
The following shows the EXPLAIN SELECT results from two queries. One uses UPPER, one doesn't:
DROP TABLE IF EXISTS `table_a`;
CREATE TABLE `table_a` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`value` varchar(255) DEFAULT NULL,
INDEX `value` (`value`),
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
INSERT INTO table_a (value) VALUES
('AAA'), ('BBB'), ('CCC'), ('DDD'),
('aaa'), ('bbb'), ('ccc'), ('ddd');
EXPLAIN SELECT id, value FROM table_a WHERE UPPER(value) = 'AAA';
+----+-------------+---------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+-------+---------+------+------+--------------------------+
| 1 | SIMPLE | table_a | index | NULL | value | 258 | NULL | 8 | Using where; Using index |
+----+-------------+---------+-------+---------------+-------+---------+------+------+--------------------------+
EXPLAIN SELECT id, value FROM table_a WHERE value = 'AAA';
+----+-------------+---------+------+---------------+-------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+-------+---------+-------+------+--------------------------+
| 1 | SIMPLE | table_a | ref | value | value | 258 | const | 2 | Using where; Using index |
+----+-------------+---------+------+---------------+-------+---------+-------+------+--------------------------+
Notice that the first SELECT which uses UPPER has to scan all the rows, whereas the second only needs to scan two - the two that match. On a table this size, the difference is obviously imperceptible, but with a large table, a full table scan can seriously impact the speed of your query.
This is an easy way to do it:
$searchKey=strtoupper($searchKey);
SELECT *
FROM product
WHERE UPPER(product_name) like '%$searchKey%' ORDER BY product_id
First of all, try to avoid using * as much as possible. It is generally considered a bad idea. Select the columns using column names.
Now, your solution would be -
$searchKey=strtoupper($_POST['searchKey']);
$searchKey=mysql_real_escape_string($searchKey);
$result=mysql_query("SELECT product_name,
// your other columns
FROM product
WHERE UPPER(product_name) like '%$searchKey%' ORDER BY product_id
",$connection);
EDIT
I will try to explain why it is a bad idea to use *. Suppose you need to change the schema of the product table(adding/deleting columns). Then, the columns that are being selected through this query will change, which may cause unintended side effects and will be hard to detect.
According to the MySQL manual, case-sensitivity in searches depends on the collation used, and should be case-insensitive by default for non binary fields.
Make sure you have the field types and the query right (maybe there's an extra space or something). If that doesn't work, you can convert the string to upper case in PHP (ie: $str = strtoupper($str)) and do the same on the MySQL side (#despart)
EDIT: I posted the article above (^). And I just tested it. Searches on CHAR, VARCHAR, and TEXT fields are case-insensitive (collation = latin1).
