Best way to search on MySQL text columns


I have products stored in a MySQL database. It's a WordPress website, but my data is stored in custom tables. I need to search for products and I'm currently facing some performance issues; I hope someone can help me or point me in the right direction.
Since I receive a file (*.csv) once a day to update all my products (add, update or remove products), I have a process that reads the file and populates/updates the tables. In this process, I added a step to filter the data and replace any special characters with plain ones (example: replace 'á' with 'a').
For now, I have a table (products_search) that is related to the products table (products) and built from it, and I use this table for searches. When the user searches for something, I modify the input to replace special characters, so the search can run directly against the table.
The problem: searching in "text" columns is slow, even after adding an index on that column. I'm currently searching like this:
select * from products_search
where description like '%search_word_1%'
or description like '%search_word_2%' ...
If I get a result, I take the ID, join it to the products table and fetch all the info I need to show to the user.
Solution looked for: I'm looking for a way to search the products_search table with better performance. The WordPress search engine, as I understand it, works only on the "posts" table. Is there any way to do a quicker search? Perhaps using a plugin, or just by changing the way the search is done.
Thanks to all

I think we need to revise the nightly loading in order to make the index creation more efficient.
I'm assuming:
The data in the CSV file replaces the existing data.
You are willing to use FULLTEXT for searching.
Then do:
1. CREATE TABLE new_data (...) ENGINE=InnoDB;
2. LOAD DATA INTO new_data ...;
3. Cleanse the data in new_data.
4. ALTER TABLE new_data ADD FULLTEXT(...); The column(s) to index here either exist, or are added during step 1 or 3.
5. RENAME TABLE real_data TO old_data, new_data TO real_data;
6. DROP TABLE old_data;
Note that this has essentially zero downtime for real_data so you can continue to do SELECTs.
You have not explained how you spray the single CSV file into wp_posts and wp_postmeta. That sounds like a nightmare buried inside my step 3.
FULLTEXT is immensely faster than futzing with wp_postmeta. (I don't know if there is an existing way or plugin to achieve such.)
With FULLTEXT(description), your snippet of code would use
WHERE MATCH(description) AGAINST ('word1 word2' IN BOOLEAN MODE)
instead of the very slow LIKE with a leading wildcard.
If you must use wp_postmeta, I recommend https://wordpress.org/plugins/index-wp-mysql-for-speed/
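A minimal sketch of how the user's input might be prepared for the MATCH ... AGAINST query above, assuming the searched column was accent-folded during the nightly import as the question describes. The function name buildBooleanQuery and the folding map are hypothetical; the map covers only a few characters for illustration and should really be the same one the import step uses.

```php
<?php
// Hypothetical helper: turns raw user input into a string for
// MATCH(description) AGAINST (? IN BOOLEAN MODE).
function buildBooleanQuery(string $input): string
{
    // Assumption: the import folds accents the same way (partial map only).
    $folded = strtr($input, [
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u',
        'ã' => 'a', 'õ' => 'o', 'ç' => 'c',
    ]);

    // Replace everything except ASCII letters, digits and spaces, so that
    // boolean-mode operators typed by the user (+ - * " ( ) etc.) are dropped.
    $clean = preg_replace('/[^A-Za-z0-9 ]+/', ' ', $folded);

    // Split on whitespace, drop empty pieces and repeated words.
    $words = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);

    return implode(' ', array_unique($words));
}
```

The returned string would then be passed as a bound parameter (for example through $wpdb->prepare() with a %s placeholder) rather than concatenated into the SQL. Words without a leading + are optional in boolean mode, which matches the OR behaviour of the original LIKE query. Keep in mind that InnoDB ignores words shorter than innodb_ft_min_token_size (3 by default).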

Related

performance issue from 5 queries in one page

As a junior PHP developer, still learning day by day, I'm stuck on the performance problem described here:
I am making a search engine in PHP. My database has one table with 41 columns and millions of rows, so it is a very large dataset. In index.php I have a form for searching the data. When the user enters a search keyword and hits submit, the form posts to search.php, which shows the results. The query looks like this:
SELECT * FROM TABLE WHERE product_description LIKE '%mobile%' ORDER BY id ASC LIMIT 10
This is the first query. After the results are shown, I have to run 4 other queries like this:
SELECT DISTINCT(weight_u) as weight from TABLE WHERE product_description LIKE '%mobile%'
SELECT DISTINCT(country_unit) as country_unit from TABLE WHERE product_description LIKE '%mobile%'
SELECT DISTINCT(country) as country from TABLE WHERE product_description LIKE '%mobile%'
SELECT DISTINCT(hs_code) as hscode from TABLE WHERE product_description LIKE '%mobile%'
These queries are for the filters. The problem is that when I hit the search button, all of the queries run at once and performance suffers; it's very slow.
Is there a faster way to fetch weight, country, country_unit and hs_code, and how can I achieve it?
The same functionality is implemented here, where the filter bar appears after the table is filled with data. How can I achieve that? Please help.
Full functionality implemented here.
I have tried to explain my full problem. If there is any mistake, please let me know and I will improve the question; I am also new to Stack Overflow.

Firstly - are you sure this code is working as you expect it? The first query retrieves only 10 records matching your search term, while the next 4 queries run over every matching row. So when you execute them for your filter, it's entirely possible that you will get values back which are not in the first query's results, so the filter might not make sense.
If that's true, I would create the filter values in your client code (PHP) - finding the unique values in 10 records is going to be quick and easy, and reduces the number of database round trips.
Finally, the biggest improvement you can make is to use MySQL's fulltext searching features. The reason your app is slow is because your search terms cannot use an index - you're wild-carding the start as well as the end. It's like searching the phonebook for people whose name contains "ishra" - you have to look at every record to check for a match. Fulltext search indexes are designed for this - they also help with fuzzy matching.
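The suggestion above to build the filter values in PHP could look roughly like this. It is only a sketch: buildFilters is a hypothetical helper name, and it assumes the 10 rows of the first query have already been fetched as associative arrays keyed by column name.

```php
<?php
// Hypothetical helper: derives the distinct filter values from rows that
// were already fetched, instead of running one DISTINCT query per column.
function buildFilters(array $rows, array $columns): array
{
    $filters = [];
    foreach ($columns as $column) {
        // array_column pulls one column out of every row;
        // array_unique drops the repeated values.
        $values = array_unique(array_column($rows, $column));
        sort($values); // also re-indexes the list from 0
        $filters[$column] = $values;
    }
    return $filters;
}

// Usage with made-up sample rows:
$rows = [
    ['id' => 1, 'country' => 'US', 'weight_u' => 'KG'],
    ['id' => 2, 'country' => 'IN', 'weight_u' => 'KG'],
    ['id' => 3, 'country' => 'US', 'weight_u' => 'LB'],
];
$filters = buildFilters($rows, ['country', 'weight_u']);
```

This replaces four database round trips with one pass over rows that are already in memory, at the cost of the filters only reflecting the page of results being shown.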

I'll give you some tips that will prove useful in many situations when querying a large dataset, or almost any dataset.
Listing the fields you want instead of querying for '*' is better practice. This matters more as you have more columns and more rows.
Always try to use the primary keys to look up the data. The more specific the filter, the less it will cost.
An index would come in pretty handy in this kind of situation, as it will make the search faster.
LIKE queries are generally pretty slow and resource heavy, even more so in your situation. So again, the more specific you are, the better it will get.
Also, if you just want to retrieve data from these tables again and again, maybe a VIEW would fit nicely.
Those are just some tips that came to my mind to ease your problem.
Hope it helps.

Matching a user entered title to a category - large INNODB database

I have a large INNODB database with over 2 million products on it. The 'products' table has the following fields: id,title,description,category.
There is also a MyISAM table called 'category' that contains a list of all categories used on the website. This has the following fields: id,name,keywords,parentid.
My question is more about the logic rather than code, but what I am trying to achieve is as follows:
When a user lists a new product on the site, as they are typing the description it should try to work out what category to put the product in (with good accuracy).
I tried this initially by using MySQL MATCH() to match the entered title against a list of keywords in the category table, but this was far from accurate.
A better idea seems to be to match the user entered title against titles for products already in the database, grouping them by the category they are in and then sorting them by the largest group. However, on an INNODB database I obviously can't use fulltext, and with 2mill items I think it would be pretty slow anyway?
How would you do it - I guess it would need to be a similar way to how stackoverflow displays similar questions?

A fulltext index on 2 million records is a valid option, if you are running on a decent server. The initial indexing will take a while, that's for sure, but searches should be reasonably fast; MySQL can take it.
InnoDB supports fulltext indexes as of MySQL v5.6.4. You should consider upgrading.
If upgrading is not an option, please see this previous answer of mine where I suggest a workaround.
For your use case, you may want to take a look at the WITH QUERY EXPANSION option:
It works by performing the search twice, where the search phrase for the second search is the original search phrase concatenated with the few most highly relevant documents from the first search. Thus, if one of these documents contains the word “databases” and the word “MySQL”, the second search finds the documents that contain the word “MySQL” even if they do not contain the word “database”

PHP sort an array by lastname separated by whitespace in modx

I have a mysql table that looks like this:
id author public image1 image2 image3 bio media1 media2 media3 media4 media5 media6
The field "author" normally has Firstname (Secondname) Lastname separated by whitespace.
How can I sort the array by the last name, or, if just one name is present, by that one?
This is the modx query I use to sort after the author but obviously it doesn't use the lastname.
$c = $modx->newQuery('AuthorDe');
$c->sortby('author','ASC');
$authors = $modx->getCollection('AuthorDe',$c);

You're shooting yourself in the foot right now, for a couple of reasons:
When there is only one word in the string, the sorting is hard to predict.
You have indexes for your data for a reason. They make it a lot faster. Using string functions forces a table scan. Good enough for 100 rows, slow for 10,000 rows and "database went on vacation" at 1,000,000.
The next time you have to use the author field and realize you need to split it into words, you will also have to understand and fix this code snippet on top of the old ones.
That said - I haven't tested it - but try this:
$c->sortby('substring_index(author," ",-1)','ASC');
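If the collection is small and already loaded, the same ordering can be sketched on the PHP side. This is an illustration under assumptions, not part of the modx API: lastName and sortByLastName are hypothetical helpers that work on plain author strings, and, like substring_index(author, " ", -1), they fall back to the whole string when only one name is present.

```php
<?php
// Hypothetical helper: returns the last whitespace-separated word,
// or the whole string when it contains a single name.
function lastName(string $author): string
{
    $parts = preg_split('/\s+/', trim($author), -1, PREG_SPLIT_NO_EMPTY);
    return $parts ? end($parts) : '';
}

// Hypothetical helper: sorts author strings by last name, case-insensitively.
function sortByLastName(array $authors): array
{
    usort(
        $authors,
        fn(string $a, string $b): int => strcasecmp(lastName($a), lastName($b))
    );
    return $authors;
}

$sorted = sortByLastName(['John Smith', 'Madonna', 'Anna Maria Berg']);
```

Sorting in PHP avoids the table scan caused by the string function, but it only makes sense when the whole result set fits comfortably in memory.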

So to elaborate on the very valuable point jous made, putting multiple points of data in one database column is counter productive.
The sorting you want to do would be simple, fast, & efficient, in a sql query (using the same construct jous showed but without the string operation).
To modify this table you would simply add the following columns to your table in place of author:
firstname
lastname
middlename
To show you how simple this is (and make it even easier) here's the code to do it:
ALTER TABLE [tablename]
ADD COLUMN firstname varchar(32),
ADD COLUMN lastname varchar(32),
ADD COLUMN middlename varchar(32),
DROP COLUMN author;
Then the modx PHP code would be:
$c->sortby('lastname','ASC');
So this is fairly easily done... and if you still need to support other references to author then create a view that returns author in the same way the un-altered table did as shown below (NOTE: you would still have to change the table name reference so it points to the view instead of the table... if this will be a big problem then rename the table and name the view the same as the old table was...):
CREATE VIEW oldtablename AS
SELECT CONCAT_WS(' ', firstname, middlename, lastname) AS author
FROM newtablename;
NOTE: if you do create a view like the above then it is probably worth your while to add all of the other columns from the new table (the multiple image & media columns).
NOTE2: I will add, however, that those would ideally be in separate tables with a join table to this one. If I were in your spot I might agree that expedience beats utility and future usability. However, if you did put them in different tables, you could add those tables to this view (as joins to the new table) and still be able to support existing code that depends on the old table and its structure.
While the above is all fairly easily done and will work with minor adjustments from you the last part of this is getting your custom table changes to be reflected by xPDO. If you are already comfortable with this and know what to do then great.
If you aren't this is by far the best article on the topic: http://bobsguides.com/custom-db-tables.html
Yes, it is worth getting Bob's code as a snippet so all of this can simply be generated for you once the database changes have been made. (Remember that you will likely need to delete the existing schema file, the xPDO-related class files and the map files before you run Bob's generation code, or changes that keep the same table name, like the view, won't take effect.)
Hope this helps you (or the next person to ask a similar question).

Pattern Matching in SQL Issue -- Finding the Right Query with PHP

I'm in need of some quick help on matching a field in my database that stores all of the "parent" categories for my online store. Here's an example of how my "parents" are stored in the table via one field named Parent:
MENS MENS-BRANDS MENS-SHIRTS MENS-T-SHIRTS
Here is my query in PHP to perform the call:
$query = "SELECT id FROM $usertable where parent like '".strtoupper($parent)."'";
The problem is, if I am on MENS-BRANDS, this will also return those products who are listed in every other category because it contains the word "MENS." Since all of the parents are stored in one field, how can I make my SQL query only recognize each physical word that is separated by spaces in the field itself, instead of it trying to find every instance of different fragments of a word throughout the field?
I hope this makes sense, and any help is surely appreciated.

Ideally you can change your schema so that you have a separate table linking these categories to your existing entries. This way you can have one row per product and you can easily write a SQL query that looks for the specific word you want without the need for a LIKE match. Added bonus: this will improve performance.
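However, if you absolutely cannot change this schema, your best bet is probably a regular expression that anchors on the spaces separating the entries, like WHERE parent REGEXP '(^| )MENS( |$)'. A pure word-boundary pattern such as '[[:<:]]MENS[[:>:]]' is not enough here, because the hyphen counts as a word boundary, so it would still match MENS-BRANDS.
Here I'm using MySQL regular expressions. If you're using a different database management system the same concept will work, but the exact syntax may be different.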

Count line breaks in a field and order by

I have a field in a table recipes that has been inserted using mysql_real_escape_string, I want to count the number of line breaks in that field and order the records using this number.
p.s. the field is called Ingredients.
Thanks everyone

This would do it:
SELECT *, LENGTH(Ingredients) - LENGTH(REPLACE(Ingredients, '\n', '')) as Count
FROM Recipes
ORDER BY Count DESC
The way I am getting the number of line breaks is a bit of a hack, however, and I don't think there's a better way. I would recommend keeping a column that holds the number of line breaks if performance is a huge issue. For medium-sized data sets, though, I think the above should be fine.
If you wanted to have a cache column as described above, you would do:
UPDATE
Recipes
SET
IngredientAmount = LENGTH(Ingredients) - LENGTH(REPLACE(Ingredients, '\n', ''))
After that, whenever you are updating/inserting a new row, you could calculate the amounts (probably with PHP) and fill in this column before-hand. Or, if you're into that sort of thing, try out triggers.
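Calculating the value in PHP before an insert or update could be sketched as follows. countLineBreaks is a hypothetical helper name; the sketch assumes the text may arrive with Windows (\r\n), old Mac (\r) or Unix (\n) line endings and counts each of them as one break.

```php
<?php
// Hypothetical helper: counts line breaks regardless of line-ending style.
function countLineBreaks(string $text): int
{
    // Normalize \r\n and lone \r to \n so every break is counted once.
    $normalized = str_replace(["\r\n", "\r"], "\n", $text);
    return substr_count($normalized, "\n");
}

// Usage: store this number in the cache column alongside the ingredients.
$ingredientAmount = countLineBreaks("flour\r\neggs\nmilk");
```

For text that uses \n or \r\n, this gives the same number as the SQL expression above, which counts '\n' characters. The number of lines is this value plus one when the text does not end with a line break.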

I'm assuming a lot here, but from what I'm reading in your post, you could change your database structure a little bit, and both solve this problem and open your dataset up to more interesting uses.
If you separate ingredients into its own table, and use a linking table to index which ingredients occur in which recipes, it'll be much easier to be creative with data manipulation. It becomes easier to count ingredients per recipe, to find similarities in recipes, to search for recipes containing sets of ingredients, etc. also your data would be more normalized and smaller. (storing one global list of all ingredients vs. storing a set for each recipe)
If you're using a single text entry field to enter ingredients for a recipe now, you could do something like break up that input by lines and use each line as an ingredient when saving to the database. You can use something like PHP's built-in levenshtein() or similar_text() functions to deal with misspelled ingredient names and keep the data as normalized as possible without having to hand-groom your users' data entry too much.
This is just a suggestion, take it as you like.

You're going a bit beyond the capabilities and intent of SQL here. You could write a stored procedure to scan the string and return the number and then use this in your query.
However, I think you should revisit the design of whatever is inserting the Ingredients so that you avoid scanning the strings of every row whenever you run this query. Add a 'num_linebreaks' column, calculate the number of line breaks and set this column when you're adding the Ingredients.
If you've no control over the app that's doing the insertion, then you could use a stored procedure to update num_linebreaks based on a trigger.

Got it thanks, the php code looks like:
$check = explode("\r\n", $_POST['ingredients']);
$lines = count($check);
So how could I update Ingred_count for all the previous records in the table, based on the Ingredients field, in one fell swoop?
