I have a search-engine-based site that is currently in beta at http://www.jobportfolio.co.uk. The site has a job table with the following fields: job_company, job_title, job_description, job_location. All the fields are VARCHAR except for the description, which is a TEXT field, and all of them have a FULLTEXT index.
My current approach is to search on the title, location and company. This works fine, but I would like to improve the results by adding in the description field. The problem is that when I add the description field the search takes a lot longer, even though the table only contains about 12,000 rows.
I am using the following MATCH ... AGAINST query to select the results:
MATCH(job_posts.job_title, job_company) AGAINST('".$this->mysqli_escape($job_title)."' IN BOOLEAN MODE)
Does anyone have any opinions on how to improve the performance of the search?
Hm, my first thought is to approach this problem from the "outside": is it acceptable to have a search form that uses multiple different fields? If you're willing to have 4 search strings that each search in a different column, I suspect that will reduce load by itself. For example:
When someone types in the "location" field, you add a clause to the query that matches the searched text against the location field only.
When someone types in the "description" field, you add a clause to the query that matches the search text against the description field. Otherwise you don't match anything against the description field.
If you don't need to be able to enter text into one place and search "all possible fields" for it, this solution will prevent extra slowness until someone specifically wants to search in the description text. So the query speed varies based on the searcher's needs.
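A minimal sketch of that idea, assuming a mysqli connection in $db, hypothetical form field names, and a separate FULLTEXT index on each column; only the clauses for fields the user actually filled in are added, so the description index is only consulted when someone asks for it:

<?php
// Hypothetical helper: build MATCH clauses only for the fields the user filled in.
function build_search_where(mysqli $db, array $input): string
{
    // Map each form field to the column it should search against.
    $map = [
        'title'       => 'job_title',
        'company'     => 'job_company',
        'location'    => 'job_location',
        'description' => 'job_description', // only used when the user typed something here
    ];

    $clauses = [];
    foreach ($map as $field => $column) {
        if (!empty($input[$field])) {
            $term      = $db->real_escape_string($input[$field]);
            $clauses[] = "MATCH($column) AGAINST('$term' IN BOOLEAN MODE)";
        }
    }

    // No search terms at all: match everything.
    return $clauses ? implode(' AND ', $clauses) : '1';
}

// Usage: $sql = "SELECT * FROM job_posts WHERE " . build_search_where($db, $_GET);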
I have products stored in a MySQL database. It's a WordPress website, but my data is stored in custom tables. I need to search for products, and I'm currently facing some performance issues that I hope someone can help me with or point me in the right direction on.
Since I receive a file (*.csv) once a day to update all my products (add, update or remove products), I have a process that reads the file and populates/updates the tables. In this process, I added a step to filter the data and replace any special characters with plain ones (for example, replacing 'á' with 'a').
Currently, I have a table (products_search) related to the products table (products) and built from it, which I use for searches. When the user searches for something, I apply the same replacement to the input, so the search runs directly against that table.
The problem: searching in "text" columns is slow, even with an index on that column. I'm currently searching like this:
select * from products_search
where description like '%search_word_1%'
or description like '%search_word_2%' ...
If I get a result, I take the ID, relate it back to the products table, and fetch all the info I need to show to the user.
Solution I'm looking for: a way to search the products_search table with better performance. The WordPress search engine, as I understand it, works only on the "posts" table. Is there any way to do a quicker search, perhaps using a plugin or by changing the way the search is done?
Thanks to all
I think we need to revise the nightly loading in order to make the index creation more efficient.
I'm assuming:
The data in the CSV file replaces the existing data.
You are willing to use FULLTEXT for searching.
Then do:
CREATE TABLE new_data (...) ENGINE=InnoDB;
LOAD DATA INFILE '...' INTO TABLE new_data ...;
Cleanse the data in new_data.
ALTER TABLE new_data ADD FULLTEXT(...); The column(s) to index here either exist, or are added during step 1 or 3.
RENAME TABLE real_data TO old_data, new_data TO real_data;
DROP TABLE old_data;
Note that this has essentially zero downtime for real_data so you can continue to do SELECTs.
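A sketch of those steps as PHP running the statements in order; the CSV path, column list and cleansing step are placeholders, and LOAD DATA LOCAL INFILE assumes local_infile is enabled:

<?php
// Nightly reload sketch: build the new table, index it, then swap it in atomically.
// Assumes $db is a mysqli connection; error handling is minimal.
$steps = [
    "CREATE TABLE new_data (id INT PRIMARY KEY, description TEXT) ENGINE=InnoDB",
    "LOAD DATA LOCAL INFILE '/path/to/daily.csv'
         INTO TABLE new_data
         FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
         LINES TERMINATED BY '\\n'",
    // Step 3: cleanse new_data here (strip accents, etc.) before building the index.
    "ALTER TABLE new_data ADD FULLTEXT(description)",
    "RENAME TABLE products_search TO old_data, new_data TO products_search",
    "DROP TABLE old_data",
];

foreach ($steps as $sql) {
    if (!$db->query($sql)) {
        die('Reload failed: ' . $db->error);
    }
}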
You have not explained how you spray the single CSV file into wp_posts and wp_postmeta. That sounds like a nightmare buried inside my step 3.
FULLTEXT is immensely faster than futzing with wp_postmeta. (I don't know if there is an existing way or plugin to achieve such.)
With `FULLTEXT(description)`, your snippet of code would use
WHERE MATCH(description) AGAINST ('word1 word2' IN BOOLEAN MODE)
instead of the very slow LIKE with a leading wildcard.
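For example, a sketch (assuming a mysqli connection in $db and the products_search table from the question) that turns the user's words into one boolean-mode search instead of a chain of LIKEs:

<?php
// Assumes $db is a mysqli connection and products_search has FULLTEXT(description).
$words = preg_split('/\s+/', trim($_GET['q'] ?? ''), -1, PREG_SPLIT_NO_EMPTY);

if ($words) {
    // '+' in front of each word requires every word to be present;
    // drop the '+' if matching any single word should be enough.
    $escaped = array_map([$db, 'real_escape_string'], $words);
    $boolean = '+' . implode(' +', $escaped);

    $result = $db->query(
        "SELECT id FROM products_search
         WHERE MATCH(description) AGAINST('$boolean' IN BOOLEAN MODE)"
    );
}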
If you must use wp_postmeta, I recommend https://wordpress.org/plugins/index-wp-mysql-for-speed/
I am supporting a public blog to which users can publish their posts. Some users have more than a thousand different texts and might not remember that they have already published a particular one. I would like to help users avoid publishing duplicates.
Comparing texts for exact equality is not good enough: the user might have changed the text a little, or its formatting, or copied it from a different program, etc. So I need a quick estimate of whether a similar text already exists in the database.
My technology stack includes PHP, MySQL and Redis. How can I solve my problem using those or other tools?
PHP has a function called similar_text which you can use to calculate the number of matching characters or the similarity as a percentage.
http://php.net/manual/en/function.similar-text.php
You could then check whether the given text is within a certain similarity margin of older blog posts.
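A minimal sketch, assuming the user's existing posts have already been fetched from MySQL into an array; the 90% threshold is arbitrary and would need tuning:

<?php
// Warn about likely duplicates using similar_text().
// $existingPosts is assumed to be an array of ['id' => ..., 'body' => ...] rows
// fetched for this user; $newText is the text being published.
function find_probable_duplicate(array $existingPosts, string $newText, float $threshold = 90.0): ?int
{
    foreach ($existingPosts as $post) {
        similar_text($newText, $post['body'], $percent);
        if ($percent >= $threshold) {
            return $post['id']; // looks like a duplicate of this post
        }
    }
    return null;
}

Note that similar_text() gets expensive on long texts, so in practice you would probably only compare against a small candidate set (for example, the most relevant posts returned by a FULLTEXT search).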
If you don't want to check for textual similarity, you could try tagging the posts based on the tags or subject of the original blog, and then show users the posts they made with similar tags.
You can use MySQL's MATCH ... AGAINST on a full-text indexed column.
As an example:
SELECT table.*,
MATCH(userText) AGAINST ('this is user input') AS relevancy
FROM table
ORDER BY relevancy DESC;
So this will give you results ordered by relevancy.
Don't forget to add a FULLTEXT index on the userText column.
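A sketch of how that could be wired up before publishing, with hypothetical table and column names (posts, userText):

<?php
// A possible "did you already post this?" check before saving a new post.
// Assumes $db is a mysqli connection and posts has FULLTEXT(userText); error handling omitted.
$input = $db->real_escape_string($newPostText);

$sql = "SELECT id, MATCH(userText) AGAINST('$input') AS relevancy
        FROM posts
        WHERE MATCH(userText) AGAINST('$input')
        ORDER BY relevancy DESC
        LIMIT 1";

$row = $db->query($sql)->fetch_assoc();
if ($row && $row['relevancy'] > 5) {   // the cutoff is arbitrary; tune it against real data
    echo "This looks similar to your existing post #{$row['id']}.";
}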
I have a large InnoDB database with over 2 million products in it. The 'products' table has the following fields: id, title, description, category.
There is also a MyISAM table called 'category' that contains a list of all categories used on the website. It has the following fields: id, name, keywords, parentid.
My question is more about the logic rather than code, but what I am trying to achieve is as follows:
When a user lists a new product on the site, as they are typing the description it should try to work out what category to put the product in (with good accuracy).
I tried this initially by using MySQL MATCH() to match the entered title against a list of keywords in the category table, but this was far from accurate.
A better idea seems to be to match the user-entered title against titles of products already in the database, grouping them by category and then sorting by the largest group. However, on an InnoDB database I obviously can't use FULLTEXT, and with 2 million items I think it would be pretty slow anyway?
How would you do it? I guess it would need to work in a similar way to how Stack Overflow displays similar questions.
A FULLTEXT index on 2 million records is a valid option if you are running on a decent server. The initial indexing will take a while, that's for sure, but searches should be reasonably fast; MySQL can take it.
InnoDB supports fulltext indexes as of v5.6.4. You should consider upgrading.
If upgrading is not an option, please see this previous answer of mine where I suggest a workaround.
For your use case, you may want to take a look at the WITH QUERY EXPANSION option:
It works by performing the search twice, where the search phrase for the second search is the original search phrase concatenated with the few most highly relevant documents from the first search. Thus, if one of these documents contains the word “databases” and the word “MySQL”, the second search finds the documents that contain the word “MySQL” even if they do not contain the word “database”
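For instance, a sketch that combines query expansion with your "group by category" idea; the connection variable and FULLTEXT(title) index on products are assumptions:

<?php
// Suggest categories by finding products whose titles are related to what the
// user typed, then counting which categories those products fall into.
// Assumes $db is a mysqli connection and products has FULLTEXT(title).
$typed = $db->real_escape_string($userTitle);

$sql = "SELECT category, COUNT(*) AS hits
        FROM products
        WHERE MATCH(title) AGAINST('$typed' WITH QUERY EXPANSION)
        GROUP BY category
        ORDER BY hits DESC
        LIMIT 3";

$suggestedCategories = $db->query($sql)->fetch_all(MYSQLI_ASSOC);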
Could someone please point me in the right direction? I currently have a searchable database and ran into a problem when searching by title.
If the title begins with "The", then obviously the title will be in the 'T' section. What is a good way to avoid "The" being searched? Should I concatenate two fields to display the title but search only on the second field, ignoring the prefix, or is there another way to do this? Advice or direction would be great. Thanks.
A few choices:
a) Store the title in "Library" format, which means you process the title and store it as
Scarlet Pimpernel, The
Tale of Two Cities, A
b) Store the original unchanged title for display purposes, and add a new "library_title" field to store the processed version from a).
c) Add a new field to store the articles, and the bare title in title field. For display, you'd concatenate the two fields, for searching you'd just look in the title field.
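A small sketch of the processing behind options (a) and (b), assuming only English leading articles:

<?php
// Convert a display title to "library" format for storage/search.
function library_title(string $title): string
{
    // Move a leading article to the end: "The Scarlet Pimpernel" -> "Scarlet Pimpernel, The"
    if (preg_match('/^(The|A|An)\s+(.+)$/i', $title, $m)) {
        return $m[2] . ', ' . $m[1];
    }
    return $title;
}

// library_title('A Tale of Two Cities') => 'Tale of Two Cities, A'
// library_title('War and Peace')        => 'War and Peace'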
I believe the best approach is to use full-text search, with 'the' in the stopword list. That would solve the search problem (i.e., 'the' in search phrases would be ignored).
However, if you are ordering the results by title, a title starting with 'The' would still be sorted, "in the 'T' section", as you put it. To solve that, there are several possible approaches. Here are some of them:
Separating the fields, the way you said in the question
Having a separate field with the number of chars to be ignored from the beginning when sorting
Replacing initial 'The's for sorting
Among others...
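For the last approach, a sketch of a sort expression that ignores a leading 'The' (the table and column names are hypothetical):

<?php
// Order by title while ignoring a leading "The ".
$sql = "SELECT *
        FROM books
        ORDER BY CASE
                   WHEN title LIKE 'The %' THEN SUBSTRING(title, 5)
                   ELSE title
                 END";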
If you are doing it in MySQL, you could use the REPLACE() function to strip "The" from the value in your query; or, if you are using PHP or Ruby or another language, you can just sanitize the query string before sending it to the database server.
Create three columns in the database
1) TitlePrefix
2) Title
3) TitlePostfix
Write the code so that you have four methods, like:
searchTitleOnly(textToSearch) // search only the title column
searchTitleWithPrefixAndPostfix(textToSearch) // concatenate all three columns and search
searchTitlePrefix(textToSearch) // search the title prefix only
searchTitlePostfix(textToSearch) // search the title postfix only
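A sketch of how the combined variant might look, assuming the three columns above live in a hypothetical titles table:

<?php
// One possible implementation of the combined search; error handling omitted.
function searchTitleWithPrefixAndPostfix(mysqli $db, string $textToSearch)
{
    $term = $db->real_escape_string($textToSearch);

    // Concatenate the three parts and search the whole thing.
    return $db->query(
        "SELECT *
         FROM titles
         WHERE CONCAT_WS(' ', TitlePrefix, Title, TitlePostfix) LIKE '%$term%'"
    );
}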
Try looking into SQL functions like LTRIM, RTRIM, etc., and use them on a temporary column that holds the exact same data. Modify that data with LTRIM/RTRIM, dropping whichever words you please, then perform the search on the modified column and return the entire row as the result.
In my web application there will be several users, and they have their own content uploaded to my webapp. Each piece of content they upload has a title, description and tags (keywords). I can write a search script to search by content or user name, but when the keywords are given with a spelling mistake it doesn't return any results. For example, if there is a user named "Michael" in the database and the search query was "Micheal", I should get "Did you mean to search for 'Michael'?", which is simply a search suggestion.
This suggestion should also work for the content uploaded by users. A user may give their content a title like "Michael's activities May 2011", and suggestions should be generated for the individual words.
You could use SOUNDEX to search for similar-sounding names, like this:
SELECT * FROM users WHERE SOUNDEX(name) = SOUNDEX(:input)
or like this:
SELECT * FROM users WHERE name SOUNDS LIKE :input
(which is completely equivalent)
Edit: if you need to use an algorithm other than Soundex, as Martin Hohenberg suggested, you would need to add an extra column to your table called, for example, sound_equivalent. (This is actually a more efficient solution, as this column can be indexed.) The query would then be:
SELECT * FROM users WHERE sound_equivalent = :input_sound_equivalent
The content of the sound_equivalent column can then be generated with a PHP algorithm and inserted into the table with the rest of the user's data.
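A sketch of generating that column's value in PHP, using metaphone() as the non-Soundex algorithm; the users table, sound_equivalent column and mysqli connection $db follow the setup above:

<?php
// Store a phonetic key next to the name so lookups can hit an index; error handling omitted.
$name = 'Michael';
$key  = metaphone($name);   // phonetic key for the name

$stmt = $db->prepare("INSERT INTO users (name, sound_equivalent) VALUES (?, ?)");
$stmt->bind_param('ss', $name, $key);
$stmt->execute();

// Lookup: compare the phonetic key of the search input against the indexed column.
$searchKey = metaphone($_GET['q'] ?? '');
$stmt = $db->prepare("SELECT * FROM users WHERE sound_equivalent = ?");
$stmt->bind_param('s', $searchKey);
$stmt->execute();
$matches = $stmt->get_result()->fetch_all(MYSQLI_ASSOC);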
You can also use the PHP library pspell to get suggestions if you have no search results.
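A sketch of that fallback, assuming the pspell extension and an English dictionary are installed:

<?php
// When a search returns nothing: spell-check each word and suggest corrections.
$pspell = pspell_new('en');

$suggestions = [];
foreach (preg_split('/\s+/', trim($searchQuery), -1, PREG_SPLIT_NO_EMPTY) as $word) {
    if (!pspell_check($pspell, $word)) {
        $alternatives = pspell_suggest($pspell, $word);
        if ($alternatives) {
            $suggestions[$word] = $alternatives[0]; // take the best guess
        }
    }
}

if ($suggestions) {
    echo 'Did you mean to search for "' . strtr($searchQuery, $suggestions) . '"?';
}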
Maybe create a database of the most common words (like: dog, house, city, numbers, water, internet). You don't need to make it big (under 10,000 words).
Then, when you explode the search term, check the word database for words LIKE the search terms, and just echo out the suggestions.
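A rough sketch of that idea, with a hypothetical common_words table holding one word per row and a mysqli connection in $db:

<?php
// Suggest words from a small dictionary table.
// A prefix LIKE is used so an index on the word column can still be used.
$suggestions = [];
foreach (explode(' ', trim($searchTerm)) as $word) {
    $w   = $db->real_escape_string($word);
    $res = $db->query("SELECT word FROM common_words WHERE word LIKE '$w%' LIMIT 3");
    while ($row = $res->fetch_assoc()) {
        $suggestions[] = $row['word'];
    }
}
echo 'Suggestions: ' . implode(', ', array_unique($suggestions));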