MySQL sort by number of occurrences - php

I am doing a search in two text fields called Subject and Text for a specific keyword. To do this I use the LIKE statement. I have encountered a problem when trying to sort the results by the number of occurrences.
My search query looks like this:
SELECT * FROM Table WHERE (Text LIKE '%Keyword%' OR Subject LIKE '%Keyword%')
I tried to add a COUNT() expression and sort by the number of occurrences, but COUNT() just keeps returning the number of rows in my table.
Here is the query with count statement:
SELECT *, COUNT(Text LIKE '%Keyword%') AS cnt FROM News WHERE (Text LIKE '%Keyword%' OR Subject LIKE '%Keyword%') ORDER BY cnt
What I'm looking for is something that returns the number of matches in the Subject and Text columns for each row, and then orders the results by the highest number of occurrences of the keyword in each row.

The query below returns the number of times the string appears in both columns (Text and Subject) and sorts the results by that count. It is not a good solution performance-wise, however; it is better to sort the results in your application code.
SELECT *,
(LENGTH(`Text`) - LENGTH(REPLACE(`Text`, 'Keyword', ''))) / LENGTH('Keyword')
+
(LENGTH(`Subject`) - LENGTH(REPLACE(`Subject`, 'Keyword', ''))) / LENGTH('Keyword') `occurences`
FROM
`Table`
WHERE (Text LIKE '%Keyword%' OR Subject LIKE '%Keyword%')
ORDER BY `occurences` DESC
A cleaner way of calculating the occurrences, suggested by @lserni:
SELECT *,
(LENGTH(`Text`) - LENGTH(REPLACE(`Text`, 'test', ''))) / LENGTH('test') `appears_in_text`,
(LENGTH(`Subject`) - LENGTH(REPLACE(`Subject`, 'test', ''))) / LENGTH('test') `appears_in_subject`,
(LENGTH(CONCAT(`Text`,' ',`Subject`)) - LENGTH(REPLACE(CONCAT(`Text`,' ',`Subject`), 'test', ''))) / LENGTH('test') `occurences`
FROM
`Table1`
WHERE (TEXT LIKE '%test%' OR SUBJECT LIKE '%test%')
ORDER BY `occurences` DESC
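The length-difference arithmetic in these queries can be tried out without a MySQL server. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only to keep the example self-contained; LENGTH, REPLACE and LIKE behave alike for this ASCII sample data). The function name rank_by_occurrences and the sample rows are hypothetical.

```python
import sqlite3


def rank_by_occurrences(rows, keyword):
    """Return (id, occurrences) pairs, most occurrences first.

    rows is a list of (Subject, Text) tuples. SQLite stands in for MySQL here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE News (id INTEGER PRIMARY KEY, Subject TEXT, Text TEXT)"
    )
    conn.executemany("INSERT INTO News (Subject, Text) VALUES (?, ?)", rows)
    # Removing every copy of the keyword shortens the string by
    # occurrences * LENGTH(keyword), so the difference divided by the
    # keyword length is the number of occurrences.
    query = """
        SELECT id,
               (LENGTH(Text) - LENGTH(REPLACE(Text, :kw, ''))) / LENGTH(:kw)
             + (LENGTH(Subject) - LENGTH(REPLACE(Subject, :kw, ''))) / LENGTH(:kw)
               AS occurrences
        FROM News
        WHERE Text LIKE '%' || :kw || '%' OR Subject LIKE '%' || :kw || '%'
        ORDER BY occurrences DESC
    """
    result = conn.execute(query, {"kw": keyword}).fetchall()
    conn.close()
    return result
```

Because REPLACE is case-sensitive, this sketch only counts occurrences written exactly like the keyword, even though LIKE may select rows that differ in case.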

You want SUM instead. Count will count how many records have non-null values, which means ALL matches and NON-matches will be counted.
SELECT *, SUM(Text LIKE '%Keyword%') AS total_matches
...
ORDER BY total_matches
SUM() adds up the boolean results that LIKE produces, which are typecast to integers, so you get a result like 1+1+1+0+1 = 4 instead of the count of 5 non-null values.
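The difference between COUNT and SUM over a boolean expression can be seen in a small experiment. This is a sketch only, using Python's sqlite3 module in place of MySQL (an assumption; both return 1 or 0 for a LIKE comparison). The function name count_vs_sum and the sample data are hypothetical.

```python
import sqlite3


def count_vs_sum(texts, keyword):
    """Return (COUNT, SUM) of the boolean expression Text LIKE '%keyword%'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE News (Text TEXT)")
    conn.executemany("INSERT INTO News (Text) VALUES (?)", [(t,) for t in texts])
    row = conn.execute(
        # COUNT counts every non-NULL result (0 as well as 1);
        # SUM adds the 1s, so it counts only the matching rows.
        "SELECT COUNT(Text LIKE :pat), SUM(Text LIKE :pat) FROM News",
        {"pat": f"%{keyword}%"},
    ).fetchone()
    conn.close()
    return row
```

With three rows of which two contain the keyword, COUNT yields 3 and SUM yields 2. Note that both are aggregates over the whole result set: SUM counts matching rows, not how many times the keyword occurs inside a single row.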

// lower-case the keyword (it must also be escaped for MySQL before being interpolated)
$keyword = strtolower('Keyword');
// now build the query
$query = <<<SQL
SELECT *,
((LENGTH(`Subject`) - LENGTH(REPLACE(LOWER(`Subject`), '{$keyword}', ''))) / LENGTH('{$keyword}')) AS `CountInSubject`,
((LENGTH(`Text`) - LENGTH(REPLACE(LOWER(`Text`), '{$keyword}', ''))) / LENGTH('{$keyword}')) AS `CountInText`
FROM `News`
WHERE (`Text` LIKE '%{$keyword}%' OR `Subject` LIKE '%{$keyword}%')
ORDER BY (`CountInSubject` + `CountInText`) DESC;
SQL;
This returns the number of occurrences in each field and sorts by their sum.
The keyword needs to be lower-cased for this to work, because REPLACE() performs a case-sensitive match. I don't think it's really fast, performance-wise, as it needs to lower-case the fields of every row.
You could instead index each news item (subject and text) by word, store the words in another table with news_id and an occurrence count, and then match against that.
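The word-index idea could look roughly like the following sketch. It is written in plain Python rather than SQL, and the function names build_word_index and rank_news, as well as the (news_id, word, count) row layout, are assumptions for illustration only.

```python
import re
from collections import Counter


def build_word_index(news_items):
    """Return (news_id, word, count) rows for a hypothetical word-index table.

    news_items is a list of (news_id, subject, text) tuples.
    """
    rows = []
    for news_id, subject, text in news_items:
        # Lower-case once at indexing time so lookups need no LOWER() calls.
        words = re.findall(r"[a-z0-9]+", f"{subject} {text}".lower())
        for word, count in sorted(Counter(words).items()):
            rows.append((news_id, word, count))
    return rows


def rank_news(index_rows, keyword):
    """Return (news_id, count) pairs for keyword, highest count first."""
    wanted = keyword.lower()
    hits = [(news_id, count) for news_id, word, count in index_rows if word == wanted]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Unlike LIKE '%keyword%', an index of this kind matches whole words only, so it is a trade-off rather than a drop-in replacement.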

Related

How to group comma separated values into distinct sorted comma separated values

I'm trying to work out a piece of SQL and wondering whether what I am attempting is possible and, if so, how I would go about it. My goal is to return a thread ID with all the unique values that are associated with that thread ID. The example below shows how far I have gotten and where I am trying to get to.
$sql = "SELECT `ThreadID`, `PostTags` FROM `ma_posts` WHERE `PostTags` IS NOT NULL AND `PostTags` != '0' AND `PostTags` != ' ' ORDER BY `ThreadID`";
RETURNS CURRENT RESULT
WANT TO RETURN A FILTERED GROUPED RESULT OF DISTINCT VALUES SORTED ALPHABETICALLY
In the above example, the second instance of the name ‘Jon’ is removed from the names returned for ThreadID 14, and all of the names are condensed into one string/bit of data. The second and third instances of the name Barry and Mary are removed from the condensed ThreadID 20.
Is what I am attempting possible? If so how?
I've looked over Stack Overflow and the web and have not found an example that really speaks to what I'm trying to do. Hopefully this is possible, so that I won't have to filter the incoming results via PHP. That would make life much simpler.
Thanks for any help as always folks
Cheers from the Monkee
You should really consider normalizing your schema. The solution will then be trivial.
For now, you could try to convert the comma separated values into separate rows and then apply combination of UPPER, LOWER and SUBSTR to convert string's first character to upper case and rest to lower case (to convert "barry" into "Barry"). Finally apply GROUP_CONCAT with distinct and order by option to get the desired result.
Try this:
SELECT
threadid,
GROUP_CONCAT(DISTINCT posttag
ORDER BY posttag
SEPARATOR ', ') posttags
FROM
(SELECT
threadid,
CONCAT(UPPER(SUBSTR(posttag, 1, 1)), LOWER(SUBSTR(posttag, 2))) posttag
FROM
(SELECT
t.threadid,
SUBSTRING_INDEX(SUBSTRING_INDEX(t.posttags, ', ', t2.i), ', ', - 1) posttag
FROM
t
JOIN (SELECT 1 i UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) t2
ON CHAR_LENGTH(t.posttags) - CHAR_LENGTH(REPLACE(t.posttags, ', ', '')) >= 2 * (t2.i - 1)) t) t
GROUP BY threadid;
You can always throw more UNION ALL in the subquery if there can be more than 5 values in posttags.
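For comparison, the split, capitalise, de-duplicate and sort steps that the query performs can be sketched in application code. This is only an illustration in Python; the function name distinct_sorted_tags and the sample rows in the usage note are hypothetical, since the original result tables are not shown.

```python
from collections import defaultdict


def distinct_sorted_tags(rows):
    """Map each ThreadID to its distinct, capitalised, sorted tags as one string.

    rows is a list of (thread_id, post_tags) tuples with comma-separated tags.
    """
    tags_by_thread = defaultdict(set)
    for thread_id, post_tags in rows:
        for tag in post_tags.split(","):
            tag = tag.strip()
            if tag:
                # "barry" and "Barry" collapse to the same entry.
                tags_by_thread[thread_id].add(tag[:1].upper() + tag[1:].lower())
    return {
        thread_id: ", ".join(sorted(tags))
        for thread_id, tags in sorted(tags_by_thread.items())
    }
```

For example, the rows (14, "Jon, Mary") and (14, "jon, Barry") collapse to "Barry, Jon, Mary" for thread 14.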
It seems like your database is not normalized. You need to normalize it into third normal form.
In the meantime, you can use the DISTINCT keyword in your query, which selects only unique rows.
$sql = "SELECT DISTINCT `ThreadID`, `PostTags` FROM `ma_posts` WHERE `PostTags` IS NOT NULL
AND `PostTags` != '0' AND `PostTags` != ' ' ORDER BY `ThreadID`";

How to set precedence of condition in MYSQL when two conditions are given using OR?

I have the following code:
$sqlquery1 = "SELECT *
FROM item
WHERE item.title LIKE '%abc%'
OR item.idescription LIKE '%abc%'";
and this is running perfectly.
The only concern is that I want to show the results matching the first OR condition (item.title) first, followed by those matching the next condition.
Eg of current output:
Title: ALGEBRA
Description: abc inc
Title: ABC
Description: dsdsinc
Title: AlABCad
Description: sds inc
Eg of Desired output:
Title: ABC
Description: dsdsinc
Title: AlABCad
Description: sds inc
Title: ALGEBRA
Description: abc inc
If you want rows returned in a specific order, add an ORDER BY clause.
To meet the stated specification, this would be sufficient:
ORDER BY item.title LIKE '%abc%' DESC
This works, because MySQL evaluates a boolean expression and returns the result as integer: TRUE is returned as 1, FALSE is returned as 0 and NULL or unknown is returned as NULL. We can demonstrate this with a simple test, e.g.
SELECT 'abc' LIKE '%abc%'
, 'foo' LIKE '%abc%'
, NULL LIKE '%abc%'
And see that MySQL returns 1, 0 and NULL respectively.
Personally, I would prefer more expressions in the ORDER BY to make the result more deterministic, so that the rows are guaranteed to be returned in a predictable and repeatable order.
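Ordering by a boolean expression plus tie-breakers can be tried with the sample data from the question. This is a minimal sketch that uses Python's sqlite3 module instead of MySQL (an assumption; SQLite's LIKE is also case-insensitive for ASCII and also yields 1 or 0). The function name titles_first is hypothetical.

```python
import sqlite3


def titles_first(items, term):
    """Return matching (title, idescription) rows, title matches first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (title TEXT, idescription TEXT)")
    conn.executemany("INSERT INTO item VALUES (?, ?)", items)
    rows = conn.execute(
        """
        SELECT title, idescription
        FROM item
        WHERE title LIKE :pat OR idescription LIKE :pat
        -- 1 (title matches) sorts before 0 because of DESC;
        -- the extra columns make the order repeatable.
        ORDER BY title LIKE :pat DESC, title, idescription
        """,
        {"pat": f"%{term}%"},
    ).fetchall()
    conn.close()
    return rows
```

Called with the three rows from the question and the term "abc", the two title matches (ABC, AlABCad) come back before ALGEBRA, whose match is only in the description.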
You can assign a value to each match condition and order by that. When you do this, it's good to have a "tie-breaker" order so you don't get all the title matches randomly ordered followed by all of the description matches randomly ordered:
SELECT *
FROM item
WHERE item.title LIKE '%abc%' OR item.idescription LIKE '%abc%'
ORDER BY
CASE WHEN item.title LIKE '%abc%' THEN 1 ELSE 0 END DESC,
item.title,
item.idescription;
This is similar to spencer7593's solution except it can be expanded to handle more than two "OR" conditions if needed.
Expanding on the other answers, you may want to rank items where the term occurs in both the title and the description higher than the ones with the term only in the title.
SELECT * , title_score + description_score AS total_score
FROM (
SELECT * , title LIKE '%abc%' AS title_score, description LIKE '%abc%' AS description_score
FROM item
WHERE title LIKE '%abc%'
OR description LIKE '%abc%'
) AS temp_tbl
ORDER BY total_score DESC, title_score DESC, description_score DESC
In addition to these answers you can use a FULLTEXT index with a score result which will rank your search based on the number of times the term is found.
SELECT * , MATCH(title, description) AGAINST ('*abc*') AS score
FROM item
WHERE MATCH(title, description) AGAINST ('*abc*' IN BOOLEAN MODE)
ORDER BY score DESC

MySql query challenge - return results with whitespace?

I have a MySql database with some rows as follows:
ID DESC
1 This is my bike
2 Motorbikes are great
3 All bikers should wear helmets
4 A bike is great for exercise
5 A. Top. Bike.
What I want to do is return the rows with whitespace surrounding the search term, OR the term being at the end or beginning of the description.
For example,
"SELECT * FROM `mytable` WHERE `desc` LIKE '%bike%'"
Will return all rows. But,
"SELECT * FROM `mytable` WHERE `desc` LIKE '% bike %'"
Will only return row 4.
What I really want is a reliable way to return rows 1, 4 and 5, i.e. where the search term is surrounded by anything BUT the characters A-Z, a-z and 0-9. Any ideas? Is this even possible with MySQL?
Thanks!!
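You can use regular expressions in SQL. Before MySQL 8.0.4 the word-boundary markers are [[:<:]] and [[:>:]]:
SELECT * FROM `mytable` WHERE `desc` REGEXP '[[:<:]]bike[[:>:]]'
From MySQL 8.0.4 on, use \b instead, with the backslash doubled inside the string literal:
SELECT * FROM `mytable` WHERE `desc` REGEXP '\\bbike\\b'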
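Word-boundary matching can be checked quickly outside the database. The following is a minimal sketch using Python's re module, which is not MySQL's regex engine, so details may differ. The function name whole_word_matches is hypothetical; the rows in the usage note are the ones from the question.

```python
import re


def whole_word_matches(rows, term):
    """Return the ids of rows whose description contains term as a whole word.

    rows is a list of (id, description) tuples.
    """
    # \b matches between a word character (letter, digit, underscore)
    # and a non-word character or the start/end of the string.
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [row_id for row_id, desc in rows if pattern.search(desc)]
```

For the five sample rows and the term "bike", this selects rows 1, 4 and 5: "Motorbikes" and "bikers" are skipped because a letter sits directly next to the term.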
You should start reading about MySql RegEx.
Sample Code.
SELECT * FROM table WHERE field_name REGEXP PATTERN;
More Specific
details Table
ID NAME
1 Dipesh
2 Dip
3 Dipe
4 DiDi
5 Di
SELECT * FROM details WHERE NAME REGEXP '^Di$';
Result
NAME -> Di
SELECT * FROM details WHERE NAME REGEXP 'Di$';
Result
NAME -> DiDi , Di
SELECT * FROM details WHERE NAME REGEXP '^Di';
Result
NAME -> Dipesh, Dip, Dipe, DiDi, Di
You need to specify the additional conditions in the query:
SELECT *
FROM `mytable`
WHERE
`desc` LIKE '% bike %' OR
`desc` LIKE '% bike' OR
`desc` LIKE 'bike %';
Try this one, hope it'll help you
"SELECT * FROM `mytable` WHERE `desc` LIKE '% bike'"

MySQL: Query for two values' presence

I have a table with fields id, text, and title. There may be multiple rows with the same title (title is not unique). Additionally, id is the primary key.
I only need to know whether there is at least one row with title="A" that contains the string "aaa" in text, and additionally at least one row with title="B" that contains the string "bbb" in text.
This is what I tried:
mysql> SELECT (SELECT text FROM table WHERE title = 'A') as aText, (SELECT text FROM table WHERE title = 'B') as bText;
I had planned on parsing the values of aText and bText for the strings "aaa" and "bbb" respectively with PHP. However, I have two issues:
1) Major issue: Due to title not being unique, the subqueries may return multiple rows. MySQL breaks on that happening with this error:
ERROR 1242 (21000): Subquery returns more than 1 row
2) Minor issue: The reason that I am parsing in PHP is to avoid using MySQL's LIKE operator. Would I be better off doing the parsing of the string right there in MySQL as such:
mysql> SELECT (SELECT text FROM table WHERE title = 'A' AND text LIKE '%aaa%') as aText, (SELECT text FROM table WHERE title = 'B' AND text LIKE '%bbb%') as bText;
SELECT * FROM (
SELECT (SELECT count(*) FROM table WHERE title='A') AS A_count,
(SELECT count(*) FROM table WHERE title='B') AS B_count
) AS counts
WHERE (A_count > 0) AND (B_count > 0)
This returns one row (with the counts) if both are present, and no rows if either 'A' or 'B' is not present.
A subquery returning data to a field in a parent query can only ever return a single value. You're basically replacing a fieldname with a query result, which means the query result has to behave the same as a normal table field would. Hence the "returns more than 1 row" error - you're returning all of the rows where 'A' or 'B' matches, which is not a single field - it's a column.
If you just wanted to know if those rows existed, you could do something like this:
SELECT COUNT(1)
FROM table
WHERE (title = 'A' AND text LIKE '%aaa%')
OR (title = 'B' AND text LIKE '%bbb%')
-- EDIT --
Based on your comment, you may want to try the following query instead:
SELECT COUNT(1), title
FROM table
WHERE (title = 'A' AND text LIKE '%aaa%')
OR (title = 'B' AND text LIKE '%bbb%')
GROUP BY title
Check for two rows, both with a COUNT(1) of greater than zero.
You can self-join:
SELECT t1.text as `atext`, t2.text as `btext`
FROM table as t1, table as t2
WHERE t1.title = 'A' AND t1.text LIKE '%aaa%' AND t2.title = 'B' AND t2.text LIKE '%bbb%';
If you want to do your parsing in PHP, just leave off the LIKE conditions in the WHERE clause; but if what you're doing in PHP is equivalent to MySQL's LIKE operator, I don't know why you wouldn't do it in the query.
It is probably not good form to answer one's own question, but for the sake of the fine archives at the expense of my karma:
The query that I employed is:
SELECT count(*) FROM (
SELECT DISTINCT title FROM (
SELECT title, text FROM table WHERE title IN ('A', 'B')
)
AS filtered
WHERE (title='A' AND text LIKE '%aaa%')
OR (title='B' AND text LIKE '%bbb%')
)
AS allResults;
The innermost query gets everything with a good title, and the next query gets the distinct titles for the real results. This setup ensures that the LIKE clause will not run on the entire table. I then wrap all that in a count, and if it equals 2 then I know that each condition was met.
Thank you everyone who contributed, I gained quite some insight from your answers!
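The "count the distinct matching titles and compare with 2" approach can be exercised with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL, collapses the two inner queries into one for brevity, and uses a hypothetical table name docs (because table is a reserved word) and a hypothetical function name both_conditions_met.

```python
import sqlite3


def both_conditions_met(rows):
    """Return True if an 'A' row contains 'aaa' and a 'B' row contains 'bbb'.

    rows is a list of (title, text) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, text TEXT)"
    )
    conn.executemany("INSERT INTO docs (title, text) VALUES (?, ?)", rows)
    (matched_titles,) = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT DISTINCT title FROM docs
            WHERE (title = 'A' AND text LIKE '%aaa%')
               OR (title = 'B' AND text LIKE '%bbb%')
        ) AS allResults
        """
    ).fetchone()
    conn.close()
    # DISTINCT leaves at most one row per title, so 2 means both matched.
    return matched_titles == 2
```

Duplicate titles do not matter here: several 'A' rows containing "aaa" still contribute a single distinct title.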

mysql: order by match

For looking up matching keywords in MySQL I use
SELECT * FROM `test` WHERE `keywords` REGEXP '.*(word1|word2|word3).*' LIMIT 1
I want to order them by the most matching keywords in the keywords column to give the best answer. For example:
Keywords            Response
word1,word2         test1
word1,word2,word3   test2
I want the response to be test2 with the query given.
How can I order the results by the most matching keywords?
SELECT
(keywords REGEXP '.*(word1).*')
+(keywords REGEXP '.*(word2).*')
+(keywords REGEXP '.*(word3).*') as number_of_matches
,keywords
,field1
,field2
FROM test
WHERE keywords REGEXP '.*(word1|word2|word3).*'
ORDER BY number_of_matches DESC
LIMIT 20 OFFSET 0
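Summing one boolean per keyword can be tried with the two rows from the question. This is a sketch under stated assumptions: Python's sqlite3 module replaces MySQL, LIKE replaces REGEXP (SQLite ships without a REGEXP implementation), the column name response and the function name best_responses are hypothetical, and the list of words is assumed to be non-empty.

```python
import sqlite3


def best_responses(rows, words):
    """Return (response, number_of_matches) pairs, best match first.

    rows is a list of (keywords, response) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (keywords TEXT, response TEXT)")
    conn.executemany("INSERT INTO test VALUES (?, ?)", rows)
    patterns = [f"%{word}%" for word in words]
    # Each comparison yields 1 or 0, so their sum is the number of matched words.
    score = " + ".join("(keywords LIKE ?)" for _ in patterns)
    where = " OR ".join("keywords LIKE ?" for _ in patterns)
    query = (
        f"SELECT response, {score} AS number_of_matches "
        f"FROM test WHERE {where} "
        "ORDER BY number_of_matches DESC"
    )
    # The patterns are bound twice: once for the score, once for the WHERE clause.
    result = conn.execute(query, patterns + patterns).fetchall()
    conn.close()
    return result
```

Only the placeholders are generated dynamically; the search words themselves are passed as bound parameters rather than interpolated into the SQL text.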
