How to check data for duplication in database?

How to check data for duplication in database? - php

I have 3 fields of topic name in 3 different languages. When entering new topic I'm using JavaScript to show on the page if that topic exists already in the database by using select and like clause statement. The problem is it finds any duplication if the topic name entered exactly the same, but not if only some words are the same.
How do I check for duplication on topic that checks the words for similarity and not the whole topic?

SQL Server provides Full Text Search feature. You may try for this instead of your like clause.

ATaylor pretty much spells it out for you. To exclude the specific common keywords, I would recommend using an "EXCEPT SELECT WHERE LIKE" statement at the end of your statement.

SELECT DISTINCT(field) AS var1, COUNT(field) AS var2 FROM table GROUP BY field HAVING var2 > 1;
This would show up any field that exist twice or more in your current database

Full Text Search, you can have choice on you relevancy score. write a query to get the most relevant results.
SELECT MATCH('content') AGAINST ('keyword1
keyword2') as relevancy_score FROM topic WHERE MATCH
('Content') AGAINST('+keyword1 +keyword2' IN
BOOLEAN MODE) HAVING relevancy_score > 0.2 ORDER
BY relevancy_score DESC.
you may refer to this link.
http://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html

Related

Use like operator and order by sub query

Introduction
I have a project where we help the IT community to organize better IT meetups (just like meetup.com but only for IT community) called https://codotto.com/
We are developing a new feature where we would like to show the number of usages of certain tags. The algorithm should work just like Stackoverflow's one. You write javascript and you get a list with the tags that match your query sorted by the most used ones.
I'm currently using Laravel but I will post the raw query so that it's easier for mysql wizards to help me out if possible :)
Problem
I have the following tables
tags table
id name
1 javascript
2 javascript-tools
3 javascript-security
group_has_tags table
group_id tag_id
1 2
2 2
2 3
We have the tags javascript-tools being used two times, javascript-security is used one while javascript is not used at all.
Now, if a user search for javascript, he should get first javascript (because it is a direct match) followed by the rest of the tags sorted by their usage.
In Laravel this is the code that I have (simplified ofc)
$tags = Tag::withCount('groups')
->orderBy('groups_count', 'DESC')
->where('name', 'LIKE', 'javascript%')
->get(2);
The problem is that since I only return back 2 results, javascript is not being included in the results, even tho it's what the user literally wrote
For the mysql magicians, here is the raw query
select
`tags`.*,
(
select
count(*)
from
`meetups`
inner join `meetup_has_tags` on `meetups`.`id` = `meetup_has_tags`.`meetup_id`
where
`tags`.`id` = `meetup_has_tags`.`tag_id`
) as `meetups_count`
from
`tags`
where
`title` LIKE 'javascript%'
order by
`meetups_count` desc
limit
2 offset 0
Question
The main objective here is to return the most relevant result to the user. He writes javascript and javascript shows up first followed by less "relevant" results. The way I found was to sort by the number of times a tag was used.
Is there a solution where I can do "please, fetch the results that match this query first, then return the most relevant results"?
By "most relevant" results, I simply mean "what the user is looking for". If he writes "javascript" it should return "javascript" followed by "javascript-tools" (because "javascript-tools" was used twice but the user is literally searching for "javascript")

Here is your query:
SELECT * FROM
(SELECT tags.*,COALESCE((SELECT COUNT(*) FROM group_has_tags WHERE tag_id = tags.id),0) AS usage
FROM tags
WHERE title LIKE 'javascript%') AS tmp
ORDER BY tmp.name = 'javascript' DESC,usage DESC
For each matching tag you get the number of times it has been used.
Then you first sort by whether the tag matches literally what the user has typed, then by the usage.
Of course you will have to parameterize this query but I hope you get the idea.

Querying a table where the field contains any order of given text strings

I want to query a table as follows:
I have a field called "category" and my input match contains N separate words. I want the query to match all rows that contain all N words, but in any order.
For example if the field category contains "hello good morning world", my input query can contain "hello morning" or "good" or "world hello" and all are matches to the query.
How do I formulate such an SQL expression?
Also it would be good if the query can be made case insensitive.

If you are using MySQL you can use the boolean fulltext search feature to achieve this. You can put a + in front of each term and then only results with all the terms, in any order, will be returned. You will need to make sure the column containing the category field has a fulltext index specified on it for this to work. Other database engines probably have similar features. So for example you might do something like the following assuming there were a fulltext index over the category column...
SELECT * FROM myTable WHERE MATCH (category) AGAINST ('+term1 +term2 +term3' IN BOOLEAN MODE);
I would avoid using the "LIKE" operator as others have suggested you would have to worry about the headache of mixed upper/lower case and if you have a large database using a % in the front of a LIKE search term is going to cause a full table scan instead of using an index which is horrible for performance.

I'm not writing the loop that will build this query for you. This will get the job done, but it will be pretty inefficient.
SELECT * FROM table
WHERE (
TOUPPER(category) LIKE '*HELLO*' AND
TOUPPER(category) LIKE '*GOOD*' AND
TOUPPER(category) LIKE '*MORNING*' AND
TOUPPER(category) LIKE '*WORLD*'
);
You could also research using REGEXes with SQL.

php/mysql/ajax: google style search with suggestions

I have an ajax script that searches database tables for expressions similar to google search. The SELECT statement just uses LIKE and finds matches in the relevant fields. It worked fine at first but as content has grown, it is giving way too many matches for most search strings.
For example, if you search for att, you get att but also attention, attaboy, buratta etc.
Good search engines such as Google seem to have an intermediate table of suggestions that have been vetted by others. Rather than search the data directly, they seem to search the approved phrases such as AT&T and succeed in narrowing the number of results. Has anyone coded something like this and suggest the right dbase schema and query to get relevant results.
Right now I am searching table of say names directly with something like
$sql = "SELECT lastname from people WHERE lastname LIKE '%$searchstring%'";
I imagine besides people I should create some intermediate table along the lines of
people
id|firstname|lastname|description
niceterms
id|niceterm|peopleid
Then the query could be:
$sql = "SELECT p.lastname,p.peopleid, n.niceterm, n.peopleid,
FROM `people` p
LEFT JOIN `niceterms` n
on p.id = n.peopleid
WHERE niceterm LIKE '%$searchterm%'";
..so when you type something in the search box, you get nice search terms that will yield better results.
But how do I populate the niceterms table. Is this the right approach? I'm not trying to create a whole backweb or pagerank. Just want to narrow search results so they are relevant.
Thanks for any suggestions.

You might want to take a look at FULLTEXT search in Mysql. It allowes you to create powerfull query's based on relevance. You can for example create a BOOLEAN search which allowes you to create a scorerow in your result. The score will be based on rules like does the text start with a karakter combination (yes? +2, no but it does contain the combination: +1)
The below code is just another column and it has 3 rules in it:
Does the p1.name field contain Bl or rock? if yes -> add score
Does the p1.name field start with either Bl or rock? if yes -> add score
IS the p1.name equal to Bl rock? if yes -> add score
MATCH p1.name AGAINST('>Bl* >rock* >((+Bl*) (+rock*)) >("Bl rock")' IN BOOLEAN MODE) AS match
Now just order by match and it will show you the most relevant searches. You can also combine the order by with multiple statements and add a limit like below:
Orders by most recent date, highest match and then orders the matches that have the same score by their character length
ORDER BY `date` DESC, `match` DESC, LENGTH(`p1`.`name`) ASC
Keep in mind that the above code somehow creates a relevant result based on common cases. Copying Google will be imposible since their algorithms for optimal results / speed are incredible.
If FULLTEXT search is a step to much, try to make a tag system. Tagging content with unique tag combinations will also result in a more reliable search result

Text matching in sql query (LIKE)

My Database:
ID place
-----------------------------------------------
1 Bhowali
2 Bangalore
3 Tumkur
My Query:
SELECT *
FROM table
WHERE place LIKE '%bhovali'
Question is: many users search wrong keyword on that time the query result should match and output correct one.. in my above query bhovali is a wrong word the correct word is bhowali. is there any way to get correct result from the query??..

may be this can help SELECT * FROM table WHERE SOUNDEX(name) LIKE SOUNDEX('bhovali') ;

This is a really loaded question... short answer is no. SQL is made for giving you what you want, not what you might want.
You would have to develop some kind of intelligence engine to recognize that there are similar names and search for those as it occurs.

in mysql you can use full text search if the column has the full text index as
select * from table match(name) against ('bhovali');

Understanding COUNT() as `count`

I'm currently learning how to build a site in PHP MySQL. However, I seem to fail to understand COUNT() as count and wouldn't mind some further explanation.
I get the principles of COUNT, 0 || 1, and how it returns all the values that pertain to that query.
But, don't see how COUNT as count works. Anyhow, this is how the code I'm writing goes - so we have a working example - and where I first became perplexed.
"SELECT COUNT(id) as count, id
FROM user
WHERE email='$email' AND password='".md5$password."'"

That is what is called alias which is sometimes used to show a more appealing column header to users or the calling code
SELECT COUNT(`id`) as `count`....
will print
count
--------
5
The alias standing as the column header instead of any arbitrary string: See the SQLFiddle to see the difference
From the fiddle you can see that the header column looks somehow e.g.
count(*)
--------
5

With Count() you can count the returning rows of a result set. The also the official MySQL documentation about count:
Databases are often used to answer the question, “How often does a certain type of data occur in a table?” For example, you might want to know how many pets you have, or how many pets each owner has, or you might want to perform various kinds of census operations on your animals.
Counting the total number of animals you have is the same question as “How many rows are in the pet table?” because there is one record per pet. COUNT(*) counts the number of rows, so the query to count your animals looks like this:
SELECT COUNT(*) FROM pet;
The part with AS count means that this colum will get a name which you can use e.g. in PHP. See also this explenation on w3schools:
You can give a table or a column another name by using an alias. This can be a good thing to do if you have very long or complex table names or column names.
An alias name could be anything, but usually it is short.

as count is just an alias. You can use as for any field or method selected. it means you change the name of the column being returned in your dataset.
SELECT `field` as another_name
So:
SELECT COUNT(*) as `count`
Just renames the column from COUNT(*) to count making it easier to work with whereever you are maniuplating your result set.
It also makes for easier access within your current query. Many would do the following with large table names:
SELECT * FROM `table_with_ridiculous_name` as twrn WHERE twrn.id = 1

If you ran this sql:
SELECT COUNT(id), id ....
You would get (after doing a *_fetch_assoc) $row['numberofrecordshere'] which would be very hard to echo (or use in a comparison) unless you knew how many records there would be (which would defeat the purpose of this result, anyway)
Returning it as count allows you to get to it in the resulting array by using $row['count']

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

How to check data for duplication in database? - php

SQL Server provides Full Text Search feature. You may try for this instead of your like clause.

ATaylor pretty much spells it out for you. To exclude the specific common keywords, I would recommend using an "EXCEPT SELECT WHERE LIKE" statement at the end of your statement.

SELECT DISTINCT(field) AS var1, COUNT(field) AS var2 FROM table GROUP BY field HAVING var2 > 1; This would show up any field that exist twice or more in your current database

Related

Use like operator and order by sub query

Querying a table where the field contains any order of given text strings

php/mysql/ajax: google style search with suggestions

Text matching in sql query (LIKE)

Understanding COUNT() as `count`

Categories

Resources