MYSQL PHP: Find duplicates based on Address Column

I have an addresses table in my MYSQL database with the following structure:
The first column ID, is a primary, auto-increment column.
The second column Name is varchar.
The third column contains address (text), filled by user.
The fourth column contains the address slug, which is basically the address (third column) in lower case and without any special characters.
The last column contains the creation date of the record.
I wish to display all the records and highlight the possible duplicates, based on the address/address slug.
In this case, the duplicates are as follows:
Record 1 and Record 2
Record 3 and Record 6
Is there a way to partially match a string in MYSQL or PHP, to achieve the above results?
FYI: I have gone through SPHINX PHP, SQL FULLTEXT SEARCHES etc.
I have been struggling for over 2 weeks, but couldn't find an optimal solution.
Any ideas, suggestions, solutions are welcome.

The laravel tag was on the question initially and removed later, but I think the strategy can still help.
This is the given list:
$lists = [
    [
        'id' => 1,
        'text' => '2693 Edgewood Road Exit',
    ],
    [
        'id' => 2,
        'text' => '4408 Cost 4657 Avenue',
    ],
    [
        'id' => 3,
        'text' => '2693 Mapleview Road',
    ],
    [
        'id' => 4,
        'text' => '4657 Cost Edgewood Avenue',
    ],
    [
        'id' => 5,
        'text' => '4408 Mapleview Drive Road',
    ]
];
The goal is to find repeated/duplicate text across the entries.
Since a single shared word is not a realistic sign of duplication, I decided to look for TWO shared words, using every possible two-word combination.
$combinations = [];
foreach ($lists as $list) {
    $insideCombo = [];
    $insideText = explode(' ', $list['text']);
    $length = count($insideText);
    // build every ordered pair of words (i before j) from this text
    for ($i = 0; $i < $length; $i++) {
        for ($j = $i + 1; $j < $length; $j++) {
            $insideCombo[] = $insideText[$i] . ' ' . $insideText[$j];
        }
    }
    $combinations[$list['id']] = $insideCombo;
}
This returns, for example:
// for '2693 Edgewood Road Exit'
1 => array:6 [
    0 => "2693 Edgewood"
    1 => "2693 Road"
    2 => "2693 Exit"
    3 => "Edgewood Road"
    4 => "Edgewood Exit"
    5 => "Road Exit"
]
Now we loop again to compare the possible repetitions. Here, we leverage Laravel's Str::containsAll() (from Illuminate\Support\Str):
$copyCat = [];
foreach ($lists as $list) {
    foreach ($combinations as $comboKey => $combination) {
        /* Only compare against higher ids: this skips comparing a text with itself
         * and avoids reporting '4 to 2' when '2 to 4' is already listed.
         */
        if ($list['id'] < $comboKey) {
            foreach ($combination as $row) {
                if (Str::containsAll($list['text'], explode(' ', $row))) {
                    $copyCat[] = $list['id'] . ' matches with ' . $comboKey . ' with [' . $row . ']';
                }
            }
        }
    }
}
Final Response of $copyCat
array:5 [
0 => "1 matches with 3 with [2693 Road]"
1 => "2 matches with 4 with [4657 Cost]"
2 => "2 matches with 4 with [4657 Avenue]"
3 => "2 matches with 4 with [Cost Avenue]"
4 => "3 matches with 5 with [Mapleview Road]"
]
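For readers not using Laravel, the same idea can be sketched in plain PHP. This is only an illustration under a few assumptions: the function names words() and possibleDuplicates() are made up for this example, words are compared case-insensitively as whole words (Str::containsAll() does substring matching instead), and each pair is reported once with all of its shared words rather than once per two-word combination.

```php
<?php
// Same sample list as above, keyed by id.
$texts = [
    1 => '2693 Edgewood Road Exit',
    2 => '4408 Cost 4657 Avenue',
    3 => '2693 Mapleview Road',
    4 => '4657 Cost Edgewood Avenue',
    5 => '4408 Mapleview Drive Road',
];

// Hypothetical helper: lower-cased unique words of an address.
function words(string $text): array
{
    return array_unique(preg_split('/\s+/', strtolower(trim($text))));
}

// Hypothetical helper: return [idA, idB, sharedWords] for every pair of
// texts that shares at least $min words. Each pair is visited once (idA < idB).
function possibleDuplicates(array $texts, int $min = 2): array
{
    $pairs = [];
    $ids = array_keys($texts);
    foreach ($ids as $i => $a) {
        foreach (array_slice($ids, $i + 1) as $b) {
            $shared = array_values(array_intersect(words($texts[$a]), words($texts[$b])));
            if (count($shared) >= $min) {
                $pairs[] = [$a, $b, $shared];
            }
        }
    }
    return $pairs;
}

foreach (possibleDuplicates($texts) as [$a, $b, $shared]) {
    echo "$a matches with $b with [" . implode(' ', $shared) . "]" . PHP_EOL;
}
```

On the sample list this reports the same three pairs as above (1 and 3, 2 and 4, 3 and 5). Raising $min makes the check stricter.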
Keep me posted in the comments below. Cheers!

Make an empty duplicate of the table, e.g. mytable_to_update.
Run a few queries to find the duplicates.
Start by populating the newly created table with the non-duplicates. Initial query:
SELECT SUBSTRING_INDEX(Name,' ',1), COUNT(*)
FROM mytable
GROUP BY SUBSTRING_INDEX(Name,' ',1)
HAVING COUNT(*) = 1;
SUBSTRING_INDEX captures the first string before a space (' '). In the example, Sam Mcarthy becomes just Sam. The query then groups by that value and counts how many times each first name occurs, and HAVING COUNT(*) = 1 keeps only the names that occur once. However, this filters out too much: Joe and Joe John end up in the same group even if they are different people with different addresses, because the query groups only by the first word of the name. Therefore, we need to add an address comparison to the mix.
Add the same function to the Address column like this:
SELECT SUBSTRING_INDEX(Name,' ',1),
       SUBSTRING_INDEX(Address,' ',1), /* take the first string in the address */
       COUNT(*)
FROM mytable
GROUP BY SUBSTRING_INDEX(Name,' ',1),
         SUBSTRING_INDEX(Address,' ',1) /* then also group by the address */
HAVING COUNT(*) = 1;
Here too we take only the first string from the address. Suppose there are two rows that look like Joe, 12 Street.. and Joe John, 12 St. ..; because of SUBSTRING_INDEX, the query reduces both to Joe, 12, so the count for that group is 2. That means both rows (Joe, 12 Street.. and Joe John, 12 St. ..) are treated as duplicates and do not appear in the query results.
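To see what this grouping does without touching the database, here is a small PHP sketch of the same logic. The sample rows and the helper names firstToken() and groupByFirstTokens() are assumptions made for illustration; firstToken() mimics SUBSTRING_INDEX(value, ' ', 1), except that it also trims surrounding whitespace first.

```php
<?php
// Hypothetical sample rows; the real data lives in `mytable`.
$rows = [
    ['ID' => 1, 'Name' => 'Joe',         'Address' => '12 Street'],
    ['ID' => 2, 'Name' => 'Joe John',    'Address' => '12 St.'],
    ['ID' => 3, 'Name' => 'Sam Mcarthy', 'Address' => '7 Hill Road'],
];

// Rough equivalent of SUBSTRING_INDEX(value, ' ', 1): everything before the first space.
function firstToken(string $value): string
{
    return explode(' ', trim($value), 2)[0];
}

// Collect IDs under the key "first word of name|first word of address".
function groupByFirstTokens(array $rows): array
{
    $groups = [];
    foreach ($rows as $row) {
        $key = firstToken($row['Name']) . '|' . firstToken($row['Address']);
        $groups[$key][] = $row['ID'];
    }
    return $groups;
}

$groups = groupByFirstTokens($rows);
// Groups with a single ID correspond to HAVING COUNT(*) = 1 (non-duplicates);
// groups with more than one ID are the possible duplicates.
$possibleDuplicates = array_filter($groups, fn(array $ids): bool => count($ids) > 1);
print_r($possibleDuplicates);
```

With these rows, IDs 1 and 2 land in the same group (Joe|12) and ID 3 stays on its own.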
Change the query to list the IDs of all non-duplicates and insert those rows into the mytable_to_update table:
INSERT INTO mytable_to_update
SELECT * FROM mytable WHERE ID IN
    (SELECT GROUP_CONCAT(ID) /* replace everything else in the SELECT with just `ID` */
     FROM mytable
     GROUP BY SUBSTRING_INDEX(Name,' ',1),
              SUBSTRING_INDEX(Address,' ',1)
     HAVING COUNT(*) = 1);
Note: I'm using GROUP_CONCAT(ID) because a bare ID is rejected when sql_mode includes only_full_group_by. In general GROUP_CONCAT can return a list (like '1,2'), but since we only keep groups with a count of 1, it returns a single value per group. I've tested with ANY_VALUE and it returns similar results.
Now you have all the non-duplicates inside the mytable_to_update table. The next step is to search for the duplicates and insert only the ones that you want. This is merely a suggestion/assumption of what you might want, and it's not 100% accurate due to the nature of the data we're comparing.
The query is similarly structured and changed only in a few places, for example:
SELECT GROUP_CONCAT(ID), /* add GROUP_CONCAT to list all the duplicates grouped by the first name & address string */
       Name,
       Address,
       COUNT(*)
FROM mytable
GROUP BY SUBSTRING_INDEX(Name,' ',1),
         SUBSTRING_INDEX(Address,' ',1)
HAVING COUNT(*) > 1; /* change '= 1' to '> 1' to get the groups with more than one record */
GROUP_CONCAT generates a comma-separated list of the IDs that are possible duplicates.
Then wrap every listed column in GROUP_CONCAT with an identical ORDER BY, so that all columns are ordered the same way.
SELECT GROUP_CONCAT(ID ORDER BY ID), /* add ORDER BY */
       GROUP_CONCAT(Name ORDER BY ID),
       GROUP_CONCAT(Address ORDER BY ID),
       COUNT(*)
FROM mytable
GROUP BY SUBSTRING_INDEX(Name,' ',1),
         SUBSTRING_INDEX(Address,' ',1)
HAVING COUNT(*) > 1;
With this you can go over the values returned for each group of duplicates and compare them side by side. That way you can decide to omit any ID that you don't want to appear in the list by adding WHERE ID NOT IN(1,3 ...) etc.
Once you've finalized which IDs you want to keep, you can do something like this:
INSERT INTO mytable_to_update
SELECT * FROM mytable WHERE ID IN
    (SELECT SUBSTRING_INDEX(GROUP_CONCAT(ID ORDER BY ID),',',1)
     /* assuming that you only want the first ID in the set, use SUBSTRING_INDEX to separate the first ID */
     FROM mytable
     GROUP BY SUBSTRING_INDEX(Name,' ',1),
              SUBSTRING_INDEX(Address,' ',1)
     HAVING COUNT(*) > 1);
Now you'll have a table (mytable_to_update) that probably contains only non-duplicates. If some of the rows in mytable_to_update are not what you want, you can remove them, and if there are rows you think are not duplicates, you can insert them. It's pretty much a manual process afterwards; even with the queries, only you can determine whether the results are correct.
Here's a fiddle: https://www.db-fiddle.com/f/6Dfrn78mqZbGTwZs3U9Vhi/0

Related

Finding a value in a php array

I've been banging my head against this problem for the last 2-3 days, trying to look at it from as many angles as possible, but to no avail. I'm turning to the SO community for extra perspectives. Below is the code I have, which prints all 9 product plans. I want to find and print the plan whose price equals, or is closest to, a given user input. How can I do this?
// array of product names
$productnames = array(1 => "Beginner", "Advanced", "Expert");
// array of product levels
$productlevels = array(1 => "Bronze", "Silver", "Gold");
// length of the product name array
$planname_array_length = count($productnames);
// length of the product level array
$planlevel_array_length = count($productlevels);
for ($prn = 1; $prn <= $planname_array_length; $prn++) { // loop over plan names
    for ($prl = 1; $prl <= $planlevel_array_length; $prl++) { // loop over plan levels
        $getpoductsql = "SELECT name, level, productNameId, productLevelId, finalProductPrice
            FROM (
                SELECT wspn.productName AS name, wspl.productLevel AS level, wsp.productNameId AS productNameId, wsp.productPlanLevel AS productLevelId,
                    ROUND(SUM(`Price`) * 1.12) AS finalProductPrice
                FROM `products` ws
                left join product_plan wsp on wsp.productId = ws.wsid
                left join product_plan_level wspl on wsp.productPlanLevel = wspl.wsplid
                left join product_plan_name wspn on wspn.wspnid = wsp.productNameId
                WHERE wspn.productName = '{$productnames[$prn]}' AND wspl.productLevel = '{$productlevels[$prl]}'
            )
            AS x ORDER BY ABS(finalProductPrice - $compareprice)";
        $resultproducts = $conn->query($getpoductsql);
        $prodArray = mysqli_fetch_array($resultproducts);
        // build the array for each plan (keys must match the column aliases in the query)
        $resultArr = array('planNameID' => $prodArray['productNameId'],
            'planName' => $prodArray['name'],
            'planLevelID' => $prodArray['productLevelId'],
            'planLevelName' => $prodArray['level'],
            'planPrice' => $prodArray['finalProductPrice']);
        // print the plan
        echo json_encode($resultArr);
    }
}
This will output 9 plans as follows:
{"planNameID":"1","productName":"Beginner","productLevelID":"1","productLevelName":"Bronze","productPrice":"15"}

Rather than performing a separate query for each product name and product level, do them all in one query, and let MySQL find the one with the closest price.
$getpoductsql = "SELECT name, level, productNameId, productLevelId, finalProductPrice
    FROM (
        SELECT wspn.productName AS name, wspl.productLevel AS level, wsp.productNameId AS productNameId, wsp.productPlanLevel AS productLevelId,
            ROUND(SUM(`Price`) * 1.12) AS finalProductPrice
        FROM `products` ws
        left join product_plan wsp on wsp.productId = ws.wsid
        left join product_plan_level wspl on wsp.productPlanLevel = wspl.wsplid
        left join product_plan_name wspn on wspn.wspnid = wsp.productNameId
        WHERE wspn.productName IN ('Beginner', 'Advanced', 'Expert') AND wspl.productLevel IN ('Bronze', 'Silver', 'Gold')
        GROUP BY productNameId, productLevelId
    )
    AS x ORDER BY ABS(finalProductPrice - $compareprice)";

forgive my formatting, I'm on mobile
Like Amr Berag said above, your result should be the first row returned from your query.
If you have a table like this:
ID    value
----  ------
A     7
B     12
C     23
...
You can then SELECT from this table to find the closest to some value, like so:
(Assume your desired value is $VALUE)
SELECT id, value, ABS(value - $VALUE) AS diff
FROM your_table
ORDER BY diff ASC
This will return something like this (say $VALUE is 10):
id  value  diff
--  -----  ----
B   12     2
A   7      3
C   23     13
...
You can just pick the first row.
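To get only the closest row, add LIMIT 1 to the same query. Note that a WHERE diff = MIN(diff) clause will not work, because MySQL does not allow column aliases or aggregate functions such as MIN in the WHERE clause.
SELECT id, value, ABS(value - $VALUE) AS diff
FROM your_table
ORDER BY diff ASC
LIMIT 1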
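If the plans are already in a PHP array, the same "smallest absolute difference wins" rule can be applied without another query. The following is a minimal sketch; the sample plans, their prices and the function name closestPlan() are assumptions made for illustration.

```php
<?php
// Hypothetical plan list; in the question these rows come from the MySQL query.
$plans = [
    ['name' => 'Beginner', 'level' => 'Bronze', 'price' => 15],
    ['name' => 'Beginner', 'level' => 'Silver', 'price' => 28],
    ['name' => 'Advanced', 'level' => 'Bronze', 'price' => 40],
];

// Return the plan whose price is closest to $target, or null for an empty list.
function closestPlan(array $plans, float $target): ?array
{
    $best = null;
    $bestDiff = INF;
    foreach ($plans as $plan) {
        $diff = abs($plan['price'] - $target);
        if ($diff < $bestDiff) { // strict "<" keeps the earlier plan on ties
            $best = $plan;
            $bestDiff = $diff;
        }
    }
    return $best;
}

echo json_encode(closestPlan($plans, 30)), PHP_EOL;
```

Doing the comparison in SQL is usually preferable when the table is large, since only one row has to be transferred.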

The way you are doing it will produce invalid JSON, because every iteration echoes its own object. Collect the plans in one array and encode it once, like this:
$result = array();
for ($prn = 1; $prn <= $planname_array_length; $prn++) {
    for ($prl = 1; $prl <= $planlevel_array_length; $prl++) {
        . . . // the other code
        // build the array for each plan
        $resultArr = array('planNameID' => $prodArray['productNameId'],
            'planName' => $prodArray['name'], 'planLevelID' => $prodArray['productLevelId'],
            'planLevelName' => $prodArray['level'],
            'planPrice' => $prodArray['finalProductPrice']);
        // collect the plan instead of printing it
        $result[] = $resultArr;
    } // inner loop
} // outer loop
echo json_encode($result);
You should also add LIMIT 1 to the query and do the rest in JS on the front end.

MYSQL / PHP Calculating number of occurrences

First of all, I am slightly confused about what the best implementation would be for the following problem, i.e. can it be done purely in MySQL without altering tables, or would I need a combination of PHP and MySQL as I am currently doing?
Please keep that in mind as you read on:
Question Info
A Pickem game works as follow:
1- Display all matches / fixtures in a round for a tournament.
2- User enters which teams he thinks will win each fixture.
The fixtures are pulled from a table schedule and the users results are recorded in a table picks
Keep In mind
Each round can have a number of matches (anywhere between 1 to 30+ matches)
What I am trying to do / PROBLEM
I am trying to calculate how many users selected team1 to win and how many users selected team2 to win for a given round in a tournament.
Example
Manchester United: 7 users picked |
Arsenal: 3 users picked
MYSQL TABLES
schedule table Schedule of upcoming games
picks table User Picks are recorded in this table
Expected Output From Above Tables After Calculations
So for Super Rugby Round 1 it should read as follow:
gameID 1: 4 picks recorded, 2 users selected Jaquares, 1 user selected Stormers (ignore the draw for now)
gameID 2 4 picks recorded, 4 users selected Sharks, 0 users selected Lions
My Code
function calcStats($tournament, $week)
{
global $db;
//GET ALL GAMES IN TOURNAMENT ROUND
$sql = 'SELECT * FROM picks
WHERE picks.tournament = :tournament AND picks.weekNum = :weekNum ORDER BY gameID';
$stmnt = $db->prepare($sql);
$stmnt->bindValue(':tournament', $tournament);
$stmnt->bindValue(':weekNum', $week);
$stmnt->execute();
if ($stmnt->rowCount() > 0) {
$picks = $stmnt->fetchAll();
return $picks;
}
return false;
}
test.php
$picks = calcStats('Super Rugby', '1');
foreach($picks as $index=> $pick) {
if($pick['gameID'] !== $newGameID){
?>
<h1><?php echo $pick['gameID']?></h1>
<?php
//reset counter on new match
$team1 = 0;
$team2 = 0;
}
if($pick['picked'] === $newPick){
//gameID is passed as arrayKey to map array index to game ID
//team name
$team1[$pick['picked']];
//number times selected
$team1Selections[$pick['gameID']] = $team1++;
}
else if($pick['picked'] !== $newPick){
///gameID is passed as arrayKey to map array index to game ID
//team name
$team2[$pick['picked']];
$team2Selections[$pick['gameID']] = $team2++;
}
$newPick = $pick['picked'];
$newGameID = $pick['gameID'];
}
PRINT_R() Of function $picks = calcStats('Super Rugby', '1')
I hope my question makes sense. If you need any additional information please comment below. Thank you for taking the time to read.

It seems that you're doing too much within PHP that can be easily done within MySQL; consider the following query:
SELECT gameID, team, COUNT(*) AS number_of_picks
FROM picks
WHERE picks.tournament = :tournament AND picks.weekNum = :weekNum
GROUP BY gameID, team
ORDER BY gameID, team
This will give the following results, given your example:
1 | Jaquares | 2
1 | Stormers | 1
1 | Draw | 1
2 | Sharks | 4
Then, within PHP, you perform grouping on the game:
$result = array();
foreach ($stmnt->fetchAll() as $row) {
$result[$row['gameID']][] = $row;
}
return $result;
Your array will then contain something like:
[
    '1' => [
        [
            'gameID' => 1,
            'team' => 'Jaquares',
            'number_of_picks' => 2,
        ],
        [
            'gameID' => 1,
            'team' => 'Stormers',
            'number_of_picks' => 1,
        ],
        ...
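As a rough sketch of how the grouped rows could then be turned into the summary lines from the question: the sample rows below copy the query results shown above, and the function name summarisePicks() and the output wording are assumptions made for this example.

```php
<?php
// Sample rows shaped like the result of the GROUP BY query (values from the example).
$rows = [
    ['gameID' => 1, 'team' => 'Jaquares', 'number_of_picks' => 2],
    ['gameID' => 1, 'team' => 'Stormers', 'number_of_picks' => 1],
    ['gameID' => 1, 'team' => 'Draw',     'number_of_picks' => 1],
    ['gameID' => 2, 'team' => 'Sharks',   'number_of_picks' => 4],
];

// Build gameID => ['teams' => [team => picks], 'total' => picks for the game].
function summarisePicks(array $rows): array
{
    $summary = [];
    foreach ($rows as $row) {
        $game = $row['gameID'];
        $picks = (int) $row['number_of_picks'];
        $summary[$game]['teams'][$row['team']] = $picks;
        $summary[$game]['total'] = ($summary[$game]['total'] ?? 0) + $picks;
    }
    return $summary;
}

foreach (summarisePicks($rows) as $gameID => $game) {
    $parts = [];
    foreach ($game['teams'] as $team => $picks) {
        $parts[] = "$picks selected $team";
    }
    echo "gameID $gameID: {$game['total']} picks recorded, " . implode(', ', $parts) . PHP_EOL;
}
```

A team that nobody picked (Lions in the example) has no row in the grouped result, so it will not show up with a count of 0 unless the team names are also read from the schedule table.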

Unique value count of comma separated field (PHP - MySQL)

I have a MySQL table that looks like this:
id place interest
1 place1 a,b,c
2 place2 c,d,e
3 place1 a,e
4 place2 f
5 place2 f
6 place3 g,h
I need to get unique "place" and "interest" values sorted as per the count.
So, the output for "place" would be
place2(3)
place1(2)
place3(1)
So, the output for "interest" would be
a(2)
c(2)
e(2)
f(2)
b(1)
d(1)
g(1)
h(1)
Is there a way to do this in PHP/MySQL?
So far, I have been able to get the simple column data:
SELECT place,
COUNT( * ) AS num
FROM testtab
GROUP BY place
ORDER BY COUNT( * ) DESC

Since MySQL has no array column type, it's better to build a new table like this:
interest_id  interest_name
1            a
2            b
and another one to keep the relations:
pk  id  interest_id
1   1   1
2   1   2
where id is the id of the record in your main table.
With this in place, you can easily use:
select count(*) from THIRD_TABLE where id = YOUR_ID

You can do this:
$place = array();
$interests = array();
foreach ($rows as $row) {
    // count the place
    if (!isset($place[$row["place"]])) {
        $place[$row["place"]] = 0;
    }
    $place[$row["place"]]++;
    // split the comma-separated interest column and count each value
    $ints = explode(",", $row["interest"]);
    foreach ($ints as $int) {
        if (!isset($interests[$int])) {
            $interests[$int] = 0;
        }
        $interests[$int]++;
    }
}
This will give you two arrays keyed by the relevant field, with the count as the value. If this is going to be a common action in your application, it would make more sense to normalize your data as suggested by AliBZ.
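The question also asks for the results sorted by count. A minimal sketch of that step follows, using the sample table from the question; the function name sortCounts() is an assumption, and ties are broken alphabetically so the order is deterministic.

```php
<?php
// Sample rows copied from the table in the question.
$rows = [
    ['place' => 'place1', 'interest' => 'a,b,c'],
    ['place' => 'place2', 'interest' => 'c,d,e'],
    ['place' => 'place1', 'interest' => 'a,e'],
    ['place' => 'place2', 'interest' => 'f'],
    ['place' => 'place2', 'interest' => 'f'],
    ['place' => 'place3', 'interest' => 'g,h'],
];

// Hypothetical helper: sort counts descending, then keys ascending on ties.
function sortCounts(array $counts): array
{
    uksort($counts, function ($a, $b) use ($counts) {
        return [$counts[$b], $a] <=> [$counts[$a], $b];
    });
    return $counts;
}

// Count places directly, and interests after splitting every row's list.
$places = array_count_values(array_column($rows, 'place'));
$interests = array_count_values(
    array_merge(...array_map(fn(array $row): array => explode(',', $row['interest']), $rows))
);

foreach (sortCounts($places) as $place => $n) {
    echo "$place($n)", PHP_EOL;
}
foreach (sortCounts($interests) as $interest => $n) {
    echo "$interest($n)", PHP_EOL;
}
```

For the sample data this prints the places and interests in the order requested in the question.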

This is for the first result you need
SELECT place,COUNT(interest)
FROM `testtab`
GROUP by place
ORDER BY COUNT(interest) desc

You can do this:
$inst_values = array();
foreach ($rows as $row) {
    // split each row separately so values from adjacent rows are not glued together
    $inst_values = array_merge($inst_values, explode(',', $row['interest']));
}
$inst_count = array_count_values($inst_values);
// $inst_count holds the count per interest; print_r it and format it accordingly

SQL query results returned, even if exact match not found

I hope this question isn't redundant. What I am trying to accomplish is have a user select a bunch of checkboxes on a page and return the closest matching records if there are no matching rows. For example:
A person checks off [x]Apples [x]Oranges [x]Pears [x]Bananas
But the table looks like this:
Apples Oranges Pears Bananas
1 1 1 null
1 1 null 1
1 1 null null
(Obviously I left out the id column here, but you get the point I think.) So, the desired result is to have those three rows still be returned in order of most matches, so pretty much the order they are in now. I'm just not sure what the best approach is for something like this. I've considered a full-text search and the levenshtein function, but I really like the idea of returning the exact match if it exists. No need for you to go to great lengths with code if it's not needed. I'm just hoping to be pointed in the right direction. I HAVE seen other questions sort of like this, but I am still unsure about which way to go.
Thanks!

Write a query that adds up the number of columns that matched, and sorts the rows by this total. The comparisons need parentheses, because + binds more tightly than =. E.g.
SELECT *
FROM mytable
ORDER BY (COALESCE(Apples, 0) = $apples) + (COALESCE(Oranges, 0) = $oranges) + ... DESC
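To make the scoring concrete, here is a small PHP sketch that applies the same rule to the three rows from the question. The id values, the $selection array and the function name matchScore() are assumptions added for illustration; a NULL column is treated as 0, like COALESCE in the query.

```php
<?php
// Rows from the question's table (ids added for the example); null = not set.
$rows = [
    ['id' => 1, 'Apples' => 1, 'Oranges' => 1, 'Pears' => 1,    'Bananas' => null],
    ['id' => 2, 'Apples' => 1, 'Oranges' => 1, 'Pears' => null, 'Bananas' => 1],
    ['id' => 3, 'Apples' => 1, 'Oranges' => 1, 'Pears' => null, 'Bananas' => null],
];
// Checkbox state: 1 = checked, 0 = unchecked.
$selection = ['Apples' => 1, 'Oranges' => 1, 'Pears' => 1, 'Bananas' => 1];

// Number of columns whose value (null counted as 0) equals the user's selection.
function matchScore(array $row, array $selection): int
{
    $score = 0;
    foreach ($selection as $column => $wanted) {
        if ((int) ($row[$column] ?? 0) === (int) $wanted) {
            $score++;
        }
    }
    return $score;
}

// Best matches first; rows with equal scores stay in ascending id order.
usort($rows, function (array $a, array $b) use ($selection): int {
    return [matchScore($b, $selection), $a['id']] <=> [matchScore($a, $selection), $b['id']];
});

foreach ($rows as $row) {
    echo "id {$row['id']}: score " . matchScore($row, $selection) . PHP_EOL;
}
```

An exact match, if one exists, gets the maximum score and therefore sorts to the top.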

It's easy to sort by a score...
SELECT fb.ID, fb.Apples, fb.Oranges, fb.Pears, fb.Bananas
FROM FruitBasket fb
ORDER BY
    CASE WHEN @Apples = fb.Apples THEN 1 ELSE 0 END
    + CASE WHEN @Oranges = fb.Oranges THEN 1 ELSE 0 END
    + CASE WHEN @Pears = fb.Pears THEN 1 ELSE 0 END
    + CASE WHEN @Bananas = fb.Bananas THEN 1 ELSE 0 END
    DESC, ID
However, this leads to a table scan (even with TOP). The last record may be a better match than the records found so far, so every record must be read.
You could consider a tagging system, like this
Content --< ContentTag >-- Tag
Which would be queried this way:
SELECT ContentID
FROM ContentTag
WHERE TagID in (334, 338, 342)
GROUP BY ContentID
ORDER BY COUNT(DISTINCT TagID) desc
An index on ContentTag.TagId would be used by this query.

This is fairly simple: you can use IF() with an IS NULL test (MySQL, or your DB's equivalent) to return a sum of matches and use that in your ORDER BY.
// columns and weighting score
$types = array("oranges" => 1, "apples" => 1, "bananas" => 1, "pears" => 1);
$where = array();
// loop through the columns
foreach ($types as $key => &$weight) {
    // if there is a match in $_REQUEST, add it to $where and increase the weight
    if (isset($_REQUEST[$key])) {
        $where[] = $key . " = 1";
        $weight = 2;
    }
}
unset($weight); // drop the reference so the next loop cannot overwrite the last element
// build the WHERE clause
$where_str = (count($where) > 0) ? "WHERE " . implode(" OR ", $where) : "";
// build the SQL - non-null matches from the WHERE will be weighted higher
$sql = "SELECT apples, oranges, pears, bananas, ";
foreach ($types as $key => $weight) {
    $sql .= "IF({$key} IS NULL, 0, {$weight}) + ";
}
$sql .= "0 AS score FROM `table` {$where_str} ORDER BY score DESC";
Assuming that "oranges" and "apples" are selected, your SQL will be:
SELECT apples, oranges, pears, bananas,
IF(oranges IS NULL, 0, 2) + IF(apples IS NULL, 0, 2) + IF(bananas IS NULL, 0, 1) + IF(pears IS NULL, 0, 1) + 0 AS score
FROM `table`
WHERE oranges = 1 OR apples = 1
ORDER BY score DESC

Order descending by the sum of checkbox/data matches:
SELECT * FROM table
ORDER BY (COALESCE(Apple,0) * @apple) + (COALESCE(Orange,0) * @orange) ..... DESC
where @apple / @orange represent the user's selection: 1 = checked, 0 = unchecked

PHP contest Logical

One of my PHP web applications has a contest section. It contains 10 multiple-choice questions, each with 4 options.
After the user fills in the form, I save the answers as comma-separated values in a DB, as follows:
user | answer
-------------------------------------
112 | 1,7,8,9,8,5,2,3,6,7,9,6
I have an answer key in the same format as the user's submitted answers.
What is the best logical method to evaluate the users' input and find the highest-scoring user?

As mentioned in the comments, this isn't the best way to store data, but I'd evaluate it like this (written with mysqli, since the old mysql_* functions were removed in PHP 7; $conn is assumed to be an open mysqli connection):
$query = $conn->query("select * from `table`") or die("die message");
$answer_key = array(answer1, answer2, etc); // fill in the real answer key
$high_score = 0;
$high_scorer = "";
while ($r = $query->fetch_assoc()) {
    $users_answers = explode(',', $r['answer']);
    $user_score = 0;
    for ($i = 0; $i < count($answer_key); $i++) {
        // skip positions the user left unanswered
        if (isset($users_answers[$i]) && $answer_key[$i] == $users_answers[$i]) {
            $user_score++;
        }
    }
    if ($user_score > $high_score) {
        $high_score = $user_score;
        $high_scorer = $r['user'];
    }
}
echo "High scorer is $high_scorer with $high_score points";
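The scoring loop can be tried without a database. The sketch below uses a made-up answer key and made-up submissions (user id => stored string), and the function name scoreAnswers() is hypothetical; it only illustrates the position-by-position comparison described above.

```php
<?php
// Hypothetical answer key and stored rows (user id => comma-separated answers).
$answerKey = [1, 7, 8, 9, 8, 5, 2, 3, 6, 7];
$submissions = [
    112 => '1,7,8,9,8,5,2,3,6,7',
    113 => '1,7,8,9,1,1,1,1,1,1',
    114 => '2,7,8',
];

// One point for every position where the user's answer equals the key.
function scoreAnswers(array $answerKey, string $csv): int
{
    $answers = explode(',', $csv);
    $score = 0;
    foreach ($answerKey as $i => $correct) {
        if (isset($answers[$i]) && (int) $answers[$i] === $correct) {
            $score++;
        }
    }
    return $score;
}

// array_map keeps the user ids as keys when it is given a single array.
$scores = array_map(fn(string $csv): int => scoreAnswers($answerKey, $csv), $submissions);
arsort($scores); // highest score first, keys preserved
$topUser = array_key_first($scores);

echo "High scorer is $topUser with {$scores[$topUser]} points", PHP_EOL;
```

Keeping the whole $scores array, rather than only the maximum, also makes it easy to show a ranking or to detect ties.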

If each answer option carries its own score, for example:
$answersRating = array(1 => 0, 2 => 1, 3 => 3, 4 => 2, ....)
so that selecting answer 1 gives 0 points, answer 2 gives 1 point, answer 3 gives 3 points and so on, you can do something like this:
$score = array_sum(array_intersect_key($answersRating, array_flip(explode(',', $userAnswersStringFromDB))));
Note that array_flip() keeps each answer value only once, so an option that was chosen for several questions is counted a single time.

I think you should structure your DB like this:
NOTE: This is the bare minimum; you would of course add extra fields to questions like name, description, etc.
answers   | id, user_id, question_id, answer
questions | id, contest_id, correct_answer
user      | id, name
Then you could get everything with a query.
Top Score:
SELECT u.name, COUNT(*) AS Score
FROM user u
JOIN answers a ON u.id = a.user_id
JOIN questions q ON q.id = a.question_id
WHERE q.correct_answer = a.answer AND q.contest_id = XXX
GROUP BY u.id, u.name
ORDER BY Score DESC
