MySQL Query Where Column Like Column - php

I'm working on a small project that involves grabbing a list of contacts which are stored for each group. Essentially, the database is set up so that each group has a primary and secondary contact stored as, unsurprisingly, Group.Primary and Group.Secondary. The objective is to pull every Primary and Secondary contact for each Group and display them in a sortable table.
I have the sortable table all worked out, but I have come across a small problem. Each primary and secondary field can have more than one contact separated by a comma. For instance, if Primary contained 123,256, it would need to pull both contacts with IDs 123 and 256. I had intended to use a query formatted like this:
SELECT *
FROM `Group` G,
Contacts C
WHERE G.`Primary` LIKE CONCAT('%', C.ID, '%')
OR G.Secondary LIKE CONCAT('%', C.ID, '%')
so that I could just skip the comma part, but I can't seem to find a working query for this.
My question to you is, am I just overlooking something here? Is there a simple query that would let me do this? Or am I better off getting the groups and contacts separately and combining the two later? I think the former is a little easier to understand when read, which is a plus as this is a shared project, but if that is not possible I will do the latter.
This code is simplified, but it gets the point across.

If I understand correctly, you want to use the MySQL FIND_IN_SET function:
SELECT *
FROM `Group` g
JOIN Contacts c ON FIND_IN_SET(c.id, g.`primary`)
OR FIND_IN_SET(c.id, g.secondary)
But I highly recommend you normalize the table: do not store comma-delimited lists if at all possible.
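To see what FIND_IN_SET actually checks, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for MySQL. SQLite has no FIND_IN_SET, so the sketch registers a hypothetical Python re-implementation of it (exact item comparison; MySQL's version follows the column collation). The table and column names (grp, primary_ids, secondary_ids) are invented, partly because GROUP and PRIMARY are reserved words.

```python
import sqlite3

def find_in_set(needle, haystack):
    """Mimic MySQL's FIND_IN_SET(str, strlist): 1-based position of str in the
    comma-separated strlist, 0 if absent, None (NULL) if either argument is NULL."""
    if needle is None or haystack is None:
        return None
    haystack = str(haystack)
    items = haystack.split(",") if haystack != "" else []
    for pos, item in enumerate(items, start=1):
        if item == str(needle):
            return pos
    return 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.executescript("""
    CREATE TABLE grp (id INTEGER PRIMARY KEY, name TEXT, primary_ids TEXT, secondary_ids TEXT);
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO grp VALUES (1, 'Alpha', '123,256', '7');
    INSERT INTO contacts VALUES (7, 'Gina'), (23, 'Ann'), (123, 'Bob'), (256, 'Cy');
""")
rows = conn.execute("""
    SELECT g.name, c.id
    FROM grp g
    JOIN contacts c
      ON FIND_IN_SET(c.id, g.primary_ids) OR FIND_IN_SET(c.id, g.secondary_ids)
    ORDER BY c.id
""").fetchall()
print(rows)  # [('Alpha', 7), ('Alpha', 123), ('Alpha', 256)]
```

Contact 23 is not returned, even though a plain '%23%' LIKE would have matched the "123" in Alpha's list.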

I think you're definitely better off separating those two data values into different tables and then using JOINs to do your linking. If you were to, say, cast your id fields to strings so you could use the LIKE comparison, you'd end up with a bunch of junk matches. For example, if your primary id is 1 and your secondary is 35, you'd match rows with the following (Primary, Secondary) values, and this list is not exhaustive:
Primary: 1, Secondary: 35
Primary: 35, Secondary: 1
Primary: 10, Secondary: 135
Primary: 431, Secondary: 3541
etc.
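The junk matches are easy to reproduce. This small Python sketch (plain Python standing in for the SQL) compares a substring test, which is effectively what LIKE '%id%' does, with a test that splits the stored value on commas and looks for a whole item. The extra row ("2", "4") is invented as a row that should never match.

```python
def like_contains(column_value: str, contact_id: int) -> bool:
    """Roughly what `column LIKE CONCAT('%', id, '%')` tests: any substring."""
    return str(contact_id) in column_value

def exact_member(column_value: str, contact_id: int) -> bool:
    """What is really wanted: the id must be one whole item of the comma list."""
    return str(contact_id) in column_value.split(",")

def matching_rows(rows, contact_id, test):
    """Rows whose Primary OR Secondary value passes the given test."""
    return [r for r in rows if test(r[0], contact_id) or test(r[1], contact_id)]

# (Primary, Secondary) pairs from the list above, plus one row that should never match.
rows = [("1", "35"), ("35", "1"), ("10", "135"), ("431", "3541"), ("2", "4")]

for contact_id in (1, 35):
    print(contact_id, "LIKE-style:", matching_rows(rows, contact_id, like_contains))
    print(contact_id, "exact:     ", matching_rows(rows, contact_id, exact_member))
```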
What I'd do instead is something like this:
SELECT *
FROM `Group` g
LEFT JOIN Contacts c1 ON g.`primary` = c1.id
LEFT JOIN Contacts c2 ON g.secondary = c2.id
WHERE
c1.id IS NOT NULL
OR
c2.id IS NOT NULL
I think that'll get you the data you're really looking for, if I understand the question correctly.
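The separate-tables idea could look something like this. It is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the group_contact table and its role column are hypothetical names, not something from the question. Each contact gets its own row, so an ordinary equality JOIN works and the ids can be indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grp (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per (group, contact, role) instead of a comma-separated list.
    CREATE TABLE group_contact (
        group_id   INTEGER NOT NULL REFERENCES grp(id),
        contact_id INTEGER NOT NULL REFERENCES contacts(id),
        role       TEXT NOT NULL CHECK (role IN ('primary', 'secondary')),
        PRIMARY KEY (group_id, contact_id, role)
    );
    INSERT INTO grp VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO contacts VALUES (7, 'Gina'), (123, 'Bob'), (256, 'Cy');
    INSERT INTO group_contact VALUES
        (1, 123, 'primary'), (1, 256, 'primary'), (1, 7, 'secondary'),
        (2, 7, 'primary');
""")

rows = conn.execute("""
    SELECT g.name, gc.role, c.name
    FROM grp g
    JOIN group_contact gc ON gc.group_id = g.id
    JOIN contacts c ON c.id = gc.contact_id
    ORDER BY g.name, gc.role, c.name
""").fetchall()
for row in rows:
    print(row)
```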

Satan has been in your database, denormalizing it and dooming you to a life of complex and slow queries.
Do you have the ability to alter the structure of the database? If it's in production, I assume not. Failing that, you might want to consider creating a normalized table of primary and secondary contacts immediately prior to running this report.
If you can't do that, you need to work out a string matching algorithm that will always work. The problem with the one that you proposed is that you need to consider a contact id of 23 (or even 3), which will match 23, 123, 223, 231, and so on. To make that work, you need to add commas to the beginning and end of both strings you're comparing (so that ',123,256,' is tested against '%,23,%') and then do the LIKE.
Oops. Or you can use the I-never-knew FIND_IN_SET function described by Ponies, above.
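The comma-wrapping trick can be sketched with Python's sqlite3 module (SQLite concatenates strings with ||, where MySQL would use CONCAT()). The grp table and its rows are invented so that they include the troublesome values 123, 223 and 231 mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grp (id INTEGER PRIMARY KEY, primary_ids TEXT);
    INSERT INTO grp VALUES (1, '123,256'), (2, '23'), (3, '3,231'), (4, '223');
""")

def groups_containing(contact_id):
    # Wrap both sides in commas so '23' can only match a whole list item.
    # MySQL equivalent: CONCAT(',', primary_ids, ',') LIKE CONCAT('%,', ?, ',%')
    sql = """
        SELECT id FROM grp
        WHERE ',' || primary_ids || ',' LIKE '%,' || ? || ',%'
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (str(contact_id),))]

print(groups_containing(23))  # [2]
print(groups_containing(3))   # [3]
```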

Related

SQL Join duplicates, converting an access db

I am converting an access database to a new format. Currently all data resides in MySQL.
For the purposes of this question, there are 3 tables. tbl_Bills, tbl_Documents, and tbl_Receipts.
I wrote an outer join query, as some bills have documents and receipts and others don't. And I need a full listing of each set, given those situations, to be processed by a PHP script later on.
The problem is that the primary identifier, we'll call fld_CommonID, happens to exist in duplicate. For example, 3 bills have the same identifier, with different information. 3 documents and 3 receipts match those 3 bills.
So as you might have guessed, my join query results in 9 indistinct rows (6 duplicates), when there should be 3 (one join from each table). An inner join excludes data that isn't defined in the other table, and so doesn't work for my needs.
So... I'm thinking what I want to do is update those 3 records in each table (across all rows that have duplicates) so that they have a unique counter id, #1, #2, and #3 respectively, so that I can perform join queries on them uniquely per row.
Is that possible without running php code to select the duplicates ordered by natural table order, followed-by updating them with a counter?
Would you advise that I go that route (scripted) instead of some magical SQL query to do such a thing, if such a query can be made?
Or is it possible to outer join based on natural table order (pretty sure that's impossible)?
Writing this answer simply to close the question.
Inner joins would be perfect if there were a way to link duplicate fields in separate tables based on natural order (no primary key). The problem isn't that I lack a query; it's that the database is poorly structured, which is a problem better solved with code than with complex queries.
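Since the answer suggests solving this in code, here is a minimal Python sketch of the counter idea from the question (Python standing in for the PHP script). It assumes the rows are fetched in a stable, repeatable order, because SQL tables have no guaranteed natural order. Only fld_CommonID comes from the question; the other fields and values are invented.

```python
from collections import defaultdict

def number_duplicates(rows, key):
    """Attach a 1-based counter ("seq") to each row, restarting for every key value,
    in the order the rows are given."""
    counters = defaultdict(int)
    numbered = []
    for row in rows:
        counters[row[key]] += 1
        numbered.append({**row, "seq": counters[row[key]]})
    return numbered

def left_join(left, right, key):
    """Pair each left row with the right row having the same (key, seq), or None."""
    index = {(r[key], r["seq"]): r for r in right}
    return [(l, index.get((l[key], l["seq"]))) for l in left]

bills = number_duplicates([
    {"fld_CommonID": "A", "amount": 10},
    {"fld_CommonID": "A", "amount": 20},
    {"fld_CommonID": "A", "amount": 30},
    {"fld_CommonID": "B", "amount": 40},   # a bill with no document
], "fld_CommonID")
documents = number_duplicates([
    {"fld_CommonID": "A", "doc": "d1"},
    {"fld_CommonID": "A", "doc": "d2"},
    {"fld_CommonID": "A", "doc": "d3"},
], "fld_CommonID")

for bill, doc in left_join(bills, documents, "fld_CommonID"):
    print(bill["fld_CommonID"], bill["seq"], bill["amount"], doc and doc["doc"])
```

This gives one row per bill instead of the 3 x 3 combinations for "A"; receipts would be joined the same way. On MySQL 8.0 or later, ROW_NUMBER() OVER (PARTITION BY fld_CommonID ORDER BY some_column) could compute the same counter in SQL, provided some (here hypothetical) column defines the order.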

How can I separate row values into different columns in PHP and MySQL

I'm trying to create a query that takes values from different rows and put them into different columns. Here is the query I've come up with so far:
SELECT red.match_number AS 'Match #', red.redTeams AS 'Red Alliance', blue.blueTeams AS 'Blue Alliance', red.redScore AS 'Red Alliance Score', blue.blueScore AS 'Blue Alliance Score'
FROM(SELECT match_number, GROUP_CONCAT(team SEPARATOR ' | ') AS redTeams, score AS redScore
FROM `scout_data`
WHERE alliance_color = 'red'
GROUP BY match_number) AS red
LEFT JOIN (SELECT match_number, GROUP_CONCAT(team SEPARATOR ' | ') AS blueTeams, score AS blueScore
FROM `scout_data`
WHERE alliance_color = 'blue'
GROUP BY match_number) AS blue ON red.match_number = blue.match_number
Which creates a table like this: [image not included]. But I want to separate the numbers in the blue and red alliance columns so it looks something like this: [image not included], except without the qualifications column name. The structure of my table looks like this: [image not included]. I've limited all the columns in the picture to just what's relevant to the query.
I think what you are wanting or referring to is called a cross join, or a pivot table.
As I said in the comments,
Already Answered
MySql CROSS JOIN between two tables and match with another
Example from that answer http://sqlfiddle.com/#!9/3540f/4
Warning about MySQL's GROUP_CONCAT():
Another thing I should mention (I feel), as I see you using GROUP_CONCAT: when I first saw it, I thought it was a magical answer for some things I needed. Later I discovered that it's a MySQL extension rather than standard SQL. But more worrisome, after some hard-fought debugging, I discovered that there is a length limit imposed on it (the group_concat_max_len system variable, 1024 by default). So, in short, GROUP_CONCAT can truncate your data. I've decided to avoid using it in my code. Just thought I would warn you of that.
Mysql truncates concatenated result of a GROUP_CONCAT Function
Need more information
Because, all that aside, this is more of a presentation question: if you already have the setup in the first image (http://puu.sh/jd3a7/658df8a999.png), it would be more appropriate to give us your presentation code (PHP/HTML) and the data that goes into it than the SQL. The SQL is nice, but without the database setup it's really hard to see what it outputs.
Basically, what is the structure of your result array, and how are you getting it to look like image 1?
As ArtisiticPhoenix said, I think more information on how the database is configured would be needed in order to provide the most accurate answer. From what I can gather, it looks like all this data is in one single table. The part that I'm confused about is where the score comes from if you're grouping and concatenating by team IDs. Is the same score stored for every team? If not, wouldn't the score that's being returned possibly be incorrect?
I guess what I'm getting at is that there appear to be some data normalization issues here; getting some more information would help to debug that as well.
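For the pivot idea, here is a minimal Python sketch of turning one-row-per-team data into one row per match with a separate column per team slot (red_1, red_2, ...), as an alternative to GROUP_CONCAT. The team numbers and scores are invented, the fields only loosely follow scout_data, and it assumes every team in an alliance carries the same alliance score, which is exactly the open question raised above.

```python
from collections import defaultdict

# Invented rows shaped like scout_data: (match_number, alliance_color, team, score)
scout_data = [
    (1, "red", 101, 80), (1, "red", 102, 80), (1, "red", 103, 80),
    (1, "blue", 201, 65), (1, "blue", 202, 65), (1, "blue", 203, 65),
    (2, "red", 104, 50), (2, "red", 105, 50),
]

def pivot_matches(rows, slots=3):
    """One dict per match: red_1..red_N and blue_1..blue_N hold team numbers
    (None when a slot is empty), plus one score per alliance."""
    teams = defaultdict(lambda: {"red": [], "blue": []})
    scores = defaultdict(dict)
    for match_number, color, team, score in rows:
        teams[match_number][color].append(team)
        scores[match_number][color] = score  # assumes one score per alliance
    table = []
    for match_number in sorted(teams):
        row = {"match": match_number}
        for color in ("red", "blue"):
            listed = teams[match_number][color]
            for i in range(slots):
                row[f"{color}_{i + 1}"] = listed[i] if i < len(listed) else None
            row[f"{color}_score"] = scores[match_number].get(color)
        table.append(row)
    return table

for row in pivot_matches(scout_data):
    print(row)
```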

To relation or not to relation? A MySQL, PHP database workflow

I'm kind of new to MySQL and I'm trying to create a fairly complex database, and I need some help.
My db structure
Tables(columns)
1.patients (Id,name,dob,etc....)
2.visits (Id,doctor,clinic,Patient_id,etc....)
3.prescription (Id,visit_id,drug_name,dose,tdi,etc....)
4.payments (id,doctor_id,clinic_id,patient_id,amount,etc...) etc..
I have about 9 tables; in all of them the primary key is 'id' and it's set to auto-increment.
I don't use relations in my db (because I don't know if it would be better or not, and I never got really deep into MySQL), so I just use PHP to run queries to fetch info from one table and use that to run another query to get more info, store it, etc.
for example:
If I want to view all the drugs I gave to one of my patients, for example the one whose id is 100:
1- click the patient name (name link generated from tbl: patients, column: id)
2- search tbl visits WHERE patient_id = 100; this returns all his visits ($x array)
3-loop prescription tbl searching for drugs with matching visit_id with $x (loop array).
4- return all rows found.
As my database expands more and more (1k+ records in the visits table), one patient can have more than 40 visits; that's 40 loops into the prescription table to get all his previous prescriptions.
So I came up with a small tweak: I edited my db so that patient_id and visit_id are columns in nearly all tables, which lets me merge steps 2 and 3 into one step (search prescription tbl WHERE patient_id = 100). But that left me with so many duplicates in my db, and I feel it's kind of a stupid way to do it!
Should I start considering using a relational database?
If so, can someone explain a bit how this will ease my life?
Can I do this redesign by altering the current tables, or must I recreate all the tables?
Thank you very much.
Yes, you should exploit MySQL's relational database capabilities. They will make your life much easier as this project scales up.
Actually you're already using them well. You've discovered that patients can have zero or more visits, for example. What you need to do now is learn to use JOIN queries to MySQL.
Once you know how to use JOIN, you may want to declare some foreign keys and other database constraints. But your system will work OK without them.
You have already decided to denormalize your database by including both patient_id and visit_id in nearly all tables. Denormalization is the adding of data that's formally redundant to various tables. It's usually done for performance reasons. This may or may not be a wise decision as your system scales up. But I think you can trust your instinct about the need for the denormalization you have chosen. Read up on "database normalization" to get some background.
One little bit of advice: don't use columns named simply "id". Name each ID column the same way in every table where it appears. For example, use patients.patient_id, visits.patient_id, and so forth. This is because there are a bunch of automated software engineering tools that help you understand the relationships in your database, and they work better when your ID columns are named consistently.
So, here's an example about how to do the steps numbered 2 and 3 in your question with a single JOIN query.
SELECT p.patient_id, p.name, v.visit_id, rx.drug_name, rx.dose
FROM patients AS p
LEFT JOIN visits AS v ON p.patient_id = v.patient_id
LEFT JOIN prescription AS rx ON v.visit_id = rx.visit_id
WHERE p.patient_id = '100'
ORDER BY p.patient_id, v.visit_id, rx.prescription_id
Like all SQL queries, this returns a virtual table of rows and columns. In this case each row of your virtual table has patient, visit, and drug data. I used LEFT JOIN in this example. That means that a patient with no visits will have a row with NULL data in it. If you specify JOIN MySQL will omit those patients from the virtual table.
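To try the query without a MySQL server, here is a minimal sketch using Python's sqlite3 module with the renamed columns suggested above (patient_id, visit_id, prescription_id). The sample rows are invented; patient 102 has no visits, to show what LEFT JOIN does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (visit_id INTEGER PRIMARY KEY,
                         patient_id INTEGER REFERENCES patients(patient_id));
    CREATE TABLE prescription (prescription_id INTEGER PRIMARY KEY,
                               visit_id INTEGER REFERENCES visits(visit_id),
                               drug_name TEXT, dose TEXT);
    INSERT INTO patients VALUES (100, 'Alice'), (101, 'Bob'), (102, 'Cara');
    INSERT INTO visits VALUES (1, 100), (2, 100), (3, 101);
    INSERT INTO prescription VALUES (1, 1, 'amoxicillin', '500 mg'),
                                    (2, 2, 'ibuprofen', '200 mg'),
                                    (3, 2, 'omeprazole', '20 mg');
""")

QUERY = """
    SELECT p.patient_id, p.name, v.visit_id, rx.drug_name, rx.dose
    FROM patients AS p
    LEFT JOIN visits AS v ON p.patient_id = v.patient_id
    LEFT JOIN prescription AS rx ON v.visit_id = rx.visit_id
    WHERE p.patient_id = ?
    ORDER BY p.patient_id, v.visit_id, rx.prescription_id
"""

def history(patient_id):
    """All visits and prescriptions for one patient, in a single query."""
    return conn.execute(QUERY, (patient_id,)).fetchall()

for row in history(100):
    print(row)
print(history(102))  # [(102, 'Cara', None, None, None)]
```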

Storing database info as array

Which is good practice? To store data as a comma separated list in the database or have multiple rows?
I have a table for accounts, classes, and enrolments.
If the enrolment table has 3 fields: ID, AccountID and ClassID, is it better for ClassID to be a varchar containing a comma separated list such as this: "24,21,182,12" or for it to be just an int and have one entry per enrolment?
tldr: Don't do this. That is, don't use a "packed array" here.
Use a correctly normalized design with "multiple rows". This is likely a good candidate for a Many-to-Many relationship. Consider this structure:
Classes 1:M Enrollments(Class,Student) M:1 Students
Following a properly normalized design will reduce pain. In addition, here are some other advantages:
Referential integrity (use InnoDB)
Consistent model described with relationships
Type enforcement (can't have "foo,,")
JOIN and query without needing custom code
"What are the names of the students in class A?"
"Who is taking more than one class?"
Columns can be usefully indexed (query performance)
Generally faster than handling locally in code
More flexible and consistent
Can attach attributes to enrollments such as status
No need to have code to handle serialization at access sites
More accommodating of placeholders and ORMs
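A minimal sketch of that many-to-many design, using Python's sqlite3 module in place of MySQL/InnoDB; table names and rows are invented. It answers the two example questions from the list with plain JOINs, and the composite primary key stops the same enrollment from being stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classes  (class_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE enrollments (
        class_id   INTEGER NOT NULL REFERENCES classes(class_id),
        student_id INTEGER NOT NULL REFERENCES students(student_id),
        PRIMARY KEY (class_id, student_id)
    );
    CREATE INDEX enrollments_by_student ON enrollments(student_id);
    INSERT INTO classes  VALUES (12, 'A'), (21, 'B'), (24, 'C');
    INSERT INTO students VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cal');
    INSERT INTO enrollments VALUES (12, 1), (12, 2), (21, 1), (24, 3);
""")

# "What are the names of the students in class A?"
in_class_a = conn.execute("""
    SELECT s.name FROM students s
    JOIN enrollments e ON e.student_id = s.student_id
    JOIN classes c ON c.class_id = e.class_id
    WHERE c.name = 'A'
    ORDER BY s.name
""").fetchall()

# "Who is taking more than one class?"
multi = conn.execute("""
    SELECT s.name FROM students s
    JOIN enrollments e ON e.student_id = s.student_id
    GROUP BY s.student_id, s.name
    HAVING COUNT(*) > 1
""").fetchall()

print(in_class_a)  # [('Ann',), ('Ben',)]
print(multi)       # [('Ann',)]
```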
Never ever ever cram multiple values into a single database field by combining them with some sort of delimiter, like a comma, or fixed length substrings. In the rare cases where this clearly gives a benefit in storage requirements or performance ... see rule #1: never ever ever. Ever.
When you cram multiple values into a single field, you sabotage all the clever features built into the database engine to help you retrieve and manipulate values.
Like let's say you have this -- I guess it's some sort of student database.
Plan A
student (student_id, account_id, class_id_mash)
Plan B
student (student_id, account_id)
student_class (student_id, class_id)
Okay, let's say you want a list of all the students taking class #27. With Plan B you write
select s.student_id
from student s join student_class sc on s.student_id = sc.student_id
where sc.class_id = 27
Easy.
How would you do it with Plan A? You might think
select student_id
from student
where class_id_mash like '%27%'
But that will not only find all students in class 27, but also all those in class 127 or 272.
Okay, how about:
select student_id
from student
where class_id_mash like '%,27,%'
There, now we won't find 127 or 272! But, oops, we also won't find it if the 27 happens to be the first or last one in the list, because then there aren't commas on both sides.
So okay, maybe we could get around that with more rules about delimiters or with a more complex matching expression. But it would be unnecessarily complex and painful.
And even if we did it, every search for a class id has to be a full sequential scan. With one value per field and multiple records, you can create an index on the class_id field for fast, efficient retrieval. (Some database engines have ways to index into the middle of text fields, but again, why get into complicated solutions when there's an easy solution?)
How do we validate the class_ids? With separate fields, we can say "class_id references class" and the database engine will ensure that we don't enter an illegal value. With the mash, there is no such free validation.
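The "class_id references class" idea, sketched with Python's sqlite3 module standing in for MySQL/InnoDB. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, whereas InnoDB enforces them by default. Table names follow Plan B; the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE class (class_id INTEGER PRIMARY KEY);
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, account_id INTEGER);
    CREATE TABLE student_class (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        class_id   INTEGER NOT NULL REFERENCES class(class_id),
        PRIMARY KEY (student_id, class_id)
    );
    INSERT INTO class VALUES (27);
    INSERT INTO student VALUES (1, 100);
""")

def enroll(student_id, class_id):
    """Return True if the row was accepted, False if the database rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO student_class VALUES (?, ?)", (student_id, class_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(enroll(1, 27))  # True
print(enroll(1, 99))  # False: class 99 does not exist
```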
I have done both, but instead of storing the information in the database as comma-separated, I use another delimiter, such as | (so that I don't have to worry about formatting on insert into the db). It's more about how often you will query the data.
If you are only going to need the complete list, it is fine to store it as a comma separated value. But if you need to query the list, they should be stored separately.

How to find similarity between mySQL rows?

I am trying to create a script that finds a matching percentage between my table rows. For example my mySQL database in the table products contains the field name (indexed, FULLTEXT) with values like
LG 50PK350 PLASMA TV 50" Plasma TV Full HD 600Hz
LG TV 50PK350 PLASMA 50"
LG S24AW 24000 BTU
Aircondition LG S24AW 24000 BTU Inverter
As you may see all of them have some same keyword. But the 1st name and 2nd name are more similar. Additionally, 3rd and 4th have more similar keywords between them than 1st and 2nd.
My mySQL DB has thousands of product names. What I want is to find those names that have more than a percentage (let's say 60%) of similarity.
For example, as I said, 1st, 2nd (and any other name) that match between them with more than 60%, will be echoed in a group-style-format to let me know that those products are similar. 3rd and 4th and any other with more than 60% matching will be echoed after in another group, telling me that those products match.
If it is possible, it would be great to echo the keywords that satisfy all the grouped matching names. For example LG S24AW 24000 BTU is the keyword that is contained in 3rd and 4th name.
At the end I will create a list of all those keywords.
What I have now is the following query (as Jitamaro suggested)
Select t1.name, t2.name From products t1, products t2
that creates a new name field next to all other names. Excuse me that I don't know how to explain it right but this is what it does: (The real values are product names like above)
Before the query
-name-
A
B
C
D
E
After the query
-name- -name-
A A
B A
C A
D A
E A
A B
B B
C B
D B
E B
.
.
.
Is there a way either with mySQL or PHP that will find me the matching names and extract the keywords as I described above? Please share code examples.
Thank you community.
Query the DB with LIKE OR REGEXP:
SELECT * FROM product WHERE product_name LIKE '%LG%';
SELECT * FROM product WHERE product_name REGEXP "LG";
Loop the results and use similar_text():
$a = "LG 50PK350 PLASMA TV 50\" Plasma TV Full HD 600Hz"; // DB value
$b = "LG TV 50PK350 PLASMA 50\"" ; // USER QUERY
$i = similar_text($a, $b, $p);
echo("Matched: $i Percentage: $p%");
//outputs: Matched: 21 Percentage: 58.3333333333%
Your second example matches 62.0689655172%:
$a = "LG S24AW 24000 BTU"; // DB value
$b = "Aircondition LG S24AW 24000 BTU Inverter" ; // USER QUERY
$i = similar_text($a, $b, $p);
echo("Matched: $i Percentage: $p%");
You can define a percentage higher than, let's say, 40% to match products.
Please note that similar_text() is case-sensitive, so you should lowercase both strings first.
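For experimenting with thresholds outside PHP, here is a Python sketch of the approach PHP's similar_text() takes: find the longest common substring, then recurse on the text to its left and to its right; the percentage is matches * 2 * 100 / (length of both strings). It is a port written for illustration, not PHP's own code, so check edge cases against PHP before relying on it.

```python
def similar_text(first: str, second: str) -> int:
    """Number of matching characters, found by taking the longest common
    substring and recursing on the pieces to its left and right."""
    if not first or not second:
        return 0
    best = pos1 = pos2 = 0
    for i in range(len(first)):
        for j in range(len(second)):
            k = 0
            while i + k < len(first) and j + k < len(second) and first[i + k] == second[j + k]:
                k += 1
            if k > best:  # strict '>' keeps the first longest match found
                best, pos1, pos2 = k, i, j
    if best == 0:
        return 0
    return (best
            + similar_text(first[:pos1], second[:pos2])
            + similar_text(first[pos1 + best:], second[pos2 + best:]))

def similar_text_percent(first: str, second: str) -> float:
    total = len(first) + len(second)
    return similar_text(first, second) * 2 * 100 / total if total else 0.0

a = "LG S24AW 24000 BTU"
b = "Aircondition LG S24AW 24000 BTU Inverter"
print(similar_text(a.lower(), b.lower()), round(similar_text_percent(a.lower(), b.lower()), 4))  # 18 62.069
```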
As for your second question, PHP's levenshtein() function would be a good candidate.
When I look at your examples, I consider how I would try to find similar products based on the title. From your two examples, I can see one thing in each line that stands out above anything else: the model numbers. 50PK350 probably doesn't show up anywhere other than as related to this one model.
Now, MySQL itself isn't designed to deal with questions like this, but some bolt-on tools above it are. Part of the problem is that querying across all those fields in all positions is expensive. You really want to split it up a certain way and index that. The similarity class of Lucene will grant a high score to words that appear rarely across all of your data but make up a high percentage of the field being compared. See High level explanation of Similarity Class for Lucene?
You should also look at Comparison of full text search engine - Lucene, Sphinx, Postgresql, MySQL?
Scoring each word against the Lucene similarity class ought to be faster and more reliable. The sum of your scores should give you the most related products. For the TV, I'd expect to see exact matches first, then some others of the same size, then brand, then TVs in general, etc.
Whatever you do, realize that unless you alter the data structures by using another tool on top of the SQL system to create better data structures, your queries will be too slow and expensive. I think Lucene is probably the way to go. Sphinx or other options not mentioned may also be up for consideration.
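Lucene's classic scoring builds on TF-IDF, so a rough feel for "rare words count more" can be had from a simplified TF-IDF plus cosine similarity in Python. This is a simplified stand-in, not Lucene's actual formula; the four names are the ones from the question.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Weight each word by (count in this name) * log(N / names containing it).
    A word found in every name, like "lg" below, gets weight 0."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
            for tokens in tokenized]

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

products = [
    'LG 50PK350 PLASMA TV 50" Plasma TV Full HD 600Hz',
    'LG TV 50PK350 PLASMA 50"',
    "LG S24AW 24000 BTU",
    "Aircondition LG S24AW 24000 BTU Inverter",
]
vecs = tfidf_vectors(products)
for i in range(len(products)):
    for j in range(i + 1, len(products)):
        print(i + 1, j + 1, round(cosine(vecs[i], vecs[j]), 3))
```

Only pairs 1-2 and 3-4 get a non-zero score here, because the one word shared across the two product types, "LG", appears in every name and so carries no weight.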
This is trickier than it seems and there is information missing in your post:
How are people going to use this auto-complete function?
Is it relevant that you can find all names for a product? Apparently not all stores name their products similarly, so a clerk might not be able to find the product (s)he is looking for.
Do you have information about which product names are for the same product?
Is it relevant from which store you're searching? where is this auto-complete used?
Should the auto-complete really only suggest products that match all the words you typed? (it's not so hard, technically, to correct typos)
I think you need a more clear picture of what you (or better yet: the users) want this auto-complete function to do.
An auto-complete function is very much a user-friendly type feature. It aids the user, possibly in a fuzzy way so there is no single right answer. You have to figure out what works best, not what is easiest to do technically.
First figure out what you want, then worry about technology.
One possible solution is to use Damerau-Levenshtein distance. It could be used like this:
select *
from products p
where DamerauLevenshtein(p.name, 'user input here') <= X
You'll have to figure out the X that suits your needs best. It should be an integer greater than zero. You could have it hard-coded, parameterized or calculated as needed.
The trickiest thing here is DamerauLevenshtein. It has to be a stored function that implements the Damerau-Levenshtein algorithm. I don't have MySQL here, so I might write it for you later today.
Update: MySQL does not support arrays in stored procedures, so there is no straightforward way to implement Damerau-Levenshtein in MySQL other than using a temporary table for each function call, and that will result in terrible performance. So you have two options: loop through the results in PHP with levenshtein(), like Alix Axel suggests, or migrate your database to PostgreSQL, where arrays are supported.
There is also the option to create a user-defined function, but this requires writing the function in C, linking it to MySQL and possibly rebuilding MySQL, so this way you'll just add more headaches.
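For experimenting outside the database, here is a Python sketch of the optimal string alignment distance, the restricted Damerau-Levenshtein variant that is commonly implemented (an adjacent transposition counts as one edit, but no substring is edited twice). It is not the unrestricted Damerau-Levenshtein distance, and max_distance in the usage plays the role of the X above, with an arbitrary value.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions, substitutions
    and transpositions of adjacent characters, each costing 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def within_distance(names, query, max_distance):
    """Names whose distance to the query (case-insensitive) is at most max_distance."""
    q = query.lower()
    return [n for n in names if osa_distance(n.lower(), q) <= max_distance]

print(within_distance(["LG S24AW", "LG S42AW", "SAMSUNG"], "LG S24AW", 1))
# ['LG S24AW', 'LG S42AW']
```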
Your approach seems sound. For matching similar products, I would suggest a trigram search. There's a pretty decent explanation of how this works along with the String::Trigram Perl module.
I would suggest using trigram search to get a list of matches, perhaps coupled with some manual review depending on how much data you have to deal with and how frequently you need to add new products. I've found this approach to work quite well in practice.
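A minimal Python sketch of trigram matching: split each name into overlapping three-character pieces and score pairs by the share of pieces they have in common (Jaccard similarity). This particular scoring is an assumption for illustration; String::Trigram and similar tools use their own padding and formulas.

```python
def trigrams(text: str) -> set:
    """Set of overlapping 3-character substrings, lower-cased."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

def trigram_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two trigram sets (shared / total distinct)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

names = ["LG S24AW 24000 BTU", "Aircondition LG S24AW 24000 BTU Inverter", 'LG TV 50PK350 PLASMA 50"']
best = max(names[1:], key=lambda n: trigram_similarity(names[0], n))
print(best)  # Aircondition LG S24AW 24000 BTU Inverter
```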
Maybe you want to find the longest common substring from the 2 strings? Then you need to compute a suffix tree for each of your strings see here http://en.wikipedia.org/wiki/Longest_common_substring_problem.
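A suffix tree is the fast way to do this, but for short product names a simple dynamic-programming version is usually enough; the Python sketch below swaps in that O(n*m) approach rather than a suffix tree. Run on two names from the question, it recovers the shared keyword the asker was after.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Dynamic programming: cur[j] is the length of the longest common suffix
    of a[:i] and b[:j]; the best value seen marks the longest common substring."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]

print(longest_common_substring("LG S24AW 24000 BTU",
                               "Aircondition LG S24AW 24000 BTU Inverter"))
# LG S24AW 24000 BTU
```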
If you want to check all names against each other you need a cross join in mysql. There are many ways to achieve this:
1. Select a, b From t1, t2
2. Select a, b From t1 Join t2
3. Select a, b From t1 Cross Join t2
Then you can loop through the result. This is the same as creating a 2D array with n^2 elements in which every element is paired with every other (n(n-1)/2 distinct pairs once self-matches and mirror-image duplicates are skipped).
P.S.: Select t1.name, t2.name From products t1, products t2
It sounds like you've gone through all this trouble to explain a complex scenario, then said that you want to ignore the optimal answers and just get us to give you the "handshake" protocol (everything is compared to everything that hasn't been compared to it yet). So... pseudocode:
select * from table order by id
while (result) {
select * from table where id > result_id
}
That will do it.
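The handshake loop, plus the grouping the question asks for, can be sketched in Python: itertools.combinations visits every pair exactly once (i < j), and pairs that clear a threshold are merged into groups with a small union-find. difflib.SequenceMatcher is used here as a stand-in similarity measure (it is not PHP's similar_text(), though both count shared blocks of characters), and the 0.6 threshold just mirrors the 60% from the question.

```python
from difflib import SequenceMatcher
from itertools import combinations

def group_similar(names, threshold=0.6):
    """Compare every pair once (i < j) and merge pairs whose similarity ratio
    reaches the threshold into groups; singletons are dropped."""
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(names)), 2):
        ratio = SequenceMatcher(None, names[i].lower(), names[j].lower()).ratio()
        if ratio >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(names)):
        groups.setdefault(find(i), []).append(names[i])
    return [g for g in groups.values() if len(g) > 1]

products = [
    'LG 50PK350 PLASMA TV 50" Plasma TV Full HD 600Hz',
    'LG TV 50PK350 PLASMA 50"',
    "LG S24AW 24000 BTU",
    "Aircondition LG S24AW 24000 BTU Inverter",
]
print(group_similar(products))
```

On these four names the two TV listings land just under 0.6 with this measure, so the threshold (one answer above suggests something like 40%) needs tuning on real data. In SQL, the same handshake pairs come from a self-join such as SELECT t1.name, t2.name FROM products t1 JOIN products t2 ON t1.id < t2.id.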
If your database simply had a UPC code as one of its fields, and this field was well-maintained, i.e., you could trust that it was entered correctly by the database maintainer and correctly reflected what the item was, then you wouldn't need to do all of the work you suggest.
An even better idea might be to have a UPC field in your next database, and constrain it as unique.
Database users attempt to put an already-existing UPC into the database: they get an error.
Database maintains its integrity.
And if such a database maintained its integrity, the necessity of doing what you suggest would never arise.
This probably doesn't help much with your current task (apologies), but for a future similar database, you might wish to think about it...
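The unique-UPC constraint is a one-liner in the table definition. A minimal sketch with Python's sqlite3 module standing in for MySQL; the table layout and the UPC value are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        upc  TEXT NOT NULL UNIQUE  -- duplicate UPCs are rejected by the engine
    )
""")

def add_product(name, upc):
    """Return True if inserted, False if the UPC already exists."""
    try:
        with conn:
            conn.execute("INSERT INTO products (name, upc) VALUES (?, ?)", (name, upc))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_product("LG S24AW 24000 BTU", "012345678905"))                        # True
print(add_product("Aircondition LG S24AW 24000 BTU Inverter", "012345678905"))  # False
```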
I'd advise you to use a full-text search engine, like Sphinx. It can implement many of the matching strategies you want. For example, you may use "quorum" or "any" searches.
It seems that you might always want to return the shortest string?? That's more of a question than anything. But then you might have something like...
SELECT * FROM products
WHERE product_name LIKE '%LG%'
ORDER BY LENGTH(product_name) ASC
LIMIT 1
This is a clustering problem, which can be solved with a data mining method (http://en.wikipedia.org/wiki/Cluster_analysis). It requires a lot of memory and computation-intensive operations, which are not suitable for a database engine; otherwise, separate data mining, text mining, or business analytics software wouldn't exist.
This question is similar :) to this one:
What is the best way to implement a substring search in SQL?
Trigrams can easily find similar rows, and in that question I posted a PHP + MySQL + trigram solution.
You can use LIKE to find similar product names within the table. For example:
SELECT * FROM product WHERE product_name LIKE 'LG%';
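If the search term comes from user input, it is worth passing it as a bound parameter and escaping LIKE's wildcards (% and _) so they are matched literally. A minimal sketch with Python's sqlite3 module; the escape_like helper and the sample rows are hypothetical, and in MySQL the backslash is already LIKE's default escape character. In MySQL, a prefix pattern like 'LG%' (no leading %) can also use a B-tree index on product_name, which '%LG%' cannot.

```python
import sqlite3

def escape_like(term, escape="\\"):
    """Escape LIKE wildcards so user input is matched literally."""
    return (term.replace(escape, escape * 2)
                .replace("%", escape + "%")
                .replace("_", escape + "_"))

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, product_name TEXT);
    INSERT INTO product (product_name) VALUES
        ('LG S24AW 24000 BTU'), ('LG TV 50PK350 PLASMA 50"'),
        ('Aircondition LG S24AW 24000 BTU Inverter'), ('LG_special');
""")

def starts_with(prefix):
    """Product names beginning with prefix (SQLite's LIKE ignores ASCII case)."""
    sql = ("SELECT product_name FROM product "
           "WHERE product_name LIKE ? ESCAPE '\\' ORDER BY product_name")
    return [r[0] for r in conn.execute(sql, (escape_like(prefix) + "%",))]

print(starts_with("LG"))   # all three names that begin with "LG"
print(starts_with("LG_"))  # ['LG_special']: the underscore is literal, not a wildcard
```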
Here is another idea (but I'm voting for levenshtein()):
Create a temporary table of all words used in names and their frequencies.
Choose a range of results (the most popular words are probably words like LCD or LED; the most unique words could be good, as they might be actual product names).
Suggest, for each of the result words, either:
results with those words
results containing longest substring (like this: http://forums.mysql.com/read.php?10,277997,278020#msg-278020 ) of those words.
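The word-frequency step could be sketched like this in Python, counting in how many names each word appears (an in-memory Counter standing in for the temporary table). Words shared by only a few names are candidate keywords for grouping; the cut-off of at most two names is an arbitrary choice for this tiny example.

```python
from collections import Counter

def word_frequencies(names):
    """Number of names each lower-cased, whitespace-separated word appears in."""
    counts = Counter()
    for name in names:
        counts.update(set(name.lower().split()))
    return counts

def shared_keywords(names, max_names):
    """Words that appear in at least two names but in no more than max_names."""
    counts = word_frequencies(names)
    return sorted(w for w, c in counts.items() if 2 <= c <= max_names)

products = [
    'LG 50PK350 PLASMA TV 50" Plasma TV Full HD 600Hz',
    'LG TV 50PK350 PLASMA 50"',
    "LG S24AW 24000 BTU",
    "Aircondition LG S24AW 24000 BTU Inverter",
]
print(shared_keywords(products, 2))
# ['24000', '50"', '50pk350', 'btu', 'plasma', 's24aw', 'tv']
```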
OK, I think I was trying to implement a very similar thing. It can work the same way as the Google Chrome address box: when you type the address, it gives you suggestions. As far as I can tell, this is what you are trying to achieve.
I cannot give you an exact solution, but here is some advice.
You need to implement a dropdown box where someone starts to enter the product they are looking for.
Then you need to get the current value of the dropdown and run a query like the one posted above, e.g. "SELECT * FROM product WHERE product_name LIKE 'LG%';"
Save the results of the query
Refresh the page
Add the results of the query to the dropdown
Note:
You need to save the query results somewhere, like a text file with the HTML code, i.e. <option>LG TS 600</option>. These values will be used to populate your option box after the page refresh. You need to set up a user session so that each user gets their own results; otherwise, if several users use the search at the same time, the results could clash. With the search id and session id you can then match them. You can save the results in a file or a table; a table would be more convenient. In my view it is really a whole subsystem for what you are looking for.
I hope it helps.
