Comparing Data in mySQL

I have searched but cannot find an answer that suits my needs specifically. I have two sets of data and need to compare a piece of the first (table 1 Description field) with a list (table two) and return the VIP codes for each interface/order.
The only identifier that is the same for any of the descriptions is the 9 digit order ID that ends in '003'. I need to compare this 9 digit string to the other table that will always start with the order ID but may contain other characters or numbers afterwards. I know a LIKE comparison will work for the second table but I cannot figure out how to strip the order numbers out of the description field.
UPDATE: Table 1 is a temporary table comparing the output of a router interface command. Table 2 is my static account database that has tens of thousands of entries that I do not want to compare to table one. This is why I do not just take table two and compare the order numbers to table one. I am specifically asking for help with a way to extract the 9 digit order ID from the description field of Table 1.
Table 1
Interface | Description
Ge 1/0/1 | blah_bla_123456003_blahlahlah
Ge 1/0/2 | blah_blah_bla_234567003_blahahblh
Ge 1/0/3 | b_bla_345678003_blhahblah
Ge 1/0/4 | bh_blh_ba_456789003_lahlahbl
Table 2
Order ID | VIP Code
123456003.0 | Premier
234567003 | Wholesale
345678003.6 | Normal
456789003.23 | Premier
Expected Results
Order* | VIP Code
123456003 | Premier
234567003 | Wholesale
345678003 | Normal
456789003 | Premier
*(stripped from Description)

If you want to take the first 9 digits of the ID from Table 2, you could use LEFT(table2.id, 9), which returns the first 9 (leftmost) characters of that field.
You can then use that value in a LIKE comparison (with the "%" wildcard) or in a regular expression.

Why not store the data you need in a dedicated column in each table? Make it a number column and index it, then you can do a very efficient JOIN using that column.
SELECT * FROM Table_1 LEFT JOIN Table_2 USING(common_index)

You can use something like this:
select TRUNCATE(tbl2.orderId,0) orderNum, tbl2.vipcode, tbl1.interface
from table2 tbl2 , table1 tbl1
WHERE tbl1.description like CONCAT('%',TRUNCATE(tbl2.orderId,0),'%');
i made a fiddle here
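
None of the answers above actually strips the order ID out of the Description field, which is what the update to the question asks for. As a rough sketch only, the extraction could be done on the client with a regular expression before touching Table 2. The function names, the pattern, and the in-memory lists are assumptions made for illustration; the pattern assumes the order ID is always exactly 9 digits ending in 003 and is not embedded in a longer run of digits. On MySQL 8.0 and later, REGEXP_SUBSTR() may allow a similar extraction inside the query itself.

```python
import re

# Assumption: an order ID is exactly 9 digits, ends in "003",
# and is not part of a longer digit run.
ORDER_ID = re.compile(r"(?<!\d)(\d{6}003)(?!\d)")


def extract_order_id(description):
    """Return the 9-digit order ID found in a description, or None."""
    match = ORDER_ID.search(description)
    return match.group(1) if match else None


def vip_codes(interfaces, accounts):
    """Pair each extracted order ID with the VIP code of the first
    account whose ID starts with it (the LIKE 'id%' idea from above).

    interfaces: list of (interface, description) tuples (Table 1)
    accounts:   list of (order_id_text, vip_code) tuples (Table 2)
    """
    results = []
    for _interface, description in interfaces:
        order_id = extract_order_id(description)
        if order_id is None:
            continue  # no order ID in this description
        for account_id, vip_code in accounts:
            if account_id.startswith(order_id):
                results.append((order_id, vip_code))
                break
    return results
```

With the extracted IDs in hand, only those few values need to be looked up in the large account table, for example with one prefix LIKE per ID, instead of scanning every account against every description.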

Related

Most efficient method determining if a list of values completely satisfy a one to many relationship (MySQL)

I have a one-to-many relationship of rooms and their occupants:
Room | User
1 | 1
1 | 2
1 | 4
2 | 1
2 | 2
2 | 3
2 | 5
3 | 1
3 | 3
Given a list of users, e.g. 1, 3, what is the most efficient way to determine which room is completely/perfectly filled by them? In this case it should return room 3: although both users are also in room 2, room 2 has other occupants as well, so it is not a "perfect" fit.
I can think of several solutions to this, but am not sure about the efficiency. For example, I can do a group concatenate on the user (ordered ascending) grouping by room, which will give me comma separated strings such as "1,2,4", "1,2,3,5" and "1,3". I can then order my input list ascending and look for a perfect match to "1,3".
Or I can do a count of the total number of users in a room AND containing both users 1 and 3. I will then select the room which has the count of users equal to two.
Note that I want the most efficient way, or at least a way that scales up to millions of users and rooms. Each room will have around 25 users. Another thing I want to consider is how to pass this list to the database. Should I construct a query by concatenating AND userid = 1 AND userid = 3 AND userid = 5 and so on? Or is there a way to pass the values as an array into a stored procedure?
Any help would be appreciated.

For example, I can do a group concatenate on the user (ordered ascending) grouping by room, which will give me comma separated strings such as "1,2,4", "1,2,3,5" and "1,3". I can then order my input list ascending and look for a perfect match to "1,3".
First, a word of advice, to improve your level of function as a developer. Stop thinking of the data, and of the solution, in terms of CSVs. It limits you to thinking in spreadsheet terms, and prevents you from thinking in Relational Data terms. You do not need to construct strings, and then match strings, when the data is in the database, you can match it there.
Solution
Now then, in Relational data terms, what exactly do you want ? You want the rooms where the count of users that match your argument user list is highest. Is that correct ? If so, the code is simple.
You haven't given the tables. I will assume room, user, room_user, with deadly ids on the first two, and a composite key on the third. I can give you the SQL solution, you will have to work out how to do it in the non-SQL.
Another thing I want to consider is how to pass this list to the database. Should I construct a query by concatenating AND userid = 1 AND userid = 3 AND userid = 5 and so on? Or is there a way to pass the values as an array into a stored procedure?
To pass the list to the stored proc, because it needs a single calling parm, the length of which is variable, you have to create a CSV list of users. Let's call that parm #user_list. (Note, that is not contemplating the data, that is passing a list to a proc in a single parm, because you can't pass an unknown number of identified users to a proc otherwise.)
Since you constructed the #user_list on the client, you may as well compute #user_count (the number of members in the list) while you are at it, on the client, and pass that to the proc.
Something like:
CREATE PROC room_user_match_sp (
    #user_list  CHAR(255),
    #user_count INT
    ...
    )
AS
    -- validate parms, etc
    ...
    SELECT room_id,
           match_count,
           match_count / #user_count * 100 AS match_pct
        FROM (
            SELECT room_id,
                   COUNT(user_id) AS match_count   -- no of users matched
                FROM room_user
                WHERE user_id IN ( #user_list )
                GROUP BY room_id                   -- get one row per room
            ) AS match_room                        -- has any matched users
        WHERE match_count = MAX( match_count )     -- remove this while testing
It is not clear, if you want full matches only. In that case, use:
WHERE match_count = #user_count
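
For the "perfect fit" case described in the question, matching the number of listed users is not quite enough on its own, because a room can contain every listed user plus others. One possible sketch, shown here with Python's sqlite3 module purely so it can be run locally, counts both the room's total occupants and the occupants that appear in the list, and requires both to equal the list size. The table name room_user and its columns follow the assumption made above; the function name is hypothetical, and each (room_id, user_id) pair is assumed to be unique.

```python
import sqlite3


def perfectly_filled_rooms(conn, user_ids):
    """Return room ids whose occupants are exactly the given users.

    Assumes a table room_user(room_id, user_id) with unique pairs.
    """
    ids = sorted(set(user_ids))  # ignore duplicates in the input list
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    sql = f"""
        SELECT room_id
        FROM room_user
        GROUP BY room_id
        HAVING COUNT(*) = ?                            -- room has exactly n occupants
           AND SUM(CASE WHEN user_id IN ({placeholders})
                        THEN 1 ELSE 0 END) = ?         -- and all n are in the list
        ORDER BY room_id
    """
    n = len(ids)
    # Parameter order follows the placeholders: n, the ids, then n again.
    rows = conn.execute(sql, [n, *ids, n]).fetchall()
    return [row[0] for row in rows]
```

Building the IN list from bound placeholders is one answer to the question of how to pass the list: the values are sent as parameters rather than concatenated into the SQL text.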
Expectation
You have asked for a proc-based solution, so I have given that. Yes, it is the fastest. But keep in mind that for this kind of requirement and solution, you could construct the SQL string on the client, and execute it on the "server" in the usual manner, without using a proc. The proc is faster here only because the code is compiled and that step is removed, as opposed to that step being performed every time the client calls the "server" with the SQL string.
The point I am making here is, with the data in a reasonably Relational form, you can obtain the result you are seeking using a single SELECT statement, you don't have to mess around with work tables or temp tables or intermediate steps, which requires a proc. Here, the proc is not required, you are implementing a proc for performance reasons.
I make this point because it is clear from your question that your expectation of the solution is "gee, I can't get the result directly, I have to work with the data first, I am ready and willing to do that". Such intermediate work steps are required only when the data is not Relational.

Maybe not the most efficient SQL, but something like:
SELECT x.room_id,
       SUM(x.occupants) AS occupants,
       SUM(x.selectees) AS selectees,
       SUM(x.selectees) / SUM(x.occupants) AS percentage
FROM ( SELECT room_id,
              COUNT(user_id) AS occupants,
              NULL AS selectees
       FROM Rooms
       GROUP BY room_id
       UNION
       SELECT room_id,
              NULL AS occupants,
              COUNT(user_id) AS selectees
       FROM Rooms
       WHERE user_id IN (1,3)
       GROUP BY room_id
     ) x
GROUP BY x.room_id
ORDER BY percentage DESC
This will give you a list of rooms ordered by the "best fit" percentage.
That is, it works out a percentage of fulfilment from the number of people in the room and the number of people from your set who are in the room.

Repeated Insert copies on ID

We have records with a count field on a unique id.
The columns are:
mainId = unique
mainIdCount = 1320 (this 'views' field gets a + 1 when the page is visited)
How can you insert all these mainIdCount's as separate records in another table IN ANOTHER DBASE in one query?
Yes, I do mean 1320 times an insert with the same mainId! :-)
We actually have records that go over 10,000 times an id. It just has to be like this.
This is a weird one, but we do need the copies of all these (just) counts like this.

The most straightforward way to do this is with a JOIN operation between your table and another row source that provides a set of integers. We'd match each row from our original table to as many rows from the set of integers as needed to satisfy the desired result.
As a brief example of the pattern:
INSERT INTO newtable (mainId,n)
SELECT t.mainId
     , r.n
  FROM mytable t
  JOIN ( SELECT 1 AS n
         UNION ALL SELECT 2
         UNION ALL SELECT 3
         UNION ALL SELECT 4
         UNION ALL SELECT 5
       ) r
 WHERE r.n <= t.mainIdCount
If mytable contains row mainId=5 mainIdCount=4, we'd get back rows (5,1),(5,2),(5,3),(5,4)
Obviously, the rowsource r needs to be of sufficient size. The inline view I've demonstrated here would return a maximum of five rows. For larger sets, it would be beneficial to use a table rather than an inline view.
This leads to the followup question, "How do I generate a set of integers in MySQL",
e.g. Generating a range of numbers in MySQL
And getting that done is a bit tedious. We're looking forward to an eventual feature in MySQL that will make it much easier to return a bounded set of integer values; until then, having a pre-populated table is the most efficient approach.
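
As a rough illustration of the pre-populated table approach, the sketch below fills a helper table of integers and then runs the same join against it. It uses Python's sqlite3 module only so that it can be run locally; the helper table name numbers and the function name are assumptions, and mytable and newtable are taken from the example above. INSERT OR IGNORE is SQLite syntax (MySQL spells it INSERT IGNORE).

```python
import sqlite3


def copy_rows_by_count(conn):
    """Insert one row into newtable per counted view in mytable.

    Assumes mytable(mainId, mainIdCount) and newtable(mainId, n) exist.
    """
    # The integer row source must reach the largest count in the table.
    max_n = conn.execute(
        "SELECT COALESCE(MAX(mainIdCount), 0) FROM mytable"
    ).fetchone()[0]

    # Hypothetical helper table holding 1..max_n.
    conn.execute("CREATE TABLE IF NOT EXISTS numbers (n INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO numbers (n) VALUES (?)",  # SQLite syntax
        ((i,) for i in range(1, max_n + 1)),
    )

    # Same pattern as above: each row joins to the integers 1..mainIdCount.
    conn.execute(
        """
        INSERT INTO newtable (mainId, n)
        SELECT t.mainId, r.n
        FROM mytable t
        JOIN numbers r ON r.n <= t.mainIdCount
        """
    )
    conn.commit()
```

Once the helper table exists it can be reused, so the cost of populating it is paid once rather than on every copy.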

Mysql with regular expression

I have a question about regular expressions. I have designed a table with three columns; one column contains member ids separated by commas. Here is the table structure:
send_id | member_id
1 | 1211,23,34
2 | 1,23
I want to select only the send_id 2 row, whose member_id list contains 1.
This is the query that I am using:
SELECT * FROM table WHERE column REGEXP '^[1]+$';
But this query gives me both rows. Please help me.
With regards,
Rahul

Never store separate values in one column.
Normalize your structure like this:
send_id | member_id
1 | 1211
1 | 23
1 | 34
2 | 1
2 | 23
If you still want your regex, then it will be:
SELECT * FROM t WHERE column REGEXP '(^|[^0-9])1([^0-9]|$)'

First, you should be normalizing your data so you're not in this horrible mess in the first place. Here's a good resource explaining normalization.
Second, I believe your problem lies with your regular expression. Try this instead:
SELECT * FROM table WHERE column REGEXP '^[1]$';
The regular expression you're using uses the [1]+ group. The + means it has to match [1] 1 or more times, hence why you're getting two rows instead of one. Removing the + means it will match [1] once.
However, that still won't fix your problem, as more than one row contains 1. This is why normalization is so important.

Having multiple values inside a column isn't a good practice for designing a DB.
You should normalize your data, i.e., put just one piece of atomic information inside each element of your table.
You can find more information about this on Wikipedia:
http://en.wikipedia.org/wiki/Database_normalization

As others have said, the ideal solution is to normalize your data; I think Alma Do Mundo's answer explains it quite well.
If you want to use REGEXP anyway, you have to take four cases into account: the id is the only value, the id is first, the id is in the middle, or the id is last. I have used id=74 for the example:
SELECT * FROM table WHERE member_id REGEXP '(^74$|^74,|,74,|,74$)';
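
To see why the anchors and commas matter, the four cases can be checked outside the database. The sketch below uses Python's re module with an equivalent compact pattern, (^|,)id(,|$); the function names are made up for illustration, and Python's regex dialect is only assumed to agree with MySQL's for a pattern this simple.

```python
import re


def member_pattern(member_id):
    """Build a pattern that matches member_id as a whole list element."""
    # (^|,) : start of string or a preceding comma
    # (,|$) : a following comma or end of string
    return re.compile(r"(^|,)" + re.escape(str(member_id)) + r"(,|$)")


def rows_with_member(rows, member_id):
    """Return the send_ids whose comma-separated list contains member_id."""
    pattern = member_pattern(member_id)
    return [send_id for send_id, members in rows if pattern.search(members)]
```

Applied to the two rows from the question, searching for member 1 selects only send_id 2, because the 1s inside 1211 are not bounded by a comma or by the ends of the string.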

Depending on your requirements, you should normalize your data, i.e. make three tables: one with the send ID, one with the member ID, and one that links the two. You can then join them with INNER JOINs.
However, if you are going to keep the comma-separated column, you can use WHERE member_id LIKE '%1%' to pull in all the candidate rows. Because that pattern also matches values such as 1211 or 21, you'll have to use the application to filter the relevant records.
In any case, if you're not going to normalize the data, you will have to use the front end to filter the results.
An example of the inner join syntax would look like this:
SELECT * FROM SendTable
JOIN Send_Member ON SendTable.send_id = Send_Member.send_id
JOIN Member ON Member.member_id = Send_Member.member_id
WHERE Member.member_id = 1;
where the schema looks like:
Sendtable:
send_Id (primary key)
...other fields
Send_Member:
send_id (primary key and foreign key to SendTable)
member_id (primary key and foreign key to member)
...any fields you might want that are relevant to the particular send table and member table link
Member:
member_id (primarykey)
...other fields

splitting data in a table column for search

I was working with a simple mysql table in php when I came across this problem and I am wondering if there is a solution to this.
The table holds a username and his locations in a comma separated format.
id|user|locations
------------------
1 |abc | A, B, C
------------------
2 |xyz | P, Q, R
I was wondering if there was any way to write a mysql query so that it would return me a user who has location as A.
Basically if one of the values among the comma separated values match, the record should be returned.
I know it is a better way to store them as separate records, but I was just curious if such a retrieval is possible.
Thanks in advance.

Ideally, you should consider normalizing the data so you are not storing the comma separated list.
But if you cannot alter the table structure, MySQL has a FIND_IN_SET() function that can be used to return the rows that match the value you want:
select id, user, locations
from yourtable
where find_in_set('A', locations)
See SQL Fiddle with Demo
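
One detail worth checking with the sample data: the lists in the question have a space after each comma ("A, B, C"), and FIND_IN_SET() compares whole elements, so 'A' is found but 'B' is not, since the stored element is ' B'. The sketch below is a simplified Python emulation written only to illustrate that behaviour; it is not MySQL's implementation, it ignores collation (MySQL's comparison is often case-insensitive), and both function names are hypothetical. In MySQL itself, something like REPLACE(locations, ', ', ',') inside the call may be one way to work around the spaces.

```python
def find_in_set(needle, csv):
    """Simplified emulation: 1-based position of needle in a
    comma-separated string, or 0 when it is not an exact element."""
    if csv == "":
        return 0  # an empty list contains nothing
    for position, item in enumerate(csv.split(","), start=1):
        if item == needle:  # whole-element comparison, spaces included
            return position
    return 0


def find_in_set_trimmed(needle, csv):
    """Same lookup after stripping whitespace around each element."""
    cleaned = ",".join(part.strip() for part in csv.split(","))
    return find_in_set(needle, cleaned)
```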

I think the following query may help you:
SELECT * FROM table WHERE column REGEXP '(^|,)A($|,)'
A useful related question:
How to query comma delimited field of table to see if number is within the field

Using PHP to merge duplicate records from MySQL database in CakePHP

I have a table that stores data that has been entered regarding the amount of waste put in a bin. So my table looks like this:
Material | Weight
===================
Paper | 10
Plastic | 5
Paper | 7
As you can see, I'm going to have duplicate data in the table. At the moment I have multiple instances of different materials, and they all have different weight values attached to them.
Is it possible in PHP to get these duplicate entries, combine them in to one entry, and then display them? So the code would take the 10Kg of Paper and add it to the other instance of paper in the table (7Kg) and then output the value?
I have tried the GROUP BY in MySQL, but all that will do is combine all of the entries and give me the value of the top record, which isn't right.
Thanks!

Use MySQL's SUM() aggregate together with GROUP BY. This will sum up all values of that column for each group. This assumes the weight column is just a number (10 instead of 10kg).
SELECT
`material`,
SUM(`weight`) AS `weight`
FROM `material_weights`
GROUP BY `material`
If the weight column isn't just a number (10kg instead of 10), then there will be issues.
If all weights are in KG, then you should just remove the 'kg' value from each weight, and convert the weight column from text into a numeric column.
If there are different kinds of weights (KG, LB, G, etc), then the best way would be to have an extra field in the table, with the weight converted into KG.
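
A small runnable check of the grouping query against the sample data from the question is sketched below. It uses Python's sqlite3 module with an in-memory database purely for illustration; the table name material_weights comes from the query above, the function name is an assumption, and the weight column is assumed to be numeric.

```python
import sqlite3


def total_weight_by_material(conn):
    """Return (material, total weight) pairs, one per material.

    Assumes material_weights(material, weight) with a numeric weight.
    """
    return conn.execute(
        """
        SELECT material,
               SUM(weight) AS weight      -- total per group, not the first row
        FROM material_weights
        GROUP BY material
        ORDER BY material
        """
    ).fetchall()
```

This also shows why a bare GROUP BY appeared to return "the top record": without an aggregate such as SUM(), the weight shown for each group is just one arbitrary row's value.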

Since all your data seems to be in strings, it seems like you would be best served by using a php migration script to examine your data and then combine duplicates. First thing you want to do is determine which Materials have duplicates.
SELECT Material FROM {TABLE} GROUP BY Material HAVING COUNT(*) > 1;
From there you should loop through the materials that come back, and grab all rows with the Material value.
SELECT * FROM {TABLE} WHERE Material = '{$material}';
This will give you all the rows labeled that Material. From there, apply any transformations (just in case there are values labeled g, for example) to the numeric value to ensure you're operating on the same type of value. Then you'd delete all the rows with that type of material. (You have a backup, right?)
DELETE FROM {TABLE} WHERE Material = '{$material}';
Lastly, insert the value you just determined.
INSERT INTO {TABLE} (Material, Weight) VALUES ('$material', '$weight');

SELECT
material,
SUM(CAST(REPLACE(weight, 'kg', '') AS UNSIGNED)) AS weightsum
FROM
tbl
GROUP BY
material
You can use the SUM() function with GROUP BY to get the sum of the weight per unique material. In your case, your weight field appears to be a string. You can simply take out the 'kg' from each value using REPLACE, then convert it to an integer, which is then passed to SUM().
