PHP/MySQL code to process associated entries


I have a table:
Name GroupID etc...
ABC
ABC
DEF
DEF
DEF
KKK
LLL
III
III
I'd like a PHP/MySQL mix to process it into this:
Name GroupID etc...
ABC 1
ABC 1
DEF 2
DEF 2
DEF 2
KKK 0
LLL 0
III 3
III 3
I.e. if several entries share the same Name (exact string match), assign one GroupID (incremented automatically) to all of the entries with that Name. If the Name is unique, assign 0 to the GroupID.
My table has 250,000 entries. What is the fastest way to achieve this? Working code would be nice, but a high-level algorithm is good enough to get me going.
Thanks!

This could be done with a quick PHP script, but I like the idea of letting the database handle it by itself.
You could probably do this with a clever UPDATE join, but because I can't test it I'll use a temporary table instead. The idea is to select all values for Name having counts > 1 and assign a row number to them into a temporary table. Then use an update join to modify the GroupID in the original table.
SET @rownum = 0;
CREATE TEMPORARY TABLE groupnums (groupid INT, Name VARCHAR(16), numgroups INT)
SELECT
@rownum := @rownum + 1 AS groupid,
Name,
COUNT(*) AS numgroups
FROM original_table
GROUP BY Name
HAVING COUNT(*) > 1;
UPDATE
original_table
JOIN groupnums ON original_table.Name = groupnums.Name
SET original_table.GroupID = groupnums.groupid;
Then set the remaining ones to 0:
UPDATE original_table SET GroupID = 0 WHERE GroupID IS NULL;
And get rid of the temporary table:
DROP TABLE groupnums;
Update:
After testing this quickly for myself, I find that although it works, you won't get directly incremental values for groupid. @rownum is incremented for each row rather than each group, so you'll end up with groups like the following, with gaps in between.
/* Sample results - groups work, but have gaps between GroupID */
Name GroupID etc...
ABC 1
ABC 1
DEF 3
DEF 3
DEF 3
KKK 0
LLL 0
III 6
III 6
Update 2: I overcomplicated this a bit.
On deeper thought, @rownum isn't needed at all. Just use an auto-increment id in the temporary table. This should produce incremental GroupID values without the gaps in between. Use the same UPDATE statement as above to join against it.
CREATE TEMPORARY TABLE groupnums (groupid INT NOT NULL AUTO_INCREMENT PRIMARY KEY, Name VARCHAR(16), numgroups INT)
SELECT
NULL AS groupid,
Name,
COUNT(*) AS numgroups
FROM original_table
GROUP BY Name
HAVING COUNT(*) > 1;
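For comparison, the "quick PHP script" route mentioned at the start of this answer could look roughly like the sketch below. It is only an illustration: the function name assignGroupIds is made up here, and it assumes the Name values have already been fetched into an array in table order. It numbers groups by first appearance, so it produces the gap-free numbering from the question.

```php
<?php
// Hypothetical helper (not from the original answer): maps each name to a GroupID.
// Names that occur more than once get 1, 2, 3, ... in order of first appearance;
// names that occur exactly once get 0.
function assignGroupIds(array $names): array
{
    $counts = array_count_values($names); // name => number of occurrences
    $groupOf = [];                        // name => GroupID already handed out
    $next = 1;                            // next free GroupID
    $result = [];
    foreach ($names as $name) {
        if ($counts[$name] < 2) {
            $result[] = 0;                // unique name
            continue;
        }
        if (!isset($groupOf[$name])) {
            $groupOf[$name] = $next++;    // first time this duplicated name is seen
        }
        $result[] = $groupOf[$name];
    }
    return $result;
}

// Sample data from the question.
$names = ['ABC', 'ABC', 'DEF', 'DEF', 'DEF', 'KKK', 'LLL', 'III', 'III'];
print_r(assignGroupIds($names));
```

Writing the resulting IDs back to the table is left out here; with 250,000 rows the pure SQL version above avoids moving the data through PHP at all.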

Related

Efficiently get diff of large data set?

I need to be able to diff the results of two queries, showing the rows that are in the "old" set but aren't in the "new"... and then showing the rows that are in the "new" set but not the old.
Right now, I'm pulling the results into an array and then doing an array_diff(). But I'm hitting some resource and timing issues, as the sets are close to 1 million rows each.
The schema is the same in both result sets (barring the setId number and the table's autoincrement number), so I assume there's a good way to do it directly in MySQL... but I'm not finding how.
Example Table Schema:
rowId,setId,userId,name
Example Data:
1,1,user1,John
2,1,user2,Sally
3,1,user3,Tom
4,2,user1,John
5,2,user2,Thomas
6,2,user4,Frank
What I need to do is figure out the adds/deletes between setId 1 and setId 2.
So, the result of the diff should (for the example) show:
Rows that are in both setId1 and setId2
1,1,user1,John
Rows that are in setId 1 but not in setId2
2,1,user2,Sally
3,1,user3,Tom
Rows that are in setId 2 but not in setId1
5,2,user2,Thomas
6,2,user4,Frank
I think that's all the details, and I think I got the example correct. Any help would be appreciated. Solutions in MySQL or PHP are fine by me.

You can use EXISTS or NOT EXISTS to get rows that are in both sets or in only one (set1 below stands for the table that holds both sets).
Users in set 1 but not set 2 (just swap the setId values for the opposite):
select * from set1 s1
where s1.setId = 1
and not exists (
select 1 from set1 s2
where s2.setId = 2
and s2.userId = s1.userId
)
Users that are in both sets:
select * from set1 s2
where s2.setId = 2
and exists (
select 1 from set1 s1
where s1.setId = 1
and s1.userId = s2.userId
)
If you only want distinct users in both groups, then group by userId:
select min(rowId), userId from set1
where setId in (1,2)
group by userId
having count(distinct setId) = 2
or, for users in one group but not the other:
select min(rowId), userId from set1
where setId in (1,2)
group by userId
having count(case when setId <> 1 then 1 end) = 0

What we ended up doing, was adding a checksum column to the necessary tables being diffed. That way, instead of having to select multiple columns for comparison, the diff could be done against a single column (the checksum value).
The checksum value was a simple md5 hash of a serialized array that contained the columns to be diffed. So... it was like this in PHP:
$checksumString = serialize($arrayOfColumnValues);
$checksumValue = md5($checksumString);
That $checksumValue would then be inserted/updated into the tables, and then we can more easily do the joins/unions etc on a single column to find the differences. It ended up looking something like this:
SELECT i.id, i.checksumvalue
FROM SAMPLE_TABLE_I i
WHERE i.checksumvalue not in(select checksumvalue from SAMPLE_TABLE_II)
UNION ALL
SELECT ii.id, ii.checksumvalue
FROM SAMPLE_TABLE_II ii
WHERE ii.checksumvalue not in(select checksumvalue from SAMPLE_TABLE_I);
This runs fast enough for my purposes, at least for now :-)
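If the diff has to stay in PHP, the same checksum idea can also replace array_diff(). The sketch below is an assumption-laden illustration rather than the code we actually ran: rowChecksum and diffByChecksum are made-up names, and the rows are assumed to be associative arrays as returned by a typical fetch. Keying both sets by checksum lets array_diff_key() compare via hash lookups instead of comparing values pairwise.

```php
<?php
// Hypothetical helper: checksum over the column values that take part in the diff.
function rowChecksum(array $columnValues): string
{
    return md5(serialize($columnValues));
}

// Returns the rows only present in $old ('removed') and only present in $new ('added'),
// comparing rows by the checksum of the listed columns.
function diffByChecksum(array $old, array $new, array $columns): array
{
    // Index a row set by checksum: checksum => row.
    $index = function (array $rows) use ($columns): array {
        $byChecksum = [];
        foreach ($rows as $row) {
            $values = [];
            foreach ($columns as $column) {
                $values[] = $row[$column];
            }
            $byChecksum[rowChecksum($values)] = $row;
        }
        return $byChecksum;
    };
    $oldIndex = $index($old);
    $newIndex = $index($new);
    return [
        'removed' => array_values(array_diff_key($oldIndex, $newIndex)),
        'added'   => array_values(array_diff_key($newIndex, $oldIndex)),
    ];
}
```

For the example data in the question, calling diffByChecksum($set1Rows, $set2Rows, ['userId', 'name']) would report Sally and Tom as removed and Thomas and Frank as added. Note that serialize() distinguishes the integer 1 from the string "1", so both sets should be fetched the same way.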

how to get user name based on id separated by comma in single table

user_id user_name user_friend_list
1 dharmendra 2,3,4,5,6,7
2 jitendra 1,3,4,5,6,7
3 xyz 1,2,6,7
4 pqr 1,3,4
Now I want to extract the user_id and user_name of every row whose user_friend_list contains id 6, i.e.
it should return user_id 1, 2, 3 with the user names dharmendra, jitendra and xyz.
I tried PHP's split function, but that gets complicated. Please suggest a shorter method.
thanks & regards

You can use FIND_IN_SET()
select user_id, user_name
from your_table
where find_in_set(6, user_friend_list) > 0
But it would actually be better to change your table design. Never store multiple values in one column!

First of all, the table is not normalized; that is why the problem exists.
You should break the table up like this (or better):
table 1
user_id
user_name
table 2
user_id
friend_id
Table 2 will still contain some redundant data, which could be removed by adding a third table as a mapping between table1 and table2.
Now you can use the following query to get the result:
select a.user_id, a.user_name from table1 as a join table2 as b on a.user_id = b.user_id where b.friend_id = 6;
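Moving from the comma-separated column to the two-table layout needs a one-off conversion. A rough sketch of that step in PHP follows; friendPairs and usersWithFriend are hypothetical names, and the input rows are assumed to be associative arrays with the column names from the question.

```php
<?php
// Hypothetical migration helper: turns rows with a comma-separated
// user_friend_list into (user_id, friend_id) pairs for the link table.
function friendPairs(array $users): array
{
    $pairs = [];
    foreach ($users as $user) {
        foreach (explode(',', $user['user_friend_list']) as $friendId) {
            $friendId = trim($friendId);
            if ($friendId === '') {
                continue; // skip empty entries, e.g. from a trailing comma
            }
            $pairs[] = [
                'user_id'   => (int) $user['user_id'],
                'friend_id' => (int) $friendId,
            ];
        }
    }
    return $pairs;
}

// Returns the user_id of every user who lists $friendId as a friend.
function usersWithFriend(array $pairs, int $friendId): array
{
    $ids = [];
    foreach ($pairs as $pair) {
        if ($pair['friend_id'] === $friendId) {
            $ids[] = $pair['user_id'];
        }
    }
    return $ids;
}
```

Each pair would then be inserted into table 2; after that, the lookup is the plain join shown above rather than string handling.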

"SELECT user_id,username from your table WHERE user_id IN (2,3,4,5,6,7)";

MySQL selecting "previous" max(id) WHERE

I have a table that receives a row whenever a kid checks in to a summer camp area. If the kid is younger than 12, a parent's barcode must be scanned to allow the kid entry into the young kids' area (under 12 years of age).
The table has the following columns.
kcID
uID
id
age
room
barcode
date
amber
The data will usually be presented like this.
kcID uID id age room barcode date amber
25 1 1 30 1000 0001 6/26/2014 1:27:40 AM 0
26 6 1 1 1000 0005 6/26/2014 1:27:40 AM 0
The problem I have is that I need to compare the dates/hours to know if the kid is entering or leaving the camp area and via php send an SMS to the parents so they know their kid is outside a particular area.
I know I can retrieve the max(kcID) WHERE barcode = XXXX and that will return the last inserted row, but, in order for me to retrieve said information the kid must be scanned, properly inserting a new row and rendering max(kcID) useless in this case.
What I need is to be able to select max(kcID) WHERE barcode = xxxx and then select the previous row record in which barcode = xxxx is found. That way I can compare dates and know if the kid is leaving or entering that particular area.
The easiest solution I can think of right now is to have 2 tables (1 for entry 1 for out) and have the camp counselors choose if the kid is entering or leaving but I'm wondering if I can use only 1 table.

Add a status column with the values left and returned:
ALTER TABLE `table` ADD `status` enum('left', 'returned') NOT NULL DEFAULT 'left';
Now you can select the second-to-last row for a given barcode with the following query:
select `kcID`, `barcode`, `date`, `status` from `table` where `barcode` = 'XXXX' ORDER BY `kcID` DESC LIMIT 1,1

You could add a column indicating leave / return, and add this to your condition.
e.g.
ALTER TABLE `table`
ADD `status` enum('left', 'returned') NOT NULL DEFAULT 'left';
a query would be
SELECT `kcID`, `barcode`
FROM `table`
WHERE `status` = 'left'
ORDER BY `timefield` DESC
LIMIT 1

How to Remove duplicate rows having minimal set of information?

I have a situation,
My MySQL table (company) contains duplicate records, i.e. repeated companies. Some records have values in most columns and some don't. I want to remove the duplicate companies that carry the least information. Any ideas?
Id Company_name column column2 column3 column4
-------------------------------------------------
1 A xyz
2 B pqr abc tcv aaa
3 A bnm xyz ccc
4 A bnm xyz
5 B aaa
I need to get my table as follows
Id Company_name column column2 column3 column4
-------------------------------------------------
2 B pqr abc tcv aaa
3 A bnm xyz ccc

You could write a PHP method for this: retrieve all the records grouped by the column you want to de-duplicate on, which in the case above is Company_name. Keep in mind that rows with the same Company_name may hold different values in the other columns, so it must be clear how the algorithm should treat such rows.
It is good practice to check the information before inserting it, so that no repetition occurs in the first place. If you already have such records, the following query may help. It keeps the row with the lowest Id for each Company_name and deletes the rest:
DELETE FROM TABLENAME WHERE Id
NOT IN
(
SELECT Id FROM
(
SELECT MIN(Id) AS Id FROM TABLENAME GROUP BY Company_name
)
X
);
This removes duplicates based on one column; you can combine several such queries to reduce duplicates on other columns.

It's possible to get a "score" of each row and base the decision on that. Here is a quick example that shows where to start.
SELECT id,
name,
length(concat_ws('', col1, col2, col3, col4)) AS score
FROM company
ORDER BY score DESC;
See it on sqlfiddle
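The same scoring idea can be applied in PHP to decide which row of each company to keep. This is only a sketch: mostCompleteRows is a made-up name, the rows are assumed to be associative arrays with a Company_name key, and the caller passes the names of the data columns to score. As in the query above, the score is the total length of the non-empty column values.

```php
<?php
// Hypothetical helper: for each company name, keeps the row whose data
// columns hold the most characters (the same score as the query above).
function mostCompleteRows(array $rows, array $dataColumns): array
{
    $best = []; // company name => ['score' => int, 'row' => array]
    foreach ($rows as $row) {
        $score = 0;
        foreach ($dataColumns as $column) {
            // Missing or NULL columns count as empty, like concat_ws() skipping NULLs.
            $score += strlen((string) ($row[$column] ?? ''));
        }
        $name = $row['Company_name'];
        if (!isset($best[$name]) || $score > $best[$name]['score']) {
            $best[$name] = ['score' => $score, 'row' => $row];
        }
    }
    return array_column($best, 'row');
}
```

On a tie the first row seen wins. The Ids of all rows not returned are the candidates for deletion.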

how to use MySQL bitwise operations in php?

I'm trying to use MySQL bitwise operations in my query, and I have this example:
table1
id ptid
1 3
2 20
3 66
4 6
table2
id types
1 music
2 art
4 pictures
8 video
16 art2
32 actor
64 movies
128 ..
...
Now, the ptid for id = 3 in table1 is 66, which means that it has 64 (movies) and 2 (art).
But
doesn't it also contain 32 (actor) twice, plus 2 (art)?
I hope you see where my confusion is. How do I control which result I get back? In this case I want 64 (movies) and 2 (art).
But sometimes I want three ids from table2 to belong to one id from table1.
Any ideas?
Thanks

Using bitwise OR
The following query returns all the items from table 2 in 66:
SELECT *
FROM table2
WHERE id | 66 = 66
But 32 + 32 = 64?
Though 32 + 32 = 64, it doesn't affect us.
Here's 64 in binary:
01000000
Here's 32 in binary:
00100000
Here's 2 in binary:
00000010
It's the position of the 1 that we use in this case, not the value. There won't be two of anything. Each flag is either on or off.
Here's 66 in binary. Notice that 64 and 2 are turned on, not 32:
01000010
Using bitwise AND instead of OR
Another way to write the query is with bitwise AND like this:
SELECT *
FROM table2
WHERE id & 66 <> 0
Since 0 = false to MySQL, it can be further abbreviated like this:
SELECT *
FROM table2
WHERE id & 66
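Since the question asks about PHP, here is a rough sketch of the same test on the PHP side, where & and | behave the same way on integers. The TYPES constant simply mirrors table2 from the question, and the function names typesInMask and maskFromFlags are made up for illustration.

```php
<?php
// Hypothetical lookup mirroring table2 from the question: flag value => type name.
const TYPES = [
    1 => 'music', 2 => 'art', 4 => 'pictures', 8 => 'video',
    16 => 'art2', 32 => 'actor', 64 => 'movies',
];

// Returns the names of all flags that are switched on in $mask.
function typesInMask(int $mask, array $types = TYPES): array
{
    $names = [];
    foreach ($types as $flag => $name) {
        if (($mask & $flag) !== 0) { // same test as "id & 66 <> 0" in SQL
            $names[] = $name;
        }
    }
    return $names;
}

// Combines flags into one mask. OR-ing the same flag twice changes nothing,
// which is why a mask can never contain "32 twice".
function maskFromFlags(int ...$flags): int
{
    return array_reduce($flags, fn (int $carry, int $flag) => $carry | $flag, 0);
}

print_r(typesInMask(66)); // art and movies
```

To attach three types to one row, OR the three flags together, e.g. maskFromFlags(1, 8, 64), and store the result in ptid.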

select * from table2 where id & 66

Although the question on how to perform bitwise operations in MySQL has been answered, the sub-question in the comments about why this may not be an optimal data model remains outstanding.
In the example given there are two tables; one with a bitmask and one with a break down of what each bit represents. The implication is that, at some point, the two tables must be joined together to return/display the meaning of the various bits.
This join would either be explicit, e.g.
SELECT *
FROM Table1
INNER JOIN TABLE2
ON table1.ptid & table2.id <> 0
Or implicit where you might select the data from table1 into your application and then make a second call to lookup the bitmask values e.g.
SELECT *
FROM table2
WHERE id & $id <> 0
Neither of these options is ideal, because they are not "sargable"; that is, the database cannot construct a Search ARGument. As a result, you cannot optimize the query with an index. The cost of the query goes beyond the inability to leverage an index, since for every row in the table the DB must compute and evaluate an expression. This becomes very memory, CPU and I/O intensive very quickly, and it cannot be optimized without fundamentally changing the table structure.
Beyond the inability to optimize the query, it can also be awkward to read and report on the data, and you can run into limits when adding more bits (64 flags in an 8-byte column might be fine now, but not necessarily forever). Bitmasks also make systems difficult to understand, and I would argue that this design violates first normal form.
Although using bitmasks in a database is often a sign of bad design, there are times when it's fine to use them. Implementing a many-to-many relationship really isn't one of those times.
The typical approach to implementing this type of relationship looks something like this:
table1
Id Val1 Val2
---------------------------
1 ABC DEF
2 ABC DEF
3 ABC DEF
4 ABC DEF
5 ABC DEF
6 ABC DEF
table2
id types
-------------
1 music
2 art
3 pictures
4 video
5 art2
6 actor
7 movies
table1-table2-relationship
table1ID Table2ID
---------------------
1 1
1 2
2 3
2 5
3 2
3 7
...
And you would query the data like this (the hyphenated table name has to be quoted with backticks):
SELECT table1.*, table2.types
FROM table1
INNER JOIN `table1-table2-relationship` AS rel
ON table1.id = rel.table1ID
INNER JOIN table2
ON rel.table2ID = table2.id
Depending on the access pattern of these tables, you would typically index both columns on the relationship table as a composite index (I usually treat them as a composite primary key.) This index would allow the database to quickly seek to the relevant rows in the relationship table and then seek to the relevant rows in table2.

After playing around with the answer from Marcus Adams, I thought I'd provide another example that helped me understand how to join two tables using bitwise operations.
Consider the following sample data, which defines a table of vowels, and a table of words with a single value representing the vowels present in that word.
# Create sample tables.
drop temporary table if exists Vowels;
create temporary table Vowels
(
Id int,
Letter varchar(1)
);
drop temporary table if exists Words;
create temporary table Words
(
Word varchar(20),
Vowels int
);
# Insert sample data.
insert into Vowels
select 1, 'a' union all
select 2, 'e' union all
select 4, 'i' union all
select 8, 'o' union all
select 16, 'u';
insert into Words
select 'foo', 8 union all
select 'hello', 10 union all
select 'language', 19 union all
select 'programming', 13 union all
select 'computer', 26;
We can now join the Vowels table to the Words table like so:
# List every word with its vowels.
select Word, Vowels, Letter, Id as 'Vowel Id'
from (
select *
from Words
) w
join Vowels v
where v.Id | w.Vowels = w.Vowels
order by Word, Letter;
And of course we can apply any conditions to the inner query.
# List the letters for just the words with a length < 6
select Letter
from (
select *
from Words
where length(Word) < 6
) w
join Vowels v
where v.Id | w.Vowels = w.Vowels
order by Word, Letter
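The Vowels values in the sample data are simply the flags of each word's vowels OR-ed together. As a minimal sketch of how such a value could be computed in PHP before inserting a word (vowelMask and VOWEL_FLAGS are names made up for this illustration, with the flag values taken from the Vowels table above):

```php
<?php
// Flag values matching the Vowels table: a=1, e=2, i=4, o=8, u=16.
const VOWEL_FLAGS = ['a' => 1, 'e' => 2, 'i' => 4, 'o' => 8, 'u' => 16];

// Hypothetical helper: builds the bitmask stored in Words.Vowels.
function vowelMask(string $word): int
{
    $mask = 0;
    foreach (str_split(strtolower($word)) as $letter) {
        // Non-vowels contribute 0; a repeated vowel sets the same bit again.
        $mask |= VOWEL_FLAGS[$letter] ?? 0;
    }
    return $mask;
}

echo vowelMask('hello'), "\n"; // e (2) | o (8)
```

Because the flags are combined with OR, a repeated vowel such as the two o's in 'foo' still yields 8 rather than 16.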
